Tomorrow’s Solutions, Today

In the second of our series of blogs on Immersive Audio, we drill down into some of the technicalities of working in immersive – and show how Genelec is helping audio professionals embrace this new world of opportunity.  

Benefits Over Legacy Formats

Immersive audio formats not only surround the listener in the horizontal plane, they also envelop them in the height dimension. One way to describe the capability of an immersive playback system is by the number of height layers it offers. Two-channel stereo and conventional surround formats offer only one height layer; this layer is located at the height of the listener’s ears, with all loudspeakers located at equal distance (in terms of acoustic delay) from the listener and playing back at the same level.

The channel layouts for immersive formats serve several purposes. One aim is to create envelopment and a realistic sense of being inside an audio field. One height layer alone cannot create this sensation realistically enough, because a significant portion of the listening experience is created by the sound arriving at the listener from above. The extra height layers of a true immersive system enable this, and therefore add a significant dimension to the experience.

The second aim for immersive systems used with video is to be able to localize the apparent source of audio at any location across the picture. This is the reason why the 22.2 format has three height layers, including the layer below the listener's ears: the UHDTV picture can be very large, extending from the floor to the ceiling, and the audio system has to support localizing audio across the whole area of the picture.

Current Industry Developments

Immersive sound monitoring is rapidly gaining momentum, and several systems are competing for dominance in the world of 3D immersive audio recordings. The front-runners are currently the cinema audio formats, which are trying to increase their presence in the audio-only area and to enter the television broadcast market too.

Whereas the cinema industry is always searching for the next ‘wow-effect’ to lure the audience from the comfort of their homes into theatres, the growth of immersive audio has been slightly slower in the world of television. But the pace is now picking up, with several companies studying 3D immersive sound as a companion to ultra-high definition television formats and the International Telecommunication Union (ITU) issuing recommendations about the sound formats to accompany UHDTV pictures. Japan’s own national broadcaster NHK is already starting to deliver 8K programming, with 22.2 audio, in preparation for the Tokyo 2020 Summer Olympic Games.

Height Layers

Modern immersive formats offer two or more height layers: current cinema formats offer two, while the emerging broadcasting formats have three or more.

One of the height layers is always at the height of the listener’s ears - this typically creates a layout with backward compatibility to surround formats and even down to standard stereo. Typically, other layers are above the listener. For certain formats, layers can also be located below the listener in the front only, to enhance the sense of envelopment.

Certain encoding methods for broadcast applications can compress 3D immersive audio into a very compact data package for storage or transmission to the customer. These formats offer a very interesting advantage over many other immersive audio formats, since the channel count and the presentation channel orientations can be selected according to the playback venue or room. Essentially any number of height layers and density of loudspeaker locations can be used - and furthermore, this density does not need to be constant.

Dynamically creating the loudspeaker feeds from the transport format is called rendering. The compact audio transport package is decoded and the feeds to all the loudspeakers are calculated in real time while the immersive audio is played back in the user’s location. The compact delivery format and the freedom to adjust and optimize the number and location of the playback loudspeakers make these flexible formats very exciting.
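To make the idea of rendering more concrete, here is a minimal sketch of one well-known way to calculate loudspeaker feeds from a source direction: pairwise amplitude panning in the horizontal plane (the two-dimensional form of vector base amplitude panning). It is an illustration only and not the renderer of any particular immersive format; the function names and the azimuth convention (degrees, 0 straight ahead, positive to the left) are assumptions made for this example.

```python
import math


def unit_vector(azimuth_deg):
    """Unit vector in the horizontal plane for an azimuth given in degrees."""
    a = math.radians(azimuth_deg)
    return math.cos(a), math.sin(a)


def pan_between_pair(source_az_deg, spk_a_az_deg, spk_b_az_deg):
    """Return (gain_a, gain_b) placing a source between two loudspeakers.

    Solves gain_a * a + gain_b * b = p for the loudspeaker direction
    vectors a, b and the source direction p, then scales the gains so
    that gain_a**2 + gain_b**2 == 1 (constant power).
    """
    ax, ay = unit_vector(spk_a_az_deg)
    bx, by = unit_vector(spk_b_az_deg)
    px, py = unit_vector(source_az_deg)

    det = ax * by - ay * bx
    if abs(det) < 1e-9:
        raise ValueError("loudspeakers must not lie on the same line")

    # Cramer's rule for the 2x2 system.
    gain_a = (px * by - py * bx) / det
    gain_b = (ax * py - ay * px) / det

    norm = math.hypot(gain_a, gain_b)
    return gain_a / norm, gain_b / norm


if __name__ == "__main__":
    # Source straight ahead, loudspeakers at +30 and -30 degrees.
    print(pan_between_pair(0.0, 30.0, -30.0))
```

A real renderer would repeat this kind of calculation for every audio object and every active loudspeaker pair or triplet, including the height layers, many times per second.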

Common Assumptions

The popular immersive audio playback systems typically share two assumptions about the loudspeaker layout and one assumption about the loudspeaker characteristics. Concerning layout, it is assumed that the same level of sound will be delivered to the listening location from all loudspeakers, and that the time taken for the audio to travel from each loudspeaker to the listener will also be the same. This implies either equal loudspeaker distances, in the case where all loudspeakers have similar internal audio delay, or electronic adjustment of level and delay to align the system.
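As a rough illustration of what such electronic adjustment involves, the sketch below computes a delay and a level trim for each loudspeaker from its distance to the listening position, so that every loudspeaker matches the farthest one. It is a simplified model and not how any particular calibration product works: it assumes a speed of sound of 343 m/s, free-field inverse-distance level loss, and identical loudspeakers with the same internal delay. The function name and the example distances are hypothetical.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumption: dry air at roughly 20 degrees C


def alignment_trims(distances_m):
    """Map loudspeaker name -> (delay_ms, gain_db) relative to the farthest one.

    Closer loudspeakers are delayed so their sound arrives together with
    that of the farthest loudspeaker, and attenuated so that the
    free-field level at the listener is the same.
    """
    farthest = max(distances_m.values())
    trims = {}
    for name, distance in distances_m.items():
        delay_ms = (farthest - distance) / SPEED_OF_SOUND_M_S * 1000.0
        gain_db = 20.0 * math.log10(distance / farthest)  # zero or negative
        trims[name] = (delay_ms, gain_db)
    return trims


if __name__ == "__main__":
    # Hypothetical layout: distances in metres from the listening position.
    layout = {"left": 2.0, "right": 2.0, "centre": 1.8, "top_front_left": 1.5}
    for name, (delay_ms, gain_db) in alignment_trims(layout).items():
        print(f"{name}: delay {delay_ms:.2f} ms, gain {gain_db:.2f} dB")
```

In a real room, reflections and loudspeaker placement change the level arriving at the listener, so a measurement at the listening position is more reliable than a calculation from distance alone.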

Concerning loudspeaker characteristics, a fundamental assumption is that all the loudspeakers in the playback system have a similar frequency response. Sometimes this is taken to mean that all the loudspeakers in the playback system should be of the same make and model. In reality, loudspeaker sound is affected by the room in many ways. This can significantly change the character of the audio signal, so that even when the same make and model of loudspeaker is used throughout the system, the individual locations of the loudspeakers will change the audio and each loudspeaker will perform slightly differently.

Getting Aligned

Genelec has the widest selection of Smart Active Monitors (SAM), and working in conjunction with the Genelec Loudspeaker Manager (GLM) software, users can configure highly accurate monitoring systems for immersive audio. In fact, since GLM today supports up to 45 loudspeakers and subwoofers in one room, Genelec’s solution for immersive audio monitor control covers all audio playback formats in existence today. Quite simply, SAM and GLM are future-proof tools for top-level audio professionals.

In order to fulfill the previously mentioned assumptions about how the playback system works, calibration and alignment of the monitoring system in the room is necessary. An increasing number of monitoring controllers with calibration are appearing on the market, but GLM is by far the most complete and cost-efficient solution for precise calibration of immersive monitoring systems.

GLM takes care of the essentials of calibrating an immersive audio playback system, with features to make the monitoring systematic and controlled, including the alignment of levels and time of flight at the listening location, subwoofer integration and compensation for the acoustical effects of loudspeaker placement – ensuring that all the loudspeakers in the system deliver a consistent and neutral sound character.

All of these can improve both the quality of the production and the speed of the working process. Additionally, one of the key concepts for immersive monitoring is achieving a standard sound level at the listening location: new recommendations about maintaining loudness in broadcast signals include a definition of the SPL at the listening location for monitoring these loudness-controlled signals.

As well as the wealth of configuration and calibration features already mentioned, our latest software version - GLM 3 - supports both preset levels and loudness-oriented level calibration in order to do just this.

So it’s clear that Genelec has tomorrow’s solutions today!

Aki Mäkivirta

R&D Director