Immersive monitoring: A perceptive perspective
Hearing the world around us is so natural that we often only notice its importance once we lose the ability. Fortunately, such a loss is usually temporary, caused for instance by a cold, but one-sided hearing loss is more stressful and depressing than we generally tend to believe.
Localisation makes use of the most energy-consuming and fast-firing synapses of the brain, which suggests the capability has been important for survival. Hearing, balance/acceleration and proprioception are our main look-ahead senses, without which the 0.4 s latency of our mind could get us hurt many times per day, for instance if we had to rely on vision alone.
Hard-wired reflexes from the fast senses therefore play a crucial role, also when sound is accompanied by picture, conveying dimensionality, suspense and surprise. One of the first things a baby does is to localise, quickly and automatically turning eyes towards a sound. Until adolescence, we further learn and refine localisation using a system under construction. Ear canals and other structures of the outer ear ("pinnae") grow and reshape, constantly modifying spherical hearing, as we reach out and experience a fascinating world in return.
Pinnae remain entirely personal. To some extent, they also keep developing throughout life, though the rate of change slows in adults. Sound is coloured by the pinnae depending on its direction of arrival (azimuth and elevation), which is a highly important feature. Expert listeners constantly use it in combination with head movements, not only when evaluating immersive content but also to distinguish direct sound from room reflections.
Personal head-related transfer functions (HRTFs) drive localisation at frequencies above 700 Hz, the range where interaural level difference (ILD) is the primary cue. From 50 Hz to 700 Hz, however, localisation relies on fast-firing synapses in the brainstem, employed in a phase-locking structure to determine interaural time difference (ITD). Humans can localise at even lower frequencies, but we will come back to that in a separate blog on ultra-low frequencies.
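To make the ITD cue concrete, here is a minimal Python sketch that estimates the time difference between two ear signals by cross-correlation. It is only a signal-level illustration, not a model of the brainstem's phase-locking structure. The function name `estimate_itd`, the 48 kHz sample rate, the 200 Hz test tone, the 0.5 ms delay and the 1 ms search range are all assumptions chosen for the example.

```python
import math

def estimate_itd(left, right, sample_rate, max_itd_s=0.001):
    """Return the lag (in seconds) at which right best matches left.

    A positive result means the sound reaches the left ear first.
    Hypothetical helper for illustration only.
    """
    max_lag = round(max_itd_s * sample_rate)
    # Keep guard samples at both ends so every lag uses the same window.
    start, stop = max_lag, len(left) - max_lag
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = sum(left[i] * right[i + lag] for i in range(start, stop))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag / sample_rate

sample_rate = 48000          # assumed sample rate in Hz
tone_hz = 200                # test tone inside the 50-700 Hz ITD range
delay_samples = 24           # assumed 0.5 ms delay at the right ear
window = 4800                # 0.1 s, a whole number of 200 Hz periods
n = window + 2 * round(0.001 * sample_rate)

w = 2 * math.pi * tone_hz / sample_rate
left = [math.sin(w * i) for i in range(n)]
right = [math.sin(w * (i - delay_samples)) for i in range(n)]

itd = estimate_itd(left, right, sample_rate)
print(f"Estimated ITD: {itd * 1e6:.0f} microseconds")
```

The search is limited to lags of at most 1 ms, which is shorter than the 5 ms period of the tone, so the correlation peak is unambiguous. Real ear signals are broadband and noisy, and a practical estimator would need band-limiting and windowing that this sketch leaves out.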
The ability to position sound sources precisely anywhere on a sphere around the listener is a key benefit of immersive systems. Another is the possibility of influencing the sense of space in human listeners. For the latter, the lowest two octaves of the ITD range (i.e. 50-200 Hz) play an essential role, but may be compromised in multiple ways: microphones with insufficient physical spacing during pick-up, synthesised reverb without the right kind of decorrelation, lossy codecs that collapse channel differences, loudspeakers with limited LF capability, bass management, etc.
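As a rough illustration of how low-frequency channel differences can be lost, the following Python sketch compares the correlation between two channels before and after a simple mono sum. The helper names `correlation` and `mono_bass_sum`, the 100 Hz test tones and the 90-degree phase offset are assumptions for the example. No crossover filter is modelled, since both tones already lie in the 50-200 Hz band.

```python
import math

def correlation(a, b):
    """Normalised zero-lag correlation coefficient between two channels."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den

def mono_bass_sum(left, right):
    """Hypothetical bass management: both channels get the same mono sum."""
    mono = [0.5 * (l + r) for l, r in zip(left, right)]
    return mono, mono

sample_rate = 48000   # assumed sample rate in Hz
n = 4800              # 0.1 s, a whole number of 100 Hz periods
w = 2 * math.pi * 100 / sample_rate

# Two 100 Hz tones, 90 degrees apart: a stand-in for decorrelated LF content.
left = [math.sin(w * i) for i in range(n)]
right = [math.sin(w * i + math.pi / 2) for i in range(n)]

before = correlation(left, right)
after = correlation(*mono_bass_sum(left, right))
print(f"Correlation before: {before:.3f}, after mono sum: {after:.3f}")
```

Once both channels carry the same signal, no difference remains between them at these frequencies, whatever decorrelation the original pick-up or reverb provided.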
So where does this all lead, considering immersive reference monitoring? A well-aligned loudspeaker system in a fine room has the best chance of translating well to a variety of immersive playback situations. The sound engineer can make full use of outer ear features and head movements, with listener fatigue and "cyber sickness" minimised.
Headphone-based immersive monitoring needs to incorporate precise, personal HRTFs and head tracking around an n-channel virtual reference room. Even so, any static or temporal imperfection can lead to listener fatigue, and head movements in production are unlikely to produce anywhere near the same results as during reproduction across platforms.
Photo: Jungle Studio's Steven Boardman, Credit: Rob Jarvis.