What should be the target response of a monitoring system at the listening position?

First, the role of a monitoring system is to reproduce sound without adding anything to, or taking anything away from, the original input signal. There are two reasons for this definition: human hearing exhibits a phenomenon called Auditory Masking*, and modern recording systems have a flat electronic frequency response. So, to accurately monitor what is recorded on the hard drive or tape machine, the monitoring system must also have a flat response at the listening position.

Secondly, regarding the often-quoted "final mix translation" issue, domestic and car audio systems are generally improving over time and achieving better, i.e. flatter, frequency responses. As a general rule, a good mix should sound good on any system, and the average of many different reproduction systems actually tends towards a flat frequency response.
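The averaging argument can be illustrated with a small simulation. This is only a sketch under a strong assumption that is not stated in the text above: each playback system's deviation from flat is modelled as independent, zero-mean random noise in every frequency band. The function name, band count, and spread value are hypothetical choices made for illustration, not measured data.

```python
import random
from statistics import mean


def average_deviation(num_systems, num_bands=10, spread_db=6.0, seed=0):
    """Largest deviation from flat (in dB) of the averaged response.

    Assumption: each system deviates from flat by independent, zero-mean
    Gaussian noise per band. Real systems may share systematic errors,
    in which case the average would not converge to flat.
    """
    rng = random.Random(seed)
    # One list of per-band deviations (dB) for each simulated system.
    systems = [
        [rng.gauss(0.0, spread_db) for _ in range(num_bands)]
        for _ in range(num_systems)
    ]
    # Average the deviations band by band across all systems.
    averaged = [mean(band) for band in zip(*systems)]
    return max(abs(d) for d in averaged)


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(f"{n:5d} systems: worst band deviation {average_deviation(n):.2f} dB")
```

Under this assumption the worst-case deviation of the averaged response shrinks as more systems are included, which is why a mix balanced on a flat system is a reasonable compromise across many imperfect ones.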

The above arguments lead to the conclusion that a monitoring system must yield a flat response at the listening position. Genelec monitors have a flat response in anechoic conditions. When the monitors are placed in a listening room, their response changes, and the built-in Room Response Controls can be used to restore a flat response at the listening position.
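As a rough illustration of what "flat at the listening position" means in practice, the sketch below compares measured band levels against their own average and reports the worst deviation. The band levels, the 2.5 dB tolerance, and the function name are hypothetical; using the mean level as the reference is also an assumption, and this is not a description of how the Room Response Controls work.

```python
from statistics import mean


def flatness_report(levels_db, tolerance_db=2.5):
    """Compare measured band levels against their average.

    levels_db maps band centre frequency (Hz) to measured level (dB SPL).
    Returns (deviations, worst, is_flat), where deviations maps each band
    to its offset from the mean level, worst is the largest offset by
    magnitude, and is_flat is True if every band is within tolerance_db.
    The tolerance is a hypothetical value chosen for illustration.
    """
    reference = mean(levels_db.values())
    deviations = {freq: level - reference for freq, level in levels_db.items()}
    worst = max(deviations.values(), key=abs)
    return deviations, worst, abs(worst) <= tolerance_db


# Hypothetical in-room measurement with a low-frequency build-up.
measured = {63: 88.0, 125: 85.5, 250: 84.0, 500: 84.5,
            1000: 84.0, 2000: 83.5, 4000: 84.0, 8000: 83.0}

if __name__ == "__main__":
    deviations, worst, is_flat = flatness_report(measured)
    for freq, dev in deviations.items():
        print(f"{freq:5d} Hz: {dev:+.2f} dB")
    print("within tolerance" if is_flat else f"not flat, worst band {worst:+.2f} dB")
```

In this made-up example the 63 Hz band stands well above the rest, which is the kind of room-induced bump that a low-frequency correction would be used to reduce.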

One exception to this rule is the X-Curve used in the movie industry. Movie theatre replay systems are installed in very large rooms (e.g. a movie theatre for 200-800 people) and the frequency response across the audience area is never flat. The dubbing stage must replicate this response so that the mix translates precisely to the movie theatre. Note that soundtracks for movies released on DVD are re-mixed on flat-response monitoring systems for reproduction in domestic environments.

Note*: Humans cannot hear minute differences in frequency. For example, it is very difficult to distinguish a 1000 Hz signal from a 1001 Hz signal, and even more difficult if the two signals are playing at the same time. Furthermore, the 1000 Hz signal also affects a listener's ability to hear a signal at 1010 Hz, 1100 Hz or 990 Hz. This concept is known as Auditory Masking. If the 1000 Hz signal is strong, it will mask signals at nearby frequencies, making them inaudible to the listener. For a masked signal to be heard, its power must be raised above a threshold determined by the frequency and strength of the masker tone. For a monitoring system, this means that any strong irregularities in the frequency response (i.e. significant bumps) will mask nearby frequencies and hence degrade the sound reproduction.