GOLDBERG AND MÄKIVIRTA
AUTOMATED IN-SITU EQUALISATION
AES 23RD CONFERENCE, May 23-25, 2003
Equalisation is particularly prevalent in professional
sound reproduction applications such as recording stu-
dios, mixing rooms and sound reinforcement.
In-situ response equalisation is typically implemented
using a separate equaliser, although equalisers are in-
creasingly built into active loudspeakers. Some equal-
isers on the market play a test signal and then alter
their response according to the in-situ transfer function
measured in this way [5], but the process can be so
sensitive that a simple ‘press the button and every-
thing will be OK’ approach proves hard to achieve
with reliability, consistency and robustness.
Equalisation can become skewed if it is based on only a
single-point measurement. The frequency response in
nearby positions can actually become worse after
applying an equalisation designed from a single-point
measurement. A classical
method to avoid this is to use a weighted average of
responses measured within the listening area. Such
spatial averaging is often required when the listening
area is large. Examples of spatial averaging have been
described in the automotive industry [6] and cinema in
the SMPTE Standard 202M [7]. Spatial averaging can
reduce local variance in midrange to high frequencies
and can also reduce problems caused by the fact that a
listener perceives sound differently to a microphone,
but typically reduces the accuracy of equalisation ob-
tained at the primary listening location.
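
The weighted spatial average described above can be sketched as follows. This is a minimal illustration, not the procedure of [6] or [7]: it assumes the responses are magnitude responses in dB on a common frequency grid and that the averaging is done on power, which is one common choice. The function name, the weights and the measurement data are hypothetical.

```python
import math

def spatial_average_db(responses_db, weights=None):
    """Weighted power average of magnitude responses given in dB.

    responses_db: one list of dB values per microphone position,
                  all on the same frequency grid.
    weights:      one non-negative weight per position (equal if omitted).
    """
    n_pos = len(responses_db)
    if weights is None:
        weights = [1.0] * n_pos
    if len(weights) != n_pos:
        raise ValueError("need one weight per measurement position")
    total = sum(weights)
    n_bins = len(responses_db[0])
    averaged = []
    for k in range(n_bins):
        # Assumption: average on power, then convert back to dB.
        power = sum(w * 10.0 ** (r[k] / 10.0)
                    for w, r in zip(weights, responses_db)) / total
        averaged.append(10.0 * math.log10(power))
    return averaged

# Hypothetical data: three positions, four frequency bins (dB re. target).
measured = [
    [0.0, 3.0, -6.0, 1.0],   # primary listening position
    [0.0, -2.0, 2.0, 0.0],   # nearby position A
    [0.0, 1.0, 4.0, -1.0],   # nearby position B
]
# Weighting the primary position more heavily limits the loss of
# accuracy there, at the cost of less smoothing over the area.
avg = spatial_average_db(measured, weights=[2.0, 1.0, 1.0])
```

Raising the weight of the primary position moves the average towards the single-point result, which illustrates the trade-off noted above between accuracy at the primary location and robustness over the listening area.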
The room transfer function is position dependent, and
this poses major problems for all equalisation techniques.
For a single loudspeaker in a diffuse field, no
correction filter is capable of removing differences
between responses measured at two separate receiver
points. At high frequencies a required high-resolution
correction can become very position sensitive. Fre-
quency dependent resolution change is then preferable
and is typically applied [8,9], but at the expense of
reduced equalisation accuracy. Perfect equalisation
able to achieve precisely flat frequency response in a
listening room, even within a reasonably small listen-
ing area, appears not to be possible. An acceptable
equalisation is typically a compromise to minimise the
subjective coloration in audio due to room effects.
Typically, electronic equalisation in active loudspeakers
uses low-order analogue minimum-phase filters
[10-12]. Since the loudspeaker-room transfer function
is of substantially higher order than such equalisation
filters, the effect of filtering is to gently shape the re-
sponse. Even with this limitation, in-situ equalisers
have the potential to significantly improve perceived
sound quality. The practical challenge is the selection
of the best settings for the low-order in-situ equaliser.
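
To make the idea of gently shaping the response concrete, the sketch below models one possible low-order minimum-phase filter, a first-order analogue low shelf H(s) = (s + g·w0)/(s + w0), and picks the best of a few discrete gain settings by a least-squares fit to a flat target. The filter type, the corner frequency, the list of settings and the measured data are all assumptions made for illustration; they are not taken from [10-12].

```python
import math

def low_shelf_gain_db(freq_hz, corner_hz, shelf_gain_db):
    """Magnitude in dB of H(s) = (s + g*w0) / (s + w0) at freq_hz.

    The gain tends to shelf_gain_db at low frequencies and to 0 dB at
    high frequencies. For g > 0 the filter is minimum phase.
    """
    g = 10.0 ** (shelf_gain_db / 20.0)
    w, w0 = freq_hz, corner_hz  # only the ratio w / w0 matters
    mag_sq = (w * w + (g * w0) ** 2) / (w * w + w0 * w0)
    return 10.0 * math.log10(mag_sq)

def best_shelf_setting(freqs_hz, measured_db, corner_hz, settings_db):
    """Return the setting that minimises the RMS deviation from 0 dB."""
    def rms_error(gain_db):
        err = [m + low_shelf_gain_db(f, corner_hz, gain_db)
               for f, m in zip(freqs_hz, measured_db)]
        return math.sqrt(sum(e * e for e in err) / len(err))
    return min(settings_db, key=rms_error)

# Hypothetical in-situ response: a bass rise that a -4 dB shelf at
# 100 Hz would cancel exactly.
freqs = [20.0, 40.0, 80.0, 160.0, 320.0, 640.0]
measured = [-low_shelf_gain_db(f, 100.0, -4.0) for f in freqs]
best = best_shelf_setting(freqs, measured, 100.0, [0.0, -2.0, -4.0, -6.0])
```

With only a handful of settings an exhaustive search like this is trivial. The difficulty referred to above arises when several such controls interact and the measured response is far more irregular than this example.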
Despite advances in psychoacoustics, it is difficult to
quantify what the listener actually perceives the sound
quality to be, or to optimise equalisation based on that
evaluation [13-15]. Because of this, in-situ equalisa-
tion typically attempts to obtain the best fit to some
objectively measurable target, such as a flat third-
octave smoothed response, known to have a link to the
perception of sound being free from coloration. Also,
despite the widespread use of equalisation, it is still
hard to provide exact timbre matching between differ-
ent environments.
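
As an illustration of such an objective target, the sketch below smooths a magnitude response over a third-octave window and reports its RMS deviation from a flat target. It is a simplified sketch under stated assumptions: a rectangular window, power averaging within the window, and hypothetical test data. Standardised fractional-octave analysis uses specific filter shapes that are not reproduced here.

```python
import math

def fractional_octave_smooth(freqs_hz, mags_db, fraction=3):
    """Smooth a dB magnitude response over a 1/fraction-octave window."""
    half = 2.0 ** (1.0 / (2.0 * fraction))  # half-window as a frequency ratio
    smoothed = []
    for fc in freqs_hz:
        lo, hi = fc / half, fc * half
        # Assumption: rectangular window, averaging on power.
        powers = [10.0 ** (m / 10.0)
                  for f, m in zip(freqs_hz, mags_db) if lo <= f <= hi]
        smoothed.append(10.0 * math.log10(sum(powers) / len(powers)))
    return smoothed

def rms_deviation_db(mags_db, target_db=0.0):
    """RMS deviation of a response from a flat target level, in dB."""
    return math.sqrt(sum((m - target_db) ** 2 for m in mags_db) / len(mags_db))

# Hypothetical response: 1/24-octave grid from 100 Hz to 400 Hz with a
# fine ripple of +/-3 dB, narrower than the smoothing window.
freqs = [100.0 * 2.0 ** (i / 24.0) for i in range(49)]
raw = [3.0 if i % 2 == 0 else -3.0 for i in range(49)]
smooth = fractional_octave_smooth(freqs, raw, fraction=3)
```

The smoothed response deviates less from flat than the raw one, because ripple narrower than the window is averaged out. An optimiser fitting to the smoothed curve therefore ignores such fine detail.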
Several methods have been proposed for more exact
inversion of the frequency response to achieve a close
approximation of unity transfer function (no change to
magnitude or phase) within a certain bandwidth of interest
[16-24]. Some researchers have also shown an interest in
selectively controlling the temporal decay characteristics
of a listening space by active absorption or
modification of the primary sound [25-30]. If realis-
able, these are extremely attractive ideas because they
imply that the perceived sound could be modified with
precision to different target responses. However, spatial
variations in the frequency response can become far
more difficult to handle than with low-order methods
because the correction depends strongly on an exact
match between the acoustic and equalisation transfer
functions, and can therefore be highly local in space
[25].
2.2. Room Acoustic Considerations
In small to medium sized listening environments, the
sound field in the frequency range up to a critical frequency
$f_c$ (typically 70…200 Hz in small spaces) is
often dominated by room modes and comb filtering
caused by low-order discrete reflections from room
boundaries. Sound reproduction can be problematic
because of this. For a room with a reverberation time
$T_{60}$ of 0.3 s the room mode bandwidth is approximately
$2.2/T_{60} = 7.3$ Hz [23]. However, this does not
predict accurately the decay rate of an individual
mode, because reverberation time represents the total
decay rate in a diffuse field, whereas the decay rate of
individual modes may vary.
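
The bandwidth estimate quoted above can be checked with a few lines of code. The sketch assumes only the relation 2.2/T60 given in the text. The helper that converts bandwidth to an approximate Q value uses the standard definition Q = f0/bandwidth, and the 50 Hz mode frequency is a hypothetical example. As noted above, the result describes the average decay and not the decay of any particular mode.

```python
def mode_bandwidth_hz(t60_s):
    """Approximate room mode bandwidth in Hz from reverberation time T60."""
    if t60_s <= 0.0:
        raise ValueError("T60 must be positive")
    return 2.2 / t60_s

def mode_q(mode_freq_hz, t60_s):
    """Approximate Q of a mode, using Q = f0 / bandwidth."""
    return mode_freq_hz / mode_bandwidth_hz(t60_s)

bandwidth = mode_bandwidth_hz(0.3)   # about 7.3 Hz, as in the text
q_at_50_hz = mode_q(50.0, 0.3)       # hypothetical mode at 50 Hz
```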
Above $f_c$, modal density becomes sufficiently high to
be described statistically. An unsmoothed room trans-
fer function shows a large number of high Q notches.
When frequency smoothing due to human hearing is
taken into account [31], the resulting sensation is a
rather smooth room transfer function causing timbre
changes in the perceived audio.
In the time domain, early reflections before about
25 ms combine with the direct sound to produce tone
coloration (comb filtering effect). Reflections arriving
later than about 25 ms are less problematic as they
typically combine to produce the reverberation of the
room and are perceived as separate sound events (ech-