Appendix B
Basic Principles of Perceptual Coding
With perceptual coding, only information that can be perceived by the human auditory system is retained.
Lossless – which, for audio, translates to noiseless – coding with perfect reconstruction would be the ideal
system, since no information would be lost or altered. It may seem that lossless, redundancy-reducing methods
(such as PKZIP, StuffIt, Stacker, and others used in computer hard-disk compression) would be applicable to
audio. Unfortunately, no constant compression rate is possible due to signal-dependent variations in redundancy:
there are highly redundant signals like constant sine tones (where the only information necessary is the frequency,
phase, amplitude, and duration of the tone), while other signals, such as those which approach broadband noise,
may be completely unpredictable and contain no redundancy at all. Furthermore, looking for redundancy can take
time: while a popular song might have three choruses with identical audio data that would need to be coded only
once, you’d have to store and analyze the entire song in order to find them. Any system intended for real-time use
must have a consistent output rate and be able to accommodate the worst case, so effective audio compression is
impossible with redundancy reduction alone.
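The signal dependence described above can be seen in a small experiment. The sketch below is only an illustration under stated assumptions: Python's zlib (DEFLATE) stands in for the general-purpose compressors named in the text, the audio is assumed to be 16-bit PCM at 44.1 kHz, and the 441 Hz tone is chosen so that one period is exactly 100 samples.

```python
import math
import random
import struct
import zlib

SAMPLE_RATE = 44100  # samples per second (assumed, CD-style rate)


def to_pcm16(samples):
    """Pack floats in [-1.0, 1.0] as little-endian 16-bit PCM bytes."""
    ints = [int(round(s * 32767)) for s in samples]
    return struct.pack("<%dh" % len(ints), *ints)


def compression_ratio(pcm_bytes):
    """Compressed size divided by original size (smaller means more redundancy)."""
    return len(zlib.compress(pcm_bytes, 9)) / len(pcm_bytes)


# One second of a constant 441 Hz sine tone: the waveform repeats every
# 100 samples, so it is highly redundant.
tone = [math.sin(2 * math.pi * 441 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]

# One second of uniform white noise: each sample is unpredictable.
rng = random.Random(0)  # fixed seed so the run is repeatable
noise = [rng.uniform(-1.0, 1.0) for _ in range(SAMPLE_RATE)]

tone_ratio = compression_ratio(to_pcm16(tone))
noise_ratio = compression_ratio(to_pcm16(noise))

print("sine tone  :", round(tone_ratio, 3))
print("white noise:", round(noise_ratio, 3))
```

The tone shrinks to a small fraction of its original size, while the noise does not shrink at all. A lossless coder's output rate therefore swings with the program material, which is why it cannot guarantee a fixed bit rate.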
Fortunately, psychoacoustics permits a clever solution! Effects called “masking” have been discovered in the
human auditory system. These masking effects (which suggest that the brain itself performs something similar
to bit-rate reduction) occur in both the frequency and time domains and can be exploited for
audio data reduction.
Most important for audio coding are the effects in the frequency domain. Research into perception has revealed
that a tone or narrow-band noise at a certain frequency inhibits the audibility of other signals that fall below a
threshold curve centered on the masking signal.
The figure below shows two threshold-of-audibility curves. The lower one is the typical frequency sensitivity of
the human ear when presented with a single swept tone. When a single, constant tone is added, the threshold of
audibility changes, as shown in the upper curve. The ear’s sensitivity to signals near the constant tone is greatly
reduced. Tones that were previously audible become “masked” in the presence of “masking tones,” in this case, the
one at 300 Hz.
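The two curves can be mimicked with a toy model. This is a minimal sketch, not a psychoacoustic model: every constant in it is an assumption chosen for illustration, the lower curve is flattened to a single level, and the masking bump is a symmetric triangle on a log-frequency axis, whereas real masking curves are asymmetric and level-dependent.

```python
import math

# All numbers below are illustrative assumptions, not measured data.
QUIET_THRESHOLD_DB = 10.0   # flat stand-in for the lower curve
MASKER_FREQ_HZ = 300.0      # the constant tone from the text
MASKER_LEVEL_DB = 80.0      # assumed masker level
PEAK_OFFSET_DB = 15.0       # assumed gap between masker level and threshold peak
SLOPE_DB_PER_OCTAVE = 30.0  # assumed fall-off on either side of the masker


def masking_threshold_db(freq_hz):
    """Toy upper curve: quiet threshold raised by a triangular bump at the masker."""
    octaves_away = abs(math.log2(freq_hz / MASKER_FREQ_HZ))
    bump = MASKER_LEVEL_DB - PEAK_OFFSET_DB - SLOPE_DB_PER_OCTAVE * octaves_away
    return max(QUIET_THRESHOLD_DB, bump)


def is_audible(freq_hz, level_db):
    """A probe tone is audible only if it rises above the masking threshold."""
    return level_db > masking_threshold_db(freq_hz)


for freq in (350.0, 600.0, 5000.0):
    print(freq, "Hz at 40 dB audible:", is_audible(freq, 40.0))
```

In this toy model a 40 dB probe tone at 350 Hz is masked by the 300 Hz tone, while the same probe at 600 Hz or 5000 Hz stays audible because the bump has fallen away by then.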
All signals below the upper “threshold of audibility” curve, or “masking threshold,” are not audible, so we can drop
them or quantize them coarsely with the fewest possible bits. Any noise which results from coarse quantization
will not be audible if it occurs below the threshold of masking. The masking depends upon the frequency, the level,
and the spectral distribution of both the masker and the masked sounds.
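One way to picture how coarsely a component may be quantized is the common rule of thumb that each bit of a uniform quantizer lowers the quantization noise by roughly 6 dB relative to the signal. The sketch below applies that rule under simplifying assumptions: the signal is taken to use the quantizer's full range, the small constant term in the usual SNR formula is ignored, and the function name and the band levels are hypothetical values made up for the example.

```python
import math

DB_PER_BIT = 6.02  # approximate SNR gain per added bit for uniform quantization


def bits_needed(signal_db, threshold_db):
    """Fewest bits that keep quantization noise at or below the masking threshold."""
    margin_db = signal_db - threshold_db
    if margin_db <= 0:
        return 0  # component lies below the threshold: drop it entirely
    return math.ceil(margin_db / DB_PER_BIT)


# Hypothetical (signal level, masking threshold) pairs in dB for three bands.
bands = [(70.0, 58.0), (45.0, 50.0), (60.0, 12.0)]
allocation = [bits_needed(sig, thr) for sig, thr in bands]
print(allocation)  # [2, 0, 8]
```

The first band sits only 12 dB above its threshold and gets by with 2 bits, the second is already inaudible and is dropped, and the third, far from any masker, needs 8 bits.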
Figure: Masking effects in the frequency domain. A masking signal inhibits audibility of signals adjacent
in frequency and below the threshold.