Appendix B - Audio Coding Reference
History
Audio takes up a lot of data. Even a regular phone call uses 64,000 bits per second.
Without data reduction, CD-quality audio – 16 bits at a 44.1 kHz sample rate – requires a transmission capacity of about
706 thousand bits per second (kbps) for each audio channel. But the wires originally used for remote broadcasting on
the public switched telephone system were designed for voice-grade communications: 8 bits at an 8 kHz sample rate, or 64
kbps per channel. That’s about 9% (roughly one-eleventh) of what we needed.
Curiosity Note!
You can arrive at these same numbers with nothing more complicated than grade-school math. Just
multiply the sample rate by the sample depth: 44,100 samples per second * 16 bits per sample = 705,600
bits per second for CD-quality mono audio. Multiply by 2 for stereo.
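If you'd rather let a computer do the multiplying, here is a minimal sketch of the same arithmetic. The function name pcm_bitrate is our own choice for illustration, not part of any standard or product.

```python
def pcm_bitrate(sample_rate_hz, bits_per_sample, channels=1):
    """Return the uncompressed PCM bitrate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


# CD-quality audio: 16 bits at 44.1 kHz
cd_mono = pcm_bitrate(44_100, 16)                 # 705600
cd_stereo = pcm_bitrate(44_100, 16, channels=2)   # 1411200

# Voice-grade telephone channel: 8 bits at 8 kHz
phone = pcm_bitrate(8_000, 8)                     # 64000

print(cd_mono, cd_stereo, phone)
```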
You can reduce the data requirements by lowering the quality somewhat. 13 bits would yield a respectable 78 dB dynamic
range, certainly adequate for casual home listening. And a 32 kHz sample rate, with careful equipment design, will give you
flat response to 15 kHz, the practical limit for analog FM broadcasting in North America. Unfortunately, that still left us
with telephone data channels about 93% too small to do the job. Besides, 13 bits is an awkward bit depth (resolution) for
computers to deal with, and the audio it produced wasn’t clean enough to survive transmitter processors.
Curiosity Note!
Bit depth and sample rate translate easily into audio specifications. Digital audio must have a sample rate
of at least twice the desired bandwidth, so 15 kHz audio requires (after a safety margin) 32 kHz sampling.
Each bit of sample depth represents slightly more than 6 dB of dynamic range.
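Both rules of thumb are easy to check. The sketch below assumes the usual textbook relationships: dynamic range of 20 × log10(2) dB per bit, and a theoretical minimum sample rate of twice the audio bandwidth, before any safety margin. The function names are ours, chosen for illustration.

```python
import math


def dynamic_range_db(bits):
    """Approximate dynamic range of linear PCM: about 6.02 dB per bit."""
    return 20 * math.log10(2 ** bits)


def min_sample_rate_hz(bandwidth_hz):
    """Theoretical minimum sample rate: twice the desired bandwidth."""
    return 2 * bandwidth_hz


print(round(dynamic_range_db(1), 2))    # one bit: a little over 6 dB
print(round(dynamic_range_db(13), 1))   # 13 bits: about 78 dB
print(round(dynamic_range_db(16), 1))   # 16 bits: about 96 dB
print(min_sample_rate_hz(15_000))       # 30000 Hz, before the safety margin
```

The 30 kHz result is the bare minimum for 15 kHz audio; the 32 kHz rate mentioned above leaves room for practical filters.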
The first practical coding methods used a principle called ADPCM: Adaptive Differential (or Delta) Pulse Code Modulation.
This took advantage of the fact that it takes fewer bits to code the difference, or delta, between successive audio samples
than to code the individual values. Further efficiency was gained by adaptively varying the difference comparator according to
the nature of the program material. G.722 and aptX are examples of ADPCM schemes. They reduce the bitrate by a factor of
about 4:1.
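To make the idea concrete, here is a toy sketch of adaptive delta coding. It is not G.722 or aptX, and every detail in it is an assumption made for illustration: the 4-bit code range, the starting step size of 16, and the doubling and halving rule. Each 16-bit sample is replaced by a small code describing how far it moved from the previous reconstructed value, and the step size grows when the signal moves quickly and shrinks when it settles.

```python
MAX_CODE = 7        # codes span -7..+7, which fits in 4 bits (assumed)
START_STEP = 16     # initial step size (assumed)


def _adapt(step, code):
    """Hypothetical adaptation rule: widen the step after big codes,
    narrow it after small ones."""
    if abs(code) >= 4:
        step *= 2
    elif abs(code) <= 1:
        step //= 2
    return max(1, min(step, 4096))


def encode(samples):
    """Code each sample as a small delta from the running reconstruction."""
    predicted, step = 0, START_STEP
    codes = []
    for s in samples:
        code = round((s - predicted) / step)
        code = max(-MAX_CODE, min(MAX_CODE, code))
        codes.append(code)
        # Track exactly what the decoder will rebuild, so both stay in sync.
        predicted += code * step
        step = _adapt(step, code)
    return codes


def decode(codes):
    """Rebuild the samples by accumulating the scaled deltas."""
    predicted, step = 0, START_STEP
    out = []
    for code in codes:
        predicted += code * step
        out.append(predicted)
        step = _adapt(step, code)
    return out


codes = encode([1000] * 50)   # a sudden jump to a steady level
rebuilt = decode(codes)
print(codes[:10], rebuilt[-1])
```

With 4-bit codes standing in for 16-bit samples, this toy coder gets the same 4:1 reduction. Real ADPCM codecs use far more careful prediction and step adaptation.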
G.722 achieves additional efficiency by allocating its bits to match the patterns in the human voice, and it’s considered
adequate for news and talk programming over ISDN. But, for high-fidelity transmission, algorithms with more power are
required. These are based on psychoacoustics, where the coding process is adapted to the way we hear sounds. There are
several algorithms available, with varying complexity and performance levels.
Some years ago, the standards group ISO/IEC established the ISO/MPEG (Moving Picture Experts Group) to develop
a universal standard for encoding moving pictures and sound for digital storage and transmission media. The standard
was finalized in November 1992 with three related algorithms, called Layers, defined to take advantage of psychoacoustic
effects when coding audio. Layers 1 and 2 were intended for compression factors of about 4:1 and 6:1 to 8:1, respectively.
These algorithms became popular in satellite and hard-disk systems. Layer 3 achieved compression of up to 12.5:1 – 8% of
the original size – which made it ideal for ISDN.