The primary job of the information in the RTP header is to let the decoder play out the media contained in each packet in the
proper sequence. RTP doesn’t contain any intelligence about what is actually contained in the payload; this has to be handled
by other means.
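As a rough illustration of how a receiver might use the header to restore playout order, the sketch below unpacks the 12-byte
fixed RTP header (field layout per RFC 3550) and sorts packets by sequence number. The names RtpHeader, parse_rtp_header and
playout_order are hypothetical, and sequence-number wraparound at 65535 is deliberately ignored to keep the example short.

```python
import struct
from typing import NamedTuple


class RtpHeader(NamedTuple):
    version: int    # RTP version, normally 2
    sequence: int   # 16-bit counter, increments by one per packet
    timestamp: int  # 32-bit media clock value of the first sample
    ssrc: int       # 32-bit identifier of the sending source


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Unpack the fixed 12-byte RTP header from the start of a packet."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the 12-byte fixed header")
    # Network byte order: two flag bytes, 16-bit sequence, 32-bit timestamp, 32-bit SSRC.
    flags, _marker_pt, sequence, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return RtpHeader(version=flags >> 6, sequence=sequence, timestamp=timestamp, ssrc=ssrc)


def playout_order(packets: list[bytes]) -> list[bytes]:
    """Return packets sorted by sequence number (assumes no wraparound)."""
    return sorted(packets, key=lambda p: parse_rtp_header(p).sequence)
```

A real receiver would also use the timestamp to schedule playout and would have to handle lost or late packets, which this
sketch does not attempt.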
An RTP stream is unidirectional. If a duplex stream is required, an additional independent RTP stream must be initiated in the
reverse direction. (This function is handled by the Session Initiation Protocol (SIP) layer, discussed later.)
Finally, an RTP stream (or session, as it’s called) has a companion stream that is initiated and travels alongside it for the
duration of its life. It’s called RTCP and is sent to the same IP address as the RTP stream, but at one port higher. It carries
RTP stream quality statistics rather than actual audio, so it uses only a small amount of data. It’s still important to know
about if you’re troubleshooting firewall or NAT issues.
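When working out which UDP ports a firewall has to pass, the "one port higher" rule can be written down directly. This is a
minimal sketch; the function name rtp_rtcp_ports is hypothetical and 5004 is only an example port value.

```python
def rtp_rtcp_ports(rtp_port: int) -> tuple[int, int]:
    """Return the (RTP, RTCP) UDP port pair for one stream."""
    if not 1 <= rtp_port <= 65534:
        raise ValueError("RTP port must leave room for RTCP at port + 1")
    # RTCP goes to the same IP address, one port above the RTP stream.
    return rtp_port, rtp_port + 1


print(rtp_rtcp_ports(5004))  # (5004, 5005)
```

Because each direction of a duplex call is its own RTP stream, both ends need their respective port pairs reachable.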
RTP alone can be the basis of a very primitive VoIP call. If each end of the call knows in advance which encoders are in use,
no NAT routers are involved, and the call can be manually initiated and answered on each end, RTP streams can be
“pushed” between the destinations and will provide the path for VoIP. Of course, real-world VoIP involves much more, so we
need to add complexity to the system.
Encoders
Broadcasters who’ve used POTS, ISDN or IP audio products are familiar with the concept of audio compression: the choice of
an encoder within the system that compresses digital audio so it uses less network capacity. Encoders like MP3
and AAC are common in that world.
You’ll see the VoIP industry use the term “codecs” for this function. But because broadcast transmission devices are also
termed “codecs”, we’ll reserve it to describe hardware, and use “encoders” to describe compression algorithms.
VoIP has its own spectrum of useful encoder choices. VoIP encoders require very low delay and reasonable computational
complexity. RTP has definitions for how to fit all popular encoder payloads into a session.
G.711
The lowest common denominator encoder in VoIP is the same one that has been used by digital telephone networks for
decades, defined as G.711. It’s a simple way to compress audio, resulting in a network utilization of 64 kb/s per channel
in each direction, a compression of only about 30% from the original uncompressed stream. This is considered the highest
amount of allowable data for a single call by modern standards, and it can add up quickly as multiple calls are handled on
the same network. On the plus side, the encoder requires virtually no computing power to compress or decompress.
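The 64 kb/s figure follows from G.711’s 8 kHz sampling rate and 8 bits per encoded sample, and the way it adds up across calls
is simple arithmetic. The sketch below covers the audio payload only; RTP, UDP and IP header overhead are not included, and the
function name g711_payload_kbps is hypothetical.

```python
SAMPLE_RATE_HZ = 8000   # G.711 samples per second
BITS_PER_SAMPLE = 8     # each sample is encoded as one byte


def g711_payload_kbps(calls: int, directions: int = 2) -> float:
    """Audio payload bit rate in kb/s for a number of G.711 calls."""
    bits_per_second = calls * directions * SAMPLE_RATE_HZ * BITS_PER_SAMPLE
    return bits_per_second / 1000


print(g711_payload_kbps(1, directions=1))  # 64.0   one channel, one direction
print(g711_payload_kbps(10))               # 1280.0 ten duplex calls
```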
G.711 is limited in terms of audio fidelity by the choice of its audio sampling rate. Calls using this encoder usually provide only
300 Hz to 3 kHz audio response, resulting in the familiar thin sound of a phone call, especially when put “on the air”.
G.711 actually has two variants, one used mostly in North America (μ-law), and another used elsewhere (A-law). The variants
are named for the tables used within the encoders to compress the audio. All Comrex codecs and VoIP devices support G.711.
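To give a feel for how little work the encoder does, here is a sketch of μ-law encoding for a single 16-bit sample. It uses a
widely circulated software formulation that computes the result rather than looking it up in a table; it is not taken from any
Comrex product, the function name mulaw_encode is hypothetical, and the A-law variant is not shown.

```python
BIAS = 0x84   # offset added before the segment search
CLIP = 32635  # largest magnitude that can be encoded after biasing


def mulaw_encode(sample: int) -> int:
    """Compress one signed 16-bit PCM sample to one 8-bit mu-law byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    # Find the segment (exponent): position of the highest set bit, from bit 14 down to bit 7.
    exponent = 7
    mask = 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1
    # The four bits below the leading bit become the mantissa.
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # Mu-law bytes are transmitted with all bits inverted.
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


print(hex(mulaw_encode(0)))       # 0xff (silence)
print(hex(mulaw_encode(32767)))   # 0x80 (largest positive value)
print(hex(mulaw_encode(-32768)))  # 0x0  (largest negative value)
```

Every 16-bit input becomes a single byte, which at 8000 samples per second gives the 64 kb/s rate described above.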