DATA COMPRESSION
Data compression works by representing the original information in fewer bits and
transmitting the reduced bit stream over the data link. The receiver recovers the original
information by reversing the representation process. Representing the original data in fewer
bits is called redundancy removal. Its effectiveness depends on both the algorithm and the data.
A file of random data cannot be compressed. A data file with a high degree of predictability,
such as an ASCII English text file, a graphics file, or a database file, is well suited to data
compression.
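To see this data dependence in practice, the sketch below compresses a predictable text buffer and a buffer of pseudo-random bytes with Python's standard `zlib` module. `zlib` implements DEFLATE, not V.42bis or MNP5, so it only stands in for "a compressor" here; the sample buffers and the `compressed_ratio` helper are illustrative assumptions, and `random.Random.randbytes` requires Python 3.9 or later.

```python
import random
import zlib

def compressed_ratio(data: bytes) -> float:
    """Compressed size divided by original size (below 1.0 means the data shrank)."""
    return len(zlib.compress(data, 9)) / len(data)

# Predictable data: English text with recurring words and letter patterns.
text = b"The quick brown fox jumps over the lazy dog. " * 100
# Unpredictable data: 4 KiB from a seeded pseudo-random generator.
noise = random.Random(1496).randbytes(4096)

print(f"text : {compressed_ratio(text):.2f}")
print(f"noise: {compressed_ratio(noise):.2f}")
```

The text shrinks to a small fraction of its size, while the random buffer comes out slightly larger than the input: there is no redundancy to remove, and the compressor still adds its own framing.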
In the modem, data compression is applied during the asynchronous-to-synchronous conversion
to reduce the number of bits actually sent. The receiving modem applies the same techniques in
reverse to recover the original data from the compressed data stream.
U-1496 series modems support both the V.42bis and MNP5 data compression protocols. Data
compression needs an error-free data link to work correctly; otherwise, a corrupted compressed
data stream will ruin the decompression process. MNP5 is used with MNP4 error control, and
V.42bis is used with V.42 error control.
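The following sketch illustrates why the compressed stream must arrive intact. It again uses `zlib` (DEFLATE) as a stand-in for the modem's compressor; the helper name `survives_line_error` and the single damaged byte that plays the role of a line error are hypothetical.

```python
import zlib

def survives_line_error(original: bytes) -> bool:
    """Compress, damage one byte in transit, and report whether the data came back intact."""
    packet = bytearray(zlib.compress(original))
    packet[len(packet) // 2] ^= 0xFF  # hypothetical line error: invert 8 bits mid-stream
    try:
        return zlib.decompress(bytes(packet)) == original
    except zlib.error as exc:
        # The decompressor lost its place in the stream and cannot continue.
        print("decompression failed:", exc)
        return False

message = b"Data compression needs an error-free data link. " * 20
print("intact" if survives_line_error(message) else "lost")
```

Whether the decompressor raises an error or returns the wrong bytes, the original message is lost. Pairing MNP5 with MNP4 and V.42bis with V.42 avoids this, because the error-control layer retransmits damaged frames before the decompressor ever sees them.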
MNP5 data compression combines two techniques: run-length encoding and adaptive frequency
encoding. V.42bis uses a string coding algorithm.
The compression efficiency of V.42bis is generally higher than that of MNP5. For some data,
V.42bis is 50% to 100% more efficient; for other data, it is only slightly better. On average,
it is about 50% more efficient.
Run-length Encoding
Run-length encoding avoids sending long sequences of repeated characters. When three or more
identical characters appear in succession, only the first three characters (each sent as a
token, the compressed form of that character) and a repetition count are sent.
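A minimal run-length encoder and decoder following the rule above might look like this. It works on raw bytes rather than on MNP5 tokens, and the one-byte repetition count (capped at 255 extra repeats by the hypothetical `MAX_EXTRA`) is an assumption for illustration, not the exact MNP5 format.

```python
MAX_EXTRA = 255  # hypothetical: the repetition count must fit in one byte

def rle_encode(data: bytes) -> bytes:
    """Send runs of 3+ identical bytes as three copies plus a repetition count."""
    out = bytearray()
    i = 0
    while i < len(data):
        byte = data[i]
        j = i
        # Extend the run, but never past what one count byte can describe.
        while j < len(data) and data[j] == byte and j - i < 3 + MAX_EXTRA:
            j += 1
        run = j - i
        if run >= 3:
            out += bytes([byte]) * 3  # first three characters
            out.append(run - 3)       # how many more copies follow
        else:
            out += bytes([byte]) * run  # short runs are sent literally
        i = j
    return bytes(out)

def rle_decode(data: bytes) -> bytes:
    """Reverse rle_encode: after three identical bytes, read a repetition count."""
    out = bytearray()
    i = 0
    prev, seen = None, 0
    while i < len(data):
        byte = data[i]
        i += 1
        out.append(byte)
        seen = seen + 1 if byte == prev else 1
        prev = byte
        if seen == 3:
            count = data[i]
            i += 1
            out += bytes([byte]) * count
            prev, seen = None, 0  # the count byte ends the run
    return bytes(out)

print(rle_encode(b"ABBBBBBC"))  # b'ABBB\x03C'
```

Here the six `B`s become three `B`s plus a count of three further repeats. The decoder expects a count byte whenever it has just output three identical characters, which is why even a run of exactly three is followed by a count of zero.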
Adaptive Frequency Encoding
Adaptive frequency encoding is applied after repeated characters have been removed. Each
character in the data stream is replaced by a token, with the aim of sending fewer than 8 bits
per character. Tokens are assigned from a dynamically updated table of how often each character
has appeared, so the most frequent characters receive the shortest tokens. There are 256 tokens
in total, of which only the first 32 are shorter than 8 bits; random data, whose characters occur
with roughly equal frequency, therefore gains nothing from this technique.
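The sketch below models adaptive frequency encoding: each character is sent as its current rank in a frequency table, and both ends update the table identically after every character, so the table itself is never transmitted. The token layout (a 3-bit group header followed by a body locating the rank within its group, giving 4-bit to 10-bit tokens, with ranks 0 to 31 under 8 bits) is a hypothetical choice made to match the figures above, not MNP5's documented bit format. Tokens are built as a string of '0'/'1' characters for readability.

```python
HEADER_BITS = 3  # hypothetical: group number 0..7

def token_for_rank(rank: int) -> str:
    """Map a rank (0 = most frequent) to a variable-length token."""
    if rank < 2:
        group, offset, body_bits = 0, rank, 1
    else:
        group = rank.bit_length() - 1  # rank lies in [2**group, 2**(group+1))
        offset, body_bits = rank - (1 << group), group
    return format(group, f"0{HEADER_BITS}b") + format(offset, f"0{body_bits}b")

class RankTable:
    """Characters ordered by how often they have appeared so far."""
    def __init__(self) -> None:
        self.order = list(range(256))    # order[rank] -> character
        self.rank_of = list(range(256))  # rank_of[character] -> rank
        self.count = [0] * 256

    def update(self, ch: int) -> None:
        self.count[ch] += 1
        r = self.rank_of[ch]
        # Move ch toward rank 0 while it is now more frequent than its neighbour.
        while r > 0 and self.count[self.order[r - 1]] < self.count[ch]:
            other = self.order[r - 1]
            self.order[r - 1], self.order[r] = ch, other
            self.rank_of[ch], self.rank_of[other] = r - 1, r
            r -= 1

def afe_encode(data: bytes) -> str:
    table, tokens = RankTable(), []
    for ch in data:
        tokens.append(token_for_rank(table.rank_of[ch]))
        table.update(ch)  # the receiver makes exactly the same update
    return "".join(tokens)

def afe_decode(bits: str) -> bytes:
    table, out, i = RankTable(), bytearray(), 0
    while i < len(bits):
        group = int(bits[i:i + HEADER_BITS], 2)
        i += HEADER_BITS
        body_bits = 1 if group == 0 else group
        base = 0 if group == 0 else 1 << group
        rank = base + int(bits[i:i + body_bits], 2)
        i += body_bits
        ch = table.order[rank]
        out.append(ch)
        table.update(ch)
    return bytes(out)

text = b"this is a test of adaptive frequency encoding. " * 20
bits = afe_encode(text)
print(f"{len(bits) / len(text):.2f} bits per character")
```

Repetitive text settles well below 8 bits per character once its common letters reach the top ranks. With this hypothetical layout, random input tends to come out above 8 bits per character, because most ranks then fall in the longer token groups.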
String Coding
Instead of sending each character individually, the modem sends a single token for a whole
string of characters. It builds its dictionary of string tokens adaptively from the data as the
data passes through; U-1496 series modems support a dictionary of up to 2K (2048) string tokens.
Incoming characters are accumulated and checked against the dictionary, and the token for the
longest matching string is sent. Compression is high when the data contains recurring character
patterns.
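V.42bis string coding belongs to the Lempel-Ziv-Welch (LZW) family. The sketch below is plain LZW with the dictionary capped at 2048 entries to mirror the 2K figure above. It simply stops learning new strings once the dictionary is full and omits V.42bis specifics such as its control codewords and dictionary maintenance, so treat it as an illustration of the idea rather than of the protocol. Codes are returned as a list of integers; with 2048 entries, each would fit in 11 bits.

```python
DICT_SIZE = 2048  # 2K string tokens, as stated for the U-1496 series

def lzw_encode(data: bytes) -> list[int]:
    """Emit one code per longest dictionary match, learning new strings as we go."""
    dictionary = {bytes([i]): i for i in range(256)}  # all single characters
    codes: list[int] = []
    current = b""
    for ch in data:
        candidate = current + bytes([ch])
        if candidate in dictionary:
            current = candidate  # keep extending the match
        else:
            codes.append(dictionary[current])  # send the longest match
            if len(dictionary) < DICT_SIZE:
                dictionary[candidate] = len(dictionary)  # learn match + next char
            current = bytes([ch])
    if current:
        codes.append(dictionary[current])
    return codes

def lzw_decode(codes: list[int]) -> bytes:
    """Rebuild the same dictionary from the codes alone."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    prev = table[codes[0]]
    out = bytearray(prev)
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            entry = prev + prev[:1]  # string the encoder defined one step ahead
        else:
            raise ValueError(f"invalid code {code}")
        out += entry
        if len(table) < DICT_SIZE:
            table[len(table)] = prev + entry[:1]
        prev = entry
    return bytes(out)

print(lzw_encode(b"ABABABA"))  # [65, 66, 256, 258]
```

Seven characters become four codes because `AB` and `ABA` are learned on the fly. The receiver never needs the dictionary sent to it; it rebuilds the same entries from the code sequence.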
For the U-1496 series of modems, the error control and data compression options can be enabled
from the front panel or from a terminal.