EN5990 Encoder for MPEG-4 Part 10
[Figure: an encoder and a decoder linked by transmission. Input video and input audio enter the encoder; decoded video and decoded audio leave the decoder. Labels: Video PTS = 4.0s, Audio PTS = 0.1s, 4.4s end-to-end delay for both components.]
Figure 5.1: Time stamp Structure for Video and Audio
The decoder could use the audio time stamps as the timebase to determine how
much video to buffer before decoding the frames. However, the audio timing will
still vary by some 100 milliseconds, causing lipsync issues. The correct timebase
to use in the decoder for referencing the presentation and decoding time stamps is
the system clock as indicated by the PCR (Program Clock Reference) values in the
Transport Stream. The PCR values represent a very accurate clock reference from
a stable temperature-controlled oscillator within the encoder and hence will be
more accurate than any clock system in a consumer decoder.
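The relationship between the PCR and the time stamps can be illustrated with a small sketch. It assumes the standard MPEG-2 Transport Stream layout, in which the PCR is a 33-bit base counted at 90 kHz plus a 9-bit extension counted at 27 MHz, and PTS/DTS values are counted at 90 kHz. The function names are hypothetical and counter wrap-around is ignored for brevity.

```python
PCR_HZ = 27_000_000  # system clock frequency carried by the PCR
PTS_HZ = 90_000      # PTS/DTS tick rate (27 MHz / 300)


def pcr_to_ticks(base: int, ext: int) -> int:
    """Combine the 90 kHz PCR base and the 27 MHz extension into 27 MHz ticks."""
    return base * 300 + ext


def seconds_until_presentation(pts: int, pcr_base: int, pcr_ext: int = 0) -> float:
    """Time left before a unit with this PTS is due, measured on the PCR clock.

    Hypothetical helper: wrap-around of the 33-bit counters is not handled.
    """
    now = pcr_to_ticks(pcr_base, pcr_ext)
    return (pts * 300 - now) / PCR_HZ


# Example using the figure's values: video PTS = 4.0 s, audio PTS = 0.1 s,
# both measured against a PCR that currently reads zero.
video_wait = seconds_until_presentation(pts=4 * PTS_HZ, pcr_base=0)
audio_wait = seconds_until_presentation(pts=PTS_HZ // 10, pcr_base=0)
```

Because both waits are derived from the same recovered clock, video and audio are scheduled against a common reference rather than against each other.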
5.9 Why are there Referenced B Pictures?
With MPEG-4 H.264/AVC, B pictures contain extra tools, which allow them to be
coded more efficiently. Specifically, the prediction modes available to generate
data within a B picture, from multiple reference frames, are a superset of the
modes available to generate a P picture from a single reference frame. One such
mode is “Direct Mode”, where the motion vector is estimated from the equivalent
motion vectors of the reference frame and minimal additional information is
required.
Hence the number of bits required to encode B pictures is significantly lower than
for P pictures. This has led to the development of referenced B pictures: pictures
that are coded as B pictures, and so use fewer bits than an equivalent P picture,
but can be used as references for other B pictures. Thus, fewer bits are required
to encode the same sequence to the same quality level. The technique is called
hierarchical B pictures because a hierarchy of B pictures is created. The
application of this technique has several implications:
• The encoding quality for the same bitrate improves dramatically for still and
  low motion sequences. The encoder monitors the amount of motion and
  seamlessly adapts the number of B pictures accordingly to obtain optimal
  efficiency.
• Due to the increased number of B frames, the time distance between
  P pictures is larger with an accompanying increase in the coding delay.
• As such, the difference between the DTS and PTS of the P frames is
  larger than with the traditional GOP structure.
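The effect on picture ordering can be sketched as follows. This is an illustration only: it assumes a dyadic hierarchy in which the middle B picture between two anchors is coded first and then serves as a reference for the B pictures on either side. The encoder's actual structure and adaptation logic are not described here, and the function names and the 25 fps frame rate are hypothetical.

```python
def b_decode_order(first: int, last: int) -> list[int]:
    """Decode order (as display indices) of the B pictures strictly between
    two already-decoded reference pictures at display indices first and last."""
    if last - first < 2:
        return []
    mid = (first + last) // 2  # referenced B picture, coded before its neighbours
    return [mid] + b_decode_order(first, mid) + b_decode_order(mid, last)


def mini_gop_decode_order(n_b: int) -> list[int]:
    """Display index 0 is the previous anchor, 1..n_b are B pictures and
    n_b + 1 is the next P picture, which must be decoded first."""
    p = n_b + 1
    return [p] + b_decode_order(0, p)


def p_reorder_delay_ms(n_b: int, fps: float = 25.0) -> float:
    """Approximate extra PTS - DTS gap of the P picture: it is decoded first
    but displayed after all n_b B pictures (constant offsets are ignored)."""
    return 1000.0 * n_b / fps


traditional = mini_gop_decode_order(2)   # two non-referenced B pictures
hierarchical = mini_gop_decode_order(7)  # seven B pictures in three levels
```

In this sketch, moving from two to seven B pictures between P pictures increases the approximate reorder gap of the P picture from 80 ms to 280 ms at the assumed 25 fps, which is the growth in the DTS to PTS difference noted in the list above.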