The ISO/IEC WG11 of JTC1/SC2 (MPEG) met in Kurihama, Japan, from November 18th to 26th. In the first week of the meeting a Committee Draft (CD) of the MPEG standard was finalised. The techniques developed by the MPEG Committee will enable many applications requiring digitally compressed video and sound. The storage media targeted by MPEG include CD-ROM, DAT, and computer disks, and it is expected that MPEG-based technologies will eventually be used in a variety of communication channels such as ISDN and local area networks, and even in broadcasting applications.
At a rate of 1.2 Mbits per second, good quality pictures have been demonstrated at 24, 25 and 30 frames per second, and at a spatial resolution of 360 samples per line. This resolution is consistent with the resolution of consumer grade television. To code stereo sound of Compact Disc quality, a rate of approximately 0.2 Mbits per second is required, resulting in a total rate of 1.4 Mbits per second. Such a rate could permit numerous applications including video and associated audio on Compact Disc.
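The rate figures above can be checked with a little arithmetic. The following is only an illustrative sketch: the function names are hypothetical, and the per-picture figure is a plain average that assumes the video rate is spread evenly over all pictures, which a real encoder need not do.

```python
# Rates quoted in the text, expressed in bits per second.
VIDEO_RATE_BITS = 1_200_000  # 1.2 Mbit/s for video
AUDIO_RATE_BITS = 200_000    # approximately 0.2 Mbit/s for stereo audio


def total_rate_bits(video=VIDEO_RATE_BITS, audio=AUDIO_RATE_BITS):
    """Combined audio and video rate in bits per second."""
    return video + audio


def average_bits_per_picture(video_rate_bits, pictures_per_second):
    """Average coded bits per picture (assumes an even spread over pictures)."""
    return video_rate_bits / pictures_per_second


if __name__ == "__main__":
    print("total rate:", total_rate_bits() / 1_000_000, "Mbit/s")
    for rate in (24, 25, 30):
        print(rate, "pictures/s:", average_bits_per_picture(VIDEO_RATE_BITS, rate))
```

At 25 pictures per second this gives an average of 48 000 bits per picture, which indicates how tight the coding budget is.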
The Committee Draft "Coding of moving pictures and associated audio for digital storage media at up to about 1.5 Mbits/s" consists of three parts: System, Video and Audio. The System part (11172-1) deals with synchronisation and multiplexing of audio-visual information, while the Video (11172-2) and Audio (11172-3) parts address the video and audio compression techniques respectively.
System
The MPEG-System committee completed and approved for release the technical specification for combining a plurality of coded audio and video streams into a single data stream. The specification provides fully synchronised audio and video and facilitates the storage of the combined information in, and its possible further transmission through, a variety of digital media.
This "systems coding" includes necessary and sufficient information in the bit stream to provide the system-level functions of synchronisation of decoded audio and video, initial and continuous management of coded data buffers to prevent overflow and underflow, random access start-up, and absolute time identification. The coding layer specifies a multiplex data format that allows multiplexing of multiple simultaneous audio and video streams as well as privately defined data streams.
The basic principle of MPEG System coding is the use of time stamps which specify the decoding and display time of audio and video and the time of reception of the multiplexed coded data at a decoder, all in terms of a single 90 kHz system clock. This method allows a great deal of flexibility in such areas as decoder design, the number of streams, multiplex packet lengths, video picture rates, audio sample rates, coded data rates, and digital storage medium or network performance. It also provides flexibility in selecting which entity is the master time base, while guaranteeing that synchronisation and buffer management are maintained. Variable data rate operation is supported. A reference model of a decoder system is specified which provides limits for the ranges of parameters available to encoders and provides requirements for decoders.
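As a rough illustration of expressing display times on a single 90 kHz clock, the sketch below converts picture indices into clock ticks. The function names and the simple "picture n is displayed n picture periods after the start" model are assumptions made for this example; they are not the field layout or timing rules of the specification.

```python
from fractions import Fraction

SYSTEM_CLOCK_HZ = 90_000  # the single system clock named in the text


def ticks_per_picture(picture_rate):
    """Exact number of 90 kHz ticks in one picture period."""
    return Fraction(SYSTEM_CLOCK_HZ) / Fraction(picture_rate)


def display_time_stamp(picture_index, picture_rate, start_ticks=0):
    """Hypothetical display time of picture n, in ticks of the system clock."""
    return start_ticks + int(picture_index * ticks_per_picture(picture_rate))


def ticks_to_seconds(ticks):
    """Convert system clock ticks back to seconds."""
    return ticks / SYSTEM_CLOCK_HZ


if __name__ == "__main__":
    for rate in (24, 25, 30):
        print(rate, "pictures/s:", ticks_per_picture(rate), "ticks per picture")
    stamp = display_time_stamp(50, 25)
    print("picture 50 at 25 pictures/s:", stamp, "ticks =", ticks_to_seconds(stamp), "s")
```

The picture rates of 24, 25 and 30 per second each divide 90 000 exactly (3750, 3600 and 3000 ticks per picture), so audio and video times can be compared on one clock without rounding at these rates.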
Some optional sets of constraints provide a framework for common industry acceptance of certain key parameters for use by decoder designers and information providers. While the MPEG Systems specification is included in the current work item of MPEG, it is designed for compatibility with future extensions to audio, video and hypermedia coding and a wide variety of bitrates.
Video
Dozens of algorithmic approaches were carefully reviewed over a period of three years in refining and perfecting this video compression algorithm. While the MPEG compression algorithm is optimised for bitrates of about 1.5 Mbit/s, it can perform very effectively over a wide range of bitrates and picture resolutions. The video standard does not recommend a particular way of encoding pictures, and much flexibility is given to implementers of the standard to use the MPEG syntax to optimise the visual quality and access options. The colour resolution is given particularly high attention so as to support computer applications, games and animation.
The compression techniques developed in MPEG rely on the discrete cosine transform (DCT) for spatial redundancy reduction and on motion compensated inter-frame coding to exploit the high temporal correlation of video signals by using information from both the past and the future. The statistics of the resulting information can also be exploited to further reduce the bitrate through the use of variable-length codes known as Huffman codes. While the discrete cosine transform has been widely used for many years, the techniques developed within MPEG also exploit the characteristics of the human visual system to optimise the perceived image quality. Any coding impairments are concentrated in frequencies and regions of the picture where they are perceptually minimal.
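To illustrate why the DCT helps with spatial redundancy, the following minimal sketch applies a two-dimensional DCT to a block of samples. The 8 by 8 block size, the orthonormal scaling and the function names are assumptions chosen for the example; the text above does not specify them, and this is not the encoder described in the Committee Draft.

```python
import math


def dct_1d(samples):
    """Orthonormal one-dimensional DCT (type II) of a list of samples."""
    n = len(samples)
    out = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        total = sum(
            x * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i, x in enumerate(samples)
        )
        out.append(scale * total)
    return out


def dct_2d(block):
    """Two-dimensional DCT: transform every row, then every column."""
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d(col) for col in zip(*rows)]
    return [list(r) for r in zip(*cols)]  # transpose back to row order


if __name__ == "__main__":
    # A perfectly flat block (assumed size 8 x 8): all spatial redundancy.
    flat = [[100] * 8 for _ in range(8)]
    coeffs = dct_2d(flat)
    print("DC coefficient:", round(coeffs[0][0], 6))
    print("largest other coefficient:",
          max(abs(c) for r, row in enumerate(coeffs)
              for k, c in enumerate(row) if (r, k) != (0, 0)))
```

For a flat block all the energy lands in a single coefficient and the remaining 63 are essentially zero, which is what makes the subsequent quantisation and entropy coding effective on smooth picture areas.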
Audio
The audio coding experts of MPEG finalised an audio coding algorithm after having reviewed and tested many approaches over the last three years. The resulting digital audio bitrate reduction technique supports several bitrates covering a range from intermediate to compact disc quality. This latter quality can be obtained at a total bitrate of 256 kbit/s for a stereophonic program.
Depending on the application, three layers of the coding system with increasing complexity and performance can be used. In all three layers the time domain input audio signal is converted into a frequency representation. In Layers I and II a filterbank creates 32 subband representations of the input audio stream which are then quantised and coded under the control of a psychoacoustic model from which a blockwise adaptive bit allocation is derived. Compared with Layer I, Layer II introduces further compression by redundancy and irrelevance removal on the scalefactors and more precise quantisation. In Layer III additional frequency resolution is provided by the use of a hybrid filterbank. Every subband is thereby further split into higher resolution frequency lines by a linear transform that operates on 18 subband samples in each subband. The frequency lines are again quantised and coded under the control of a psychoacoustic model. In Layer III, nonuniform quantisation, adaptive segmentation and entropy coding of the quantised values are employed for better coding efficiency.
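The gain in frequency resolution from the hybrid filterbank can be estimated from the two figures given above, 32 subbands and 18 subband samples per transform. The sketch below is only a back-of-the-envelope calculation: it assumes equal-width subbands spanning zero to half the sampling rate, one frequency line per subband sample, and an example sampling rate of 48 kHz, none of which is stated in the text. The names are hypothetical.

```python
SUBBANDS = 32                # subbands produced by the filterbank (from the text)
SAMPLES_PER_TRANSFORM = 18   # subband samples per transform in Layer III (from the text)


def subband_width_hz(sample_rate_hz):
    """Width of one subband, assuming equal-width subbands up to half the sampling rate."""
    return sample_rate_hz / 2 / SUBBANDS


def frequency_lines():
    """Total frequency lines, assuming one line per subband sample."""
    return SUBBANDS * SAMPLES_PER_TRANSFORM


def line_spacing_hz(sample_rate_hz):
    """Approximate spacing between Layer III frequency lines."""
    return subband_width_hz(sample_rate_hz) / SAMPLES_PER_TRANSFORM


if __name__ == "__main__":
    rate = 48_000  # assumed example sampling rate in Hz
    print("subband width:", subband_width_hz(rate), "Hz")
    print("frequency lines:", frequency_lines())
    print("line spacing:", round(line_spacing_hz(rate), 2), "Hz")
```

Under these assumptions each 750 Hz subband is divided into 18 lines, giving 576 lines in total.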
The range of bitrates (total rate for both channels) provided by the standard is between 64 kbit/s and 448 kbit/s. The standard also supports coding starting at 32 kbit/s for a single channel. In all layers a joint stereo mode that exploits stereophonic irrelevance or stereophonic redundancy can be used as an option to improve the subjective quality.
Next Phase of the MPEG Standard
The success of the initial phase of work will let the experts of MPEG focus on developing a generic standard for the compression of higher resolution video signals to be used for storage as well as for communication applications with bitrates in the range of 5 to 10 Mbits/s. This work is being conducted in close collaboration with the CCITT Experts Group for ATM (Asynchronous Transfer Mode) video coding.
In the same week, 32 different video coding proposals targeted at those bitrates were tested. The test consisted of a subjective quality assessment and an analysis of the implementation complexity of each proposal. The results of the test show that distribution quality video is achievable at those rates with a reasonable implementation cost. The proposing organisations represent a wide sample of companies in consumer electronics, broadcasting, computers and telecommunications, as well as universities.
After analysing the results of the test, the experts are beginning a phase of collaborative algorithm improvements with the goal of achieving a unified solution during 1992.