INTERNATIONAL ORGANISATION FOR STANDARDISATION
ORGANISATION INTERNATIONALE DE NORMALISATION
ISO/IEC JTC1/SC29/WG11
CODING OF MOVING PICTURES AND AUDIO
ISO/IEC JTC1/SC29/WG11 N12235
July 2011, Torino, Italy
Title: MPEG Technology: Unified Speech and Audio Coding
Source: Audio Subgroup
Status: Approved
As mobile devices become multi-functional and multiple devices converge into one, it is increasingly common for many types of content, including content that mixes speech and music, to be played on or streamed to mobile devices. There is a strong market need for a codec that delivers consistent quality for mixed speech and music content, and does so at a quality better than that of codecs optimized for either speech or music alone.
Some examples of envisioned use cases for this technology are:
Unified Speech and Audio Coding (USAC) is the newest MPEG audio standard, published in late 2011. It achieves consistently state-of-the-art compression performance for any mix of speech and music content. USAC incorporates several perceptually based compression techniques developed in previous MPEG standards: perceptually shaped quantization noise, parametric coding of the upper spectrum region and parametric coding of the stereo sound stage. For the first time in MPEG, however, it combines these well-known perceptual techniques with a source coding technique: a model of sound production, specifically that of human speech.
The USAC specification has been designed specifically to compress arbitrary content composed of speech, music or a mix of the two. It significantly improves on the state of the art at bit rates ranging from 8 kb/s for mono signals to 32 kb/s for stereo signals. Furthermore, it continues to improve compression performance as bit rates increase to 64 kb/s for stereo and beyond.
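To give a sense of scale for the bit rates quoted above, the following minimal sketch converts a constant bit rate into the storage needed for one hour of audio. It assumes 1 kb/s = 1000 bit/s and 1 MB = 10^6 bytes, and it ignores any container or transport overhead; the function name `storage_megabytes` is illustrative and not part of the specification.

```python
def storage_megabytes(bit_rate_kbps: float, duration_s: float) -> float:
    """Payload size in megabytes (10^6 bytes) for a constant bit rate.

    Assumes 1 kb/s = 1000 bit/s and no container overhead.
    """
    total_bits = bit_rate_kbps * 1000 * duration_s
    return total_bits / 8 / 1_000_000


if __name__ == "__main__":
    one_hour_s = 3600
    # Bit rates mentioned in the text: 8 kb/s (mono), 32 and 64 kb/s (stereo).
    for rate_kbps in (8, 32, 64):
        size_mb = storage_megabytes(rate_kbps, one_hour_s)
        print(f"{rate_kbps} kb/s for one hour: {size_mb:.1f} MB")
```

Under these assumptions, one hour of audio occupies roughly 3.6 MB at 8 kb/s, 14.4 MB at 32 kb/s and 28.8 MB at 64 kb/s.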
This new specification is expected to have application in any area in which low-bit-rate transmission or storage is necessary and audio content is an arbitrary mix of speech, speech plus music and music. Example application scenarios for USAC include
Below is a block diagram of the USAC decoder. In the diagram, the MPEG AAC tools can be seen on the left, the speech-based Transform Coded Excitation (TCX) tools can be seen in the middle column (LPC Decoding and LP Envelope Weighting) and the speech-based Algebraic Code Excited Linear Prediction (ACELP) tools can be seen in the rightmost column. Finally, the MPEG tools of Spectral Band Replication (SBR) and spatial coding (MPEG Surround) can be seen at the bottom.
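The layout described above can be read as three alternative core paths followed by common tools. The sketch below lists the stages a frame would pass through under that reading. It is only an illustration of the diagram's structure, not the normative decoding process: the names `CorePath`, `CORE_TOOLS`, `COMMON_TOOLS` and `decoder_stages` are hypothetical, and the way a real decoder selects a path and handles transitions between paths is not covered here.

```python
from enum import Enum


class CorePath(Enum):
    """Hypothetical labels for the three columns of the block diagram."""
    AAC = "left column"
    TCX = "middle column"
    ACELP = "rightmost column"


# Tools per column, using only the names given in the text.
CORE_TOOLS = {
    CorePath.AAC: ["MPEG AAC tools"],
    CorePath.TCX: ["LPC Decoding", "LP Envelope Weighting"],
    CorePath.ACELP: ["ACELP tools"],
}

# Tools shown at the bottom of the diagram, after the core paths.
COMMON_TOOLS = ["Spectral Band Replication (SBR)", "MPEG Surround"]


def decoder_stages(path: CorePath) -> list[str]:
    """Return the ordered stage names for one frame on the given core path."""
    return CORE_TOOLS[path] + COMMON_TOOLS


if __name__ == "__main__":
    for path in CorePath:
        print(f"{path.name}: {' -> '.join(decoder_stages(path))}")
```

Keeping the core paths in a lookup table separate from the common tools mirrors the diagram: the columns are alternatives, while SBR and MPEG Surround sit below all of them.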
