The Moving Picture Experts Group

Unified Speech and Audio Coding


MPEG Unified Speech and Audio Coding (USAC)


MPEG doc#: N12235

Date: July 2011



1     Introduction

As mobile devices become multi-functional and multiple devices converge into a single device, it is increasingly common for many types of content, including content that mixes speech and music, to be played on or streamed to mobile devices. There is a strong market need for a codec that delivers consistent quality on mixed speech and music content, and that does so at a quality better than that of codecs optimized for either speech or music alone.

Some examples of envisioned use cases for this technology are:

  • Multimedia download to mobile devices
  • User-generated content such as podcasts
  • Digital radio
  • Mobile TV
  • Audio books

2     Technology Overview

Unified Speech and Audio Coding (USAC) is the newest MPEG audio standard, expected to be published in late 2011. It achieves consistently state-of-the-art compression performance for any mix of speech and music content. USAC incorporates several perceptually based compression techniques developed in previous MPEG standards: perceptually shaped quantization noise, parametric coding of the upper spectral region, and parametric coding of the stereo sound stage. In addition, and for the first time in MPEG, it combines these well-known perceptual techniques with a source coding technique: a model of sound production, specifically that of human speech.
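The "model of sound production" is linear prediction (LP), the source-filter model underlying the speech-based tools: the vocal tract is approximated by an all-pole filter, and inverse filtering the signal leaves an excitation that is much cheaper to code than the waveform itself. The Python sketch below shows generic LP analysis using the autocorrelation method with the Levinson-Durbin recursion. It is an illustration under stated assumptions, not the USAC LPC procedure, which specifies its own windowing, prediction order, and coefficient quantization. All function names are hypothetical, and the AR(1) test signal is a toy stand-in for speech.

```python
import random


def autocorrelation(x, max_lag):
    """r[k] = sum_n x[n] * x[n - k] for k = 0..max_lag (autocorrelation method)."""
    n = len(x)
    return [sum(x[i] * x[i - k] for i in range(k, n)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LP normal equations by the Levinson-Durbin recursion.

    Convention: the predictor is x_hat[n] = sum_{k=1..order} a[k] * x[n - k],
    so the analysis filter is A(z) = 1 - sum_k a[k] z^-k. a[0] is fixed at 1.0.
    Returns (a, err), where err is the final prediction-error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i of the recursion.
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a, err


def lp_residual(x, a):
    """Excitation estimate e[n] = x[n] - x_hat[n] (samples before n = 0 taken as 0)."""
    order = len(a) - 1
    return [
        x[n] - sum(a[k] * x[n - k] for k in range(1, order + 1) if n - k >= 0)
        for n in range(len(x))
    ]


def synthetic_ar1(n, coeff, seed=0):
    """Toy source: white-noise excitation passed through 1 / (1 - coeff * z^-1)."""
    rng = random.Random(seed)
    x, prev = [], 0.0
    for _ in range(n):
        prev = coeff * prev + rng.gauss(0.0, 1.0)
        x.append(prev)
    return x


if __name__ == "__main__":
    x = synthetic_ar1(20000, 0.9)
    a, err = levinson_durbin(autocorrelation(x, 2), order=2)
    e = lp_residual(x, a)
    gain = sum(v * v for v in x) / sum(v * v for v in e)
    print("a =", [round(c, 3) for c in a[1:]], "prediction gain =", round(gain, 2))
```

For this toy signal the estimated coefficients should land close to the true filter (a[1] near 0.9, a[2] near 0), and the residual carries several times less energy than the input. That energy reduction is the gain a speech-model-based coder exploits.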

The USAC specification has been designed specifically to compress arbitrary content composed of speech, music, or a mix of the two. It significantly improves on the state of the art at bit rates ranging from 8 kb/s for mono signals to 32 kb/s for stereo signals, and it continues to improve compression performance at higher bit rates, 64 kb/s for stereo and beyond.

This new specification is expected to find application in any area where low-bit-rate transmission or storage is required and the audio content is an arbitrary mix of speech, speech plus music, and music. Example application scenarios for USAC include:

  • Digital radio, mobile TV, and audio books, focusing on speech and on speech with background noise, such as announcements, advertisements, and narration.
  • Multimedia download and real-time playback on mobile devices, focusing on various types of music and movie content.

Below is a block diagram of the USAC decoder. In the diagram, the MPEG AAC tools appear in the left column, the speech-based Transform Coded Excitation (TCX) tools (LPC Decoding and LP Envelope Weighting) in the middle column, and the speech-based Algebraic Code-Excited Linear Prediction (ACELP) tools in the rightmost column. Finally, the MPEG Spectral Band Replication (SBR) and spatial coding (MPEG Surround) tools appear at the bottom.
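The data flow in the block diagram can be read as a dispatch: coded data is decoded by one of the three core paths (AAC, TCX, or ACELP), and the result then passes through the shared SBR and MPEG Surround stages at the bottom. The Python sketch below models only that routing, under loudly stated assumptions. `CoreMode`, `Frame`, `decode_frame`, and `decode_stream` are hypothetical names. Each stage merely records its name instead of processing audio. The sketch simplifies to one core mode per frame, assumes SBR runs before MPEG Surround, and omits bitstream parsing, transitions between core modes, and the fact that SBR and MPEG Surround are optional tools.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List

# In this sketch a "signal" is just the list of stages applied so far, which
# makes the routing easy to inspect. A real decoder would pass PCM samples.
Signal = List[str]
Stage = Callable[[Signal], Signal]


class CoreMode(Enum):
    """Hypothetical core-coder selector, one value per diagram column."""
    AAC = "AAC"      # left column: MPEG AAC (frequency-domain) tools
    TCX = "TCX"      # middle column: Transform Coded Excitation tools
    ACELP = "ACELP"  # right column: Algebraic Code-Excited Linear Prediction


@dataclass
class Frame:
    mode: CoreMode
    payload: bytes  # coded data; ignored by the placeholder stages


def stage(name: str) -> Stage:
    """Build a placeholder stage that appends its name instead of processing audio."""
    return lambda signal: signal + [name]


# Core decoders: exactly one runs per frame, chosen by the frame's mode.
CORE_DECODERS: Dict[CoreMode, Stage] = {
    CoreMode.AAC: stage("AAC"),
    CoreMode.TCX: stage("TCX"),
    CoreMode.ACELP: stage("ACELP"),
}

# Post-processing shared by all core modes (bottom of the diagram).
POST_STAGES: List[Stage] = [stage("SBR"), stage("MPEG Surround")]


def decode_frame(frame: Frame) -> Signal:
    signal = CORE_DECODERS[frame.mode]([])
    for post in POST_STAGES:
        signal = post(signal)
    return signal


def decode_stream(frames: List[Frame]) -> List[Signal]:
    return [decode_frame(f) for f in frames]


if __name__ == "__main__":
    stream = [Frame(CoreMode.AAC, b""), Frame(CoreMode.ACELP, b"")]
    for trace in decode_stream(stream):
        print(" -> ".join(trace))
```

Keeping the core decoders in a table keyed by mode mirrors the diagram's three columns, while the shared post-processing list reflects that SBR and MPEG Surround sit below all of them, regardless of which core path produced the frame.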