The Moving Picture Experts Group

Spatial Audio Object Coding

MPEG Spatial Audio Object Coding (SAOC)

MPEG doc#: N9932

Date: April 2008




MPEG Spatial Audio Object Coding (SAOC) is an audio coding algorithm that allows highly efficient storage and transport of the individual audio objects in an audio mix (e.g. voices, instruments, ambience), while preserving the listener's ability to adjust the mix to personal taste. This includes changing the rendering configuration of the audio scene, from stereo through surround to binaural reproduction.


MPEG Surround technology supports very efficient parametric coding of multi-channel audio signals. The idea of MPEG SAOC is to apply similar basic assumptions, together with a similar parameter representation, to the very efficient parametric coding of individual audio objects (tracks). Additionally, a rendering functionality is included to interactively render the audio objects into an acoustic scene for several types of reproduction systems (1.0, 2.0, 5.0, etc. for loudspeakers, or binaural for headphones).


SAOC is designed to transmit a number of audio objects in a joint mono or stereo downmix signal so that the individual objects can later be reproduced in an interactively rendered audio scene. For this purpose, SAOC encodes Object Level Differences (OLD), Inter-Object Cross Coherences (IOC) and Downmix Channel Level Differences (DCLD) into a parameter bitstream. The SAOC decoder converts the SAOC parameter representation into an MPEG Surround parameter representation, which is then decoded together with the downmix signal by an MPEG Surround decoder to produce the desired audio scene. The user interactively controls this process to alter the representation of the audio objects in the resulting audio scene.
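To make the three parameter types more concrete, the following is a minimal sketch of how such values could be computed for a single block of samples. It rests on assumptions that this text does not state: OLD is taken as each object's energy relative to the strongest object, IOC as the normalized cross-correlation of two objects, and DCLD as the level ratio (in dB) of an object's gains in the left and right downmix channels. The function names are hypothetical. A real encoder would work per time/frequency tile and quantize the results, which is omitted here.

```python
import math

def old(objects):
    """Object Level Differences (assumed form): energy of each object
    relative to the most energetic object in the block."""
    powers = [sum(x * x for x in obj) for obj in objects]
    peak = max(powers)
    if peak == 0.0:
        return [0.0 for _ in powers]
    return [p / peak for p in powers]

def ioc(a, b):
    """Inter-Object Cross Coherence (assumed form): normalized
    cross-correlation of two object signals of equal length."""
    cross = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return cross / norm if norm > 0.0 else 0.0

def dcld_db(gain_left, gain_right, eps=1e-12):
    """Downmix Channel Level Difference (assumed form) in dB for one
    object: ratio of its gains in the left and right downmix channels.
    eps avoids log(0) and division by zero for hard-panned objects."""
    return 20.0 * math.log10((gain_left + eps) / (gain_right + eps))

# Illustrative (made-up) object signals for one block.
voice = [1.0, -1.0, 1.0, -1.0]
guitar = [0.5, 0.5, -0.5, -0.5]

levels = old([voice, guitar])        # voice is the reference object
coherence = ioc(voice, guitar)       # these two signals are uncorrelated
centered = dcld_db(1.0, 1.0)         # equal gains in both downmix channels
```

In this sketch the parameters describe the objects only relative to one another and to the downmix, which is consistent with the idea that the decoder reconstructs a scene from the downmix plus compact side information.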

Target applications

Among the numerous conceivable applications for SAOC, a few typical scenarios are listed here.

Consumers can create personal interactive remixes using a virtual mixing desk. For example, certain instruments can be attenuated for playing along (as in karaoke), the original mix can be modified to suit personal taste, or the dialog level in movies and broadcasts can be adjusted for better speech intelligibility.
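The effect of such a virtual mixing desk can be illustrated with a small sketch in which each object gets a user-chosen level in dB. This is only a conceptual model: it mixes discrete object signals directly, whereas an SAOC decoder operates on the downmix and the transmitted parameters. The names `db_to_gain` and `remix` and the sample data are hypothetical.

```python
def db_to_gain(level_db):
    """Convert a level in dB to a linear amplitude gain."""
    return 10.0 ** (level_db / 20.0)

def remix(objects, levels_db):
    """Mix named object signals into one output channel.

    objects:   dict mapping object name to a list of samples (equal lengths,
               at least one object)
    levels_db: dict mapping object name to a level in dB; objects that are
               not listed keep their original level (0 dB)
    """
    length = len(next(iter(objects.values())))
    out = [0.0] * length
    for name, samples in objects.items():
        gain = db_to_gain(levels_db.get(name, 0.0))
        for i, sample in enumerate(samples):
            out[i] += gain * sample
    return out

# Illustrative (made-up) objects: attenuate the vocals by 20 dB for karaoke.
objects = {
    "vocals": [1.0, 1.0, 1.0, 1.0],
    "band": [0.5, -0.5, 0.5, -0.5],
}
original = remix(objects, {})
karaoke = remix(objects, {"vocals": -20.0})
```

The same interface would cover the dialog-enhancement case by passing a positive level for the dialog object instead of a negative one for the vocals.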

For interactive gaming, SAOC is a storage-efficient and computationally efficient way of reproducing sound tracks. Movement through the virtual scene is reflected by adapting the object rendering parameters. Networked multi-player games benefit from the transmission efficiency of using a single SAOC stream to represent all sound objects that are external to a given player’s terminal.
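As a rough illustration of how rendering parameters might follow a player's movement, the sketch below derives left/right gains for one object from the listener's position and heading. The constant-power pan law and the 2D coordinate convention (heading measured counterclockwise from the +x axis) are assumptions chosen for this example; they are not taken from the SAOC specification, and the function names are hypothetical.

```python
import math

def relative_azimuth(listener_xy, heading_rad, source_xy):
    """Angle of the source relative to the listener's heading, in radians,
    wrapped to [-pi, pi). Positive means the source is to the right."""
    bearing = math.atan2(source_xy[1] - listener_xy[1],
                         source_xy[0] - listener_xy[0])
    rel = heading_rad - bearing
    return (rel + math.pi) % (2.0 * math.pi) - math.pi

def stereo_gains(azimuth_rad):
    """Constant-power stereo pan (assumed pan law). Sources beyond
    +/-90 degrees are clamped to the nearest side."""
    clamped = max(-math.pi / 2.0, min(math.pi / 2.0, azimuth_rad))
    theta = (clamped + math.pi / 2.0) / 2.0  # 0 = full left, pi/2 = full right
    return math.cos(theta), math.sin(theta)

# Listener at the origin, facing along +x.
listener = (0.0, 0.0)
heading = 0.0

ahead = stereo_gains(relative_azimuth(listener, heading, (1.0, 0.0)))
to_left = stereo_gains(relative_azimuth(listener, heading, (0.0, 1.0)))
to_right = stereo_gains(relative_azimuth(listener, heading, (0.0, -1.0)))
```

Recomputing these gains whenever the listener moves or turns changes only the rendering side; under this model the transmitted objects themselves would not need to be re-encoded.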

Current telecommunication infrastructure is monophonic, and its functionality can be extended easily. Terminals equipped with an SAOC extension pick up several sound sources (objects) and produce a monophonic downmix signal, which is transmitted in a compatible way using existing (speech) coders. The side information can be conveyed in an embedded, backward-compatible way. Legacy terminals will continue to produce monophonic output, while SAOC-enabled terminals can render an acoustic scene and thus increase intelligibility by spatially separating the different speakers (“cocktail party effect”).