Audio profiles and levels
In order to maximize interoperability, only a small number of profiles have been defined for MPEG-4 Audio. Because there are many coding tools and object types, even the simpler profiles include a relatively large number of audio object types. Some of the audio profiles (e.g. the main profile) contain both natural and structured audio object types. The following list covers only the audio profiles containing natural audio object types:
- Speech coding profile: This profile contains the CELP and HVXC object types as well as an interface to TTS (Text-To-Speech).
- Scalable profile: This profile contains the lower-complexity AAC object types (LC, LTP), both in MPEG-2 (IS 13818-7) style and using the syntax that enables scalability (MPEG-4 style). In addition, the TwinVQ and speech coding object types and all the tools for scalable audio are part of this profile. Most early applications are expected to use this profile.
- Main profile: This is the do-it-all profile of MPEG-4 natural and structured audio. It contains all the MPEG-4 audio coding tools.
A hierarchical organisation of the profiles supports interoperability: the speech coding profile is contained in all other profiles that contain natural audio coding tools, and the scalable profile is contained in the main profile. Table V shows all the tools of MPEG-4 natural audio and their use in the different audio object types.
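The containment rule can be modelled as a small recursive lookup. The following is a minimal sketch under stated assumptions. The profile names and object-type labels are illustrative strings, not identifiers from the standard. Structured audio object types are left out. The object types assigned to each profile follow the list above.

```python
# Sketch of the profile hierarchy: speech ⊂ scalable ⊂ main.
# Labels are illustrative; structured audio object types are omitted.

# Object types each profile adds on top of the profiles it contains.
OWN_OBJECT_TYPES = {
    "speech": {"CELP", "HVXC", "TTS interface"},
    "scalable": {"AAC LC", "AAC LTP", "AAC Scalable", "TwinVQ"},
    "main": {"AAC Main", "AAC SSR"},
}

# Direct containment: each profile lists the profiles it contains.
CONTAINS = {
    "speech": [],
    "scalable": ["speech"],
    "main": ["scalable"],
}


def object_types(profile: str) -> set:
    """All object types a decoder for `profile` must support,
    following the containment chain recursively."""
    types = set(OWN_OBJECT_TYPES[profile])
    for inner in CONTAINS[profile]:
        types |= object_types(inner)
    return types


def supports(profile: str, object_type: str) -> bool:
    return object_type in object_types(profile)


if __name__ == "__main__":
    print(supports("main", "CELP"))      # True: speech is inside scalable, inside main
    print(supports("speech", "AAC LC"))  # False
```

Because containment is resolved recursively, the main profile inherits the speech coding object types through the scalable profile. No list has to be repeated.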
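Table V: Usage of Audio Object Types (X = tool used by the object type)

| Audio object type | 13818-7 main | 13818-7 LC | 13818-7 SSR | PNS | LTP | TLSS | TwinVQ | CELP | HVXC | GA bitstream syntax type | Hierarchy |
|---|---|---|---|---|---|---|---|---|---|---|---|
| AAC Main | X | | | X | | | | | | ISO/IEC 13818-7 style | contains AAC LC |
| AAC LC | | X | | X | | | | | | ISO/IEC 13818-7 style | |
| AAC SSR | | | X | X | | | | | | ISO/IEC 13818-7 style | |
| AAC LTP | | X | | X | X | | | | | ISO/IEC 13818-7 style | contains AAC LC |
| AAC Scalable | | X | | X | X | X | | | | scalable | |
| TwinVQ | | | | | | X | X | | | scalable | |
| CELP | | | | | | | | X | | | |
| HVXC | | | | | | | | | X | | |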
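The Hierarchy column can be read as a compatibility rule: an object type that "contains AAC LC" also accepts AAC LC bitstreams. The sketch below encodes only that reading of the column. The function name `can_decode` is hypothetical. Object types with an empty Hierarchy entry are treated as accepting only their own bitstreams.

```python
# Sketch of the Hierarchy column of Table V (interpretation, not normative).
# Object types that "contain AAC LC" can also decode AAC LC bitstreams.
CONTAINED_TYPES = {
    "AAC Main": {"AAC LC"},
    "AAC LTP": {"AAC LC"},
}


def can_decode(decoder_type: str, stream_type: str) -> bool:
    """True if a decoder for `decoder_type` also accepts `stream_type` streams."""
    return stream_type == decoder_type or stream_type in CONTAINED_TYPES.get(
        decoder_type, set()
    )


if __name__ == "__main__":
    print(can_decode("AAC LTP", "AAC LC"))  # True
    print(can_decode("AAC LC", "AAC LTP"))  # False
```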
Levels for the MPEG-4 audio scalable profile
The large number of possible combinations of audio object types makes the traditional way of defining levels by channel count, sampling frequency, etc. very difficult. Complexity units have been defined so that decoder implementers can conform to a given level definition and still combine different audio object types. These units are used to calculate the required decoder capabilities. For each audio object type, the decoder complexity (for a given sampling rate and channel count) was estimated in PCUs and RCUs. PCUs (processor complexity units) express computational complexity in millions of operations per second. RCUs (RAM complexity units) express memory requirements in kWords of buffer. These numbers depend strongly on the decoder architecture, for example whether it is realized on a dedicated DSP or on a general-purpose processor. Table VI lists the decoder estimates for the different object types as submitted to the MPEG audio group.
The level of a scalable profile decoder can thus be specified by PCU and RCU limits in addition to the number of channels and the sampling frequency. Four levels have been defined:
- Level 1: One mono object of up to 24 kHz sampling frequency, all object types.
- Level 2: One stereo or two mono objects of up to 24 kHz sampling frequency.
- Level 3: One stereo or two mono objects of up to 48 kHz sampling frequency.
- Level 4: One 5.1 channel object or a flexible configuration of objects up to 48 kHz sampling frequency, with a total PCU of up to 30 and a total RCU of up to 19.
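The level rules can be sketched as a classifier that returns the lowest level a configuration meets. This is a minimal sketch under stated assumptions:

- The summed PCU and RCU are computed elsewhere and passed in.
- Levels 1 to 3 are checked only on object count, channels and sampling frequency, since the text gives numeric PCU/RCU limits only for Level 4.
- The PCU/RCU budget is applied to every Level 4 configuration, including a single 5.1 object. This reading of the Level 4 sentence is an interpretation.

`AudioObject` and `scalable_profile_level` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional

# Level 4 complexity budget stated in the text.
MAX_PCU_LEVEL4 = 30
MAX_RCU_LEVEL4 = 19


@dataclass
class AudioObject:
    object_type: str  # illustrative label, e.g. "AAC LC"
    channels: int     # 1 = mono, 2 = stereo, 6 = 5.1
    fs_hz: int        # sampling frequency in Hz


def scalable_profile_level(objects: List[AudioObject],
                           total_pcu: float, total_rcu: float) -> Optional[int]:
    """Return the lowest scalable profile level (1-4) whose constraints the
    configuration meets, or None if it exceeds Level 4."""
    if not objects:
        raise ValueError("configuration must contain at least one object")
    max_fs = max(o.fs_hz for o in objects)

    # Level 1: a single mono object up to 24 kHz.
    if len(objects) == 1 and objects[0].channels == 1 and max_fs <= 24_000:
        return 1

    # Levels 2 and 3: one stereo (or mono) object, or two mono objects.
    one_stereo = len(objects) == 1 and objects[0].channels <= 2
    two_mono = len(objects) == 2 and all(o.channels == 1 for o in objects)
    if one_stereo or two_mono:
        if max_fs <= 24_000:
            return 2
        if max_fs <= 48_000:
            return 3

    # Level 4 (interpretation): any configuration up to 48 kHz whose summed
    # complexity stays within the PCU and RCU budget.
    if max_fs <= 48_000 and total_pcu <= MAX_PCU_LEVEL4 and total_rcu <= MAX_RCU_LEVEL4:
        return 4
    return None


if __name__ == "__main__":
    two_celp = [AudioObject("CELP", channels=1, fs_hz=16_000)] * 2
    print(scalable_profile_level(two_celp, total_pcu=4, total_rcu=2))  # 2
```

In the usage line, the totals for two mono CELP objects at 16 kHz are simply twice the per-object CELP figures from Table VI.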
Table VI: Decoder Complexity

| Object type | Parameters | PCU (MOPS) | RCU (kWords) |
|---|---|---|---|
| AAC Main 1) | fs = 48 kHz | 5 | 5 |
| AAC LC 1) | fs = 48 kHz | 3 | 3 |
| AAC SSR 1) | fs = 48 kHz | 4 | 3 |
| AAC LTP 1) | fs = 48 kHz | 4 | 4 |
| AAC Scalable 1) 2) | fs = 48 kHz | 5 | 4 |
| TwinVQ 1) | fs = 24 kHz | 2 | 3 |
| CELP | fs = 8 kHz | 1 | 1 |
| CELP | fs = 16 kHz | 2 | 1 |
| CELP | fs = 8/16 kHz | 3 | 1 |
| HVXC | fs = 8 kHz | 2 | 1 |

Definitions:
fs = sampling frequency

Notes:
1) PCU proportional to sampling frequency
2) Includes core decoder
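Note 1 implies that the PCU of those object types can be rescaled to another sampling rate. The sketch below does this and then sums a configuration for comparison with the Level 4 budget. Its assumptions go beyond the source and are labelled here and in the code:

- The table values are per channel and scale linearly with the channel count.
- RCU does not change with sampling frequency, since note 1 mentions only PCU.
- Object types without note 1 are used only at their listed rate.

The helper names `estimate` and `total` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Table VI as data: object type -> (reference fs in Hz, PCU in MOPS,
# RCU in kWords, PCU proportional to fs per note 1).
# The combined CELP 8/16 kHz row has no single reference fs (None).
TABLE_VI: Dict[str, Tuple[Optional[int], float, float, bool]] = {
    "AAC Main": (48_000, 5, 5, True),
    "AAC LC": (48_000, 3, 3, True),
    "AAC SSR": (48_000, 4, 3, True),
    "AAC LTP": (48_000, 4, 4, True),
    "AAC Scalable": (48_000, 5, 4, True),  # note 2: includes core decoder
    "TwinVQ": (24_000, 2, 3, True),
    "CELP 8 kHz": (8_000, 1, 1, False),
    "CELP 16 kHz": (16_000, 2, 1, False),
    "CELP 8/16 kHz": (None, 3, 1, False),
    "HVXC": (8_000, 2, 1, False),
}


def estimate(object_type: str, fs_hz: Optional[int] = None,
             channels: int = 1) -> Tuple[float, float]:
    """Return (PCU, RCU) for one object.

    Assumptions (not stated in the source): values are per channel and
    scale linearly with `channels`; RCU does not depend on fs.
    """
    ref_fs, pcu, rcu, scales_with_fs = TABLE_VI[object_type]
    if fs_hz is not None and fs_hz != ref_fs:
        if not scales_with_fs:
            raise ValueError(f"{object_type}: no estimate for fs={fs_hz} Hz")
        pcu = pcu * fs_hz / ref_fs  # note 1: PCU proportional to fs
    return pcu * channels, rcu * channels


def total(config: List[Tuple[str, Optional[int], int]]) -> Tuple[float, float]:
    """Sum (PCU, RCU) over (object type, fs in Hz, channels) entries."""
    pcu = rcu = 0.0
    for object_type, fs_hz, channels in config:
        p, r = estimate(object_type, fs_hz, channels)
        pcu += p
        rcu += r
    return pcu, rcu


if __name__ == "__main__":
    pcu, rcu = total([("AAC LC", 48_000, 6)])  # a 5.1 AAC LC configuration
    print(pcu, rcu)                            # 18.0 18.0
    print(pcu <= 30 and rcu <= 19)             # True: within the Level 4 budget
```

Under the same assumptions, halving the sampling rate of an AAC LC stream halves its PCU but leaves its RCU unchanged.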