MPEG AVC File Format
MPEG doc#: N7924
Date: April 2006
Author: David Singer (Apple), Mohammed Zubair Visharam (Sony)
Within the ISO/IEC 14496 MPEG-4 standard there are several parts that define file formats for the storage of time-based media (such as audio and video). They are all based on, and derived from, the ISO Base Media File Format, which is a structural, media-independent definition that is also published as part of the JPEG 2000 family of standards.
The AVC File Format defines the storage of Advanced Video Coding (ISO/IEC 14496-10, AVC) data within files of the ISO Base Media File Format family.
The definitions in this specification are not intended for stand-alone use. Rather, they specify how AVC streams are stored in any file of the ISO Base Media File Format family, such as MP4.
AVC streams are configured using parameter sets. In the simple use of the AVC file format, these are stored in a configuration record in the descriptive data for the video track (the sample entry). Alternatively, if the parameter sets are highly dynamic, a separate parameter set stream may be stored in the file.
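As an illustration of the configuration record, the following Python sketch reads the parameter sets from it. The byte layout used here (four one-byte header fields, a two-bit length-size field, a five-bit SPS count, an eight-bit PPS count, and a 16-bit size before each parameter set) is an assumption taken from ISO/IEC 14496-15 and is not stated in this document. The function name and the returned keys are hypothetical, and any fields that follow the picture parameter sets are ignored.

```python
import struct


def parse_avc_config(record: bytes) -> dict:
    """Sketch: read header fields and parameter sets from an AVC configuration record.

    Assumed layout (from ISO/IEC 14496-15, not from this white paper):
    version, profile, compatibility, level, then two packed bytes.
    """
    version, profile, compat, level, b4, b5 = struct.unpack_from(">6B", record, 0)
    pos = 6

    def read_sets(count: int) -> list:
        # Each parameter set is preceded by a 16-bit big-endian size.
        nonlocal pos
        sets = []
        for _ in range(count):
            (size,) = struct.unpack_from(">H", record, pos)
            sets.append(record[pos + 2 : pos + 2 + size])
            pos += 2 + size
        return sets

    sps = read_sets(b5 & 0x1F)   # low 5 bits: number of sequence parameter sets
    num_pps = record[pos]        # one byte: number of picture parameter sets
    pos += 1
    pps = read_sets(num_pps)

    return {
        "version": version,
        "profile": profile,
        "level": level,
        "nal_length_size": (b4 & 0x03) + 1,  # low 2 bits hold the size minus one
        "sps": sps,
        "pps": pps,
    }
```

The `nal_length_size` value is what a reader would then use to walk the NAL units inside each sample.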
AVC streams are a sequence of access units, each divided into NAL units. Each access unit is stored as one file format sample, and within the sample each NAL unit is preceded by a length field. That length field can be configured as 1, 2 or 4 bytes.
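A minimal Python sketch of walking one sample follows. It assumes that every NAL unit inside the sample is preceded by a big-endian length field of the configured size. The function name `split_nal_units` is hypothetical.

```python
def split_nal_units(sample: bytes, length_size: int = 4) -> list:
    """Sketch: split one file format sample into its NAL units.

    Assumes each NAL unit is preceded by a big-endian length field
    of length_size bytes (1, 2 or 4).
    """
    if length_size not in (1, 2, 4):
        raise ValueError("length_size must be 1, 2 or 4")

    units = []
    pos = 0
    while pos < len(sample):
        if pos + length_size > len(sample):
            raise ValueError("truncated length field")
        size = int.from_bytes(sample[pos : pos + length_size], "big")
        pos += length_size
        if pos + size > len(sample):
            raise ValueError("NAL unit extends past end of sample")
        units.append(sample[pos : pos + size])
        pos += size
    return units
```

A smaller length field saves a few bytes per NAL unit but limits the largest NAL unit that can be stored (255 bytes for a one-byte field, 65535 bytes for a two-byte field).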
The sample groups defined originally in this specification and now in the ISO base media file format may be used in the AVC file format to divide the stream into layers and sub-sequences. This allows simple scalable processing of AVC streams (though the initial versions of AVC do not themselves contain provision for scalable coding).
The AVC codec also provides for stream switching. If a sequence is coded to different targets (e.g. bit-rates) and these are all stored in one file, then normally one would be able to switch between them at I-frames. The AVC codec also allows for switch pictures, which can be used to provide more switch points at lower cost. The file format contains structures to allow storage of these switch pictures.
The brand ‘avc1’ may be used as a minor brand to indicate that the extensions (sample groups etc.) originally defined in the first version of this specification are used. This brand is not used as a major brand; as indicated above, the overall file format containing AVC would be a suitable member of the family, such as MP4.
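The brand check can be sketched in Python as below. This assumes that "minor brand" here corresponds to an entry in the compatible-brands list of the File Type box ('ftyp') of the ISO Base Media File Format, and that the box uses a plain 32-bit size. The layout (size, type, major brand, minor version, then a list of four-byte brands) comes from ISO/IEC 14496-12, not from this document, and the function names are hypothetical.

```python
import struct


def read_brands(ftyp_box: bytes):
    """Sketch: return (major_brand, compatible_brands) from a File Type box.

    Assumes a 32-bit box size; 64-bit and to-end-of-file sizes are not handled.
    """
    size, box_type = struct.unpack_from(">I4s", ftyp_box, 0)
    if box_type != b"ftyp":
        raise ValueError("not a File Type box")
    major = ftyp_box[8:12]
    # Bytes 12..15 hold the minor version; compatible brands follow.
    compatible = [ftyp_box[i : i + 4] for i in range(16, size, 4)]
    return major, compatible


def uses_avc_extensions(ftyp_box: bytes) -> bool:
    """True if 'avc1' is listed among the compatible brands."""
    _, compatible = read_brands(ftyp_box)
    return b"avc1" in compatible
```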
There is a registration authority which registers and documents the four-character-code code-points used in this file-format family, as well as some other code-points related to MPEG-4 systems. The database is publicly viewable and registration is free.
ISO/IEC 14496-12, ISO Base Media File Format; technically identical to ISO/IEC 15444-12
ISO/IEC 14496-14, MP4 File Format
ISO/IEC 14496-15, Advanced Video Coding (AVC) file format
ISO/IEC 14496-10, Advanced Video Coding
The MP4 Registration Authority, http://www.mp4ra.org/
SVC File Format
Authors: Karsten Grüneberg and Thomas Schierl (Fraunhofer HHI)
SVC Extension of the AVC File Format
The AVC File Format specified in MPEG-4 Part 15 also includes a number of file format extensions for Scalable Video Coding (SVC) bitstreams. These extensions are backward compatible with a legacy (AVC) file reader. SVC Toolsets are specified which can be included or excluded by derived file format specifications: SVC Tiers, SVC Extractor, SVC Aggregator, and SVC Timed Meta Data, as detailed below.
The SVC file format provides compact data structures representing the scalable structure of a stored SVC bitstream. The whole SVC bitstream is regarded as a set of tiers (also known as layers) that can be combined into different sub-bitstreams, which are characterized by frame rate, spatial resolution and fidelity level. SVC-specific sample group entries support access to the NAL units of a particular tier: the properties of each tier are listed in Scalable Group Entries, and each sample is associated with a Scalable NAL Unit Map Entry which describes the sequence of tiers inside the sample.
The very flexible design of the SVC file format allows for storing the whole SVC bitstream either in a single media track or splitting it into multiple tracks, each including the NAL units of one or more tiers. Multiple track storage can simplify the access to relevant NAL units for file readers.
SVC storage over multiple tracks still results in compact files because duplication of data is avoided by referencing data across media tracks using small data units called Extractors which are embedded as a NAL unit in the media data.
A sequence of SVC NAL units which belong to the same tier can be virtually concatenated using so-called Aggregators. Aggregators can contain SVC NAL units, aggregate them by reference, or use a combination of both mechanisms. An SVC file reader will handle both included NAL units and NAL units aggregated by reference as one entity, which reduces the size of SVC sample group entries. Legacy AVC file readers will not recognise the Aggregator and will skip the NAL units included in it.
The detailed properties of each NAL unit in an SVC track can be stored in a timed metadata track using SVC-specific statements and side information.
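The skipping behaviour of a legacy reader can be sketched in Python as follows. The sketch assumes that the NAL unit type is carried in the low five bits of the first byte of each NAL unit, and that Aggregators and Extractors use type values 30 and 31. These values are recalled from ISO/IEC 14496-15, do not appear in this document, and should be checked against the specification. The function names are hypothetical.

```python
# Assumed type values (from ISO/IEC 14496-15, not from this document).
AGGREGATOR_TYPE = 30
EXTRACTOR_TYPE = 31


def nal_unit_type(nal: bytes) -> int:
    """Return the NAL unit type from the low 5 bits of the first header byte."""
    return nal[0] & 0x1F


def legacy_view(nal_units: list) -> list:
    """Sketch: the NAL units a legacy reader would keep from one sample.

    An Aggregator is itself one NAL unit, so dropping it also drops
    every NAL unit that it includes.
    """
    skipped = (AGGREGATOR_TYPE, EXTRACTOR_TYPE)
    return [nal for nal in nal_units if nal_unit_type(nal) not in skipped]
```

Because the Aggregator sits behind a single length field, a reader that does not recognise its type steps over its whole payload without having to understand it.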
There are three Visual Sample Entries for SVC bitstreams which signal the toolset needed to access an SVC track. "avc1" tracks can be accessed by a legacy file reader. Support for SVC features such as Extractors or Aggregators is not required, i.e., an AVC reader finds the base layer NAL units and ignores SVC NAL units, if any are included. "avc2" tracks contain an AVC base layer, but a reader needs to support Aggregators and Extractors. "svc1" tracks do not contain an AVC base layer, and readers need to support Aggregators and Extractors.
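The three sample entry types can be summarised as a lookup table. The Python sketch below encodes only what the paragraph above states. The dictionary keys and the function name `can_read_track` are hypothetical.

```python
# Properties of the SVC-related sample entry types, as described above.
SVC_SAMPLE_ENTRIES = {
    "avc1": {"has_avc_base_layer": True, "needs_svc_tools": False},
    "avc2": {"has_avc_base_layer": True, "needs_svc_tools": True},
    "svc1": {"has_avc_base_layer": False, "needs_svc_tools": True},
}


def can_read_track(sample_entry: str, supports_svc_tools: bool) -> bool:
    """Sketch: decide whether a reader can access a track.

    supports_svc_tools means the reader handles Extractors and Aggregators.
    Unknown sample entry types are treated as unreadable.
    """
    props = SVC_SAMPLE_ENTRIES.get(sample_entry)
    if props is None:
        return False
    return supports_svc_tools or not props["needs_svc_tools"]
```

Under this reading, a legacy reader (one without Extractor and Aggregator support) can open only "avc1" tracks, while an SVC-aware reader can open all three.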
SVC files can use Movie Fragments
Movie fragments are advantageous in the context of media download, as the reader can start accessing the file as soon as the first portion has been received. The SVC file format supports movie fragments.
The file format standard allows storing portions of the media data in different files, keeping all metadata in one file. If different tiers are stored individually in external files, this may allow for interesting use cases such as storage erosion without the need of rewriting the whole file.
ISO/IEC 14496-15, Advanced Video Coding (AVC) file format
ISO/IEC 14496-10, Advanced Video Coding, Annex G: Scalable Video Coding
Author: Miska M. Hannuksela (Nokia)
Multiview Video Coding (MVC) Extension of the Advanced Video Coding (AVC) File Format
The AVC File Format specified in MPEG-4 Part 15 also includes a number of file format extensions for Multiview Video Coding (MVC) bitstreams. These extensions are backward compatible with legacy (AVC) file readers. The following extensions are also used for Scalable Video Coding (SVC) bitstreams: extractors, aggregators, tiers, and timed metadata. The MVC file format specifies toolsets similar to those in the SVC file format. The MVC file format toolsets are: MVCExtractor, MVCAggregator, MVCTiers, and MVCTimedMetaData.
Storage of MVC Bitstreams as Tracks
The MVC file format allows storage of one or more views in a track, similarly to the SVC file format. In general, a number of bitstream subsets, referred to as operating points, can be extracted from an MVC bitstream, each representing a different set of target output views at a particular temporal resolution. Storage of multiple views per track can be used, e.g., when a content provider wants to provide a multiview bitstream that is not intended for subsetting, or when the bitstream has been created for a few pre-defined sets of output views (such as 1, 2, 5, or 9 views) for which tracks can be created accordingly. If more than one view is stored in a track, the use of the tier sample grouping mechanism is recommended. The sample grouping mechanism is used to define tiers identifying the views present in the track and to conveniently extract the NAL units required for certain operating points. The tier sample grouping mechanism is usually used with aggregator NAL units to form regular NAL unit patterns within samples.
When an MVC bitstream is represented by multiple tracks and a player uses an operating point that contains data in multiple tracks, the player reconstructs MVC access units before passing them to the MVC decoder. An MVC operating point may be explicitly represented by a track, i.e., an access unit is reconstructed simply by resolving all extractor and aggregator NAL units of a sample. If the number of operating points is large, it may be space-consuming and impractical to create a track for each operating point. In such a case, MVC access units are reconstructed implicitly by arranging NAL units in an order conforming to MVC. The MVC Decoder Configuration record contains a field indicating whether the associated samples use explicit or implicit access unit reconstruction.
Information of Views Included in a Track
The MVC sample entry contains a View Identifier box, which includes the view identifier values, view order indexes, the identifier values of referred views, and an indication of the type of the base view, if any. This information helps readers interpret how an MVC bitstream is stored in multiple tracks. The MVC sample entry may also include intrinsic and extrinsic camera parameters.
Information Applicable to More than One View
The Multiview Information box ('mvci') is specified to indicate information that applies to more than one view. One or more Multiview Group boxes included in the Multiview Information box indicate the preferred target output views provided in an MVC bitstream stored as one or more tracks. Characteristics (such as camera parameters) of the respective bitstream subset can also be indicated within the Multiview Group box using the Multiview Relation Attributes box ('mvra'), which is similar to the Track Selection box.
Information for Selection of Target Output Views
The MVC file format provides means for a player to determine which views are preferred for displaying, and select one or more tracks that provide the data for the desired operating point, preferring a track that is specific to that operating point over tracks that also contain other data. The display characteristics of players may differ; for example, the number of simultaneously displayed views and the optimal angle between views can be different. In order to guide a player for selection of output views, alternative groups of output views and the common and differentiating characteristics between them can be indicated with the Multiview Group Relation box ('swtc'), which also includes the Multiview Relation Attributes box ('mvra').
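The preference for a track specific to the desired operating point can be sketched in Python as below. This is a simplification under stated assumptions: each track is described only by the set of view identifiers it provides, and only a single track is selected, whereas the file format also allows an operating point to span several tracks. The function name and the track names in the usage note are hypothetical.

```python
from typing import Dict, Optional, Set


def select_track(tracks: Dict[str, Set[int]], target_views: Set[int]) -> Optional[str]:
    """Sketch: pick one track that provides all target views.

    Among the tracks that cover the target views, prefer the one with
    the fewest views, i.e. the one carrying the least other data.
    Ties are broken by track name. Returns None if no single track
    covers the target views.
    """
    candidates = [
        (len(views), name)
        for name, views in tracks.items()
        if target_views <= views
    ]
    if not candidates:
        return None
    return min(candidates)[1]
```

For example, given one track with views {0, 1} and another with views 0 to 8, a stereo player asking for views {0, 1} would be given the two-view track rather than the nine-view one.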
There are four Visual Sample Entries for MVC bitstreams which signal the toolset needed to access an MVC track. "avc1" tracks can be accessed by a legacy AVC file reader. Support for MVC features such as Extractors or Aggregators is not required, i.e., an AVC reader finds the base view NAL units and ignores MVC NAL units, if any are included. "avc2" tracks contain an AVC base view, but a reader needs to support Aggregators and Extractors to obtain the base view. "mvc1" tracks do not contain an AVC base view and do not contain Extractors either. "mvc1" tracks are typically used with implicit access unit reconstruction. "mvc2" tracks do not contain an AVC base view but may contain Extractors and are hence suitable for explicit access unit reconstruction.
ISO/IEC 14496-15, Advanced Video Coding (AVC) file format
ISO/IEC 14496-10, Advanced Video Coding, Annex H: Multiview Video Coding