The Moving Picture Experts Group

Common Media Application Format Brings The Bitstreams Together

by Philip Merrill - June 2018

Common Media Application Format (CMAF) is part of MPEG-A, which integrates existing MPEG technologies in application formats to improve interoperability for specific applications. CMAF is the result of widespread industry adoption of a set of MPEG technologies for adaptive video streaming over the Internet, and widespread industry participation in the MPEG process to standardize best practices in the CMAF media application format standard.

The CMAF Media Object Model allows flexible delivery and combination of CMAF Media Objects to form multimedia presentations adapted to each user, device, and network. The CMAF framework enables efficient decoding and synchronization of digital media Fragments in all major web browsers so that adaptive video streaming can be implemented in any web page.

CMAF-compliant content is efficient for encoding and delivery because each media stream can be encoded once with alternative encodings, e.g. different bitrates, then combined by "late binding" in each streaming player. Each audio or video media stream consists of Media Objects that can be cached on content delivery network edge servers and delivered via multiple unicast, multicast, and broadcast protocols. Each streaming player can splice and synchronize its choice of Media Objects to form a personalized multimedia presentation that is seamless and adaptive to network throughput and user interaction. Content creators and networks do not need to create, store, and deliver a personalized presentation for each user, which would require a huge increase in network resources to achieve the same quality of experience.

CMAF incorporates Common Encryption (CENC), which encrypts and protects the media data but allows Media Objects to be securely parsed and processed in normal software until the point where they are decrypted and decoded in a secure digital rights management component. Multiple digital rights and key management systems can be used on different devices with the same common encrypted CMAF content. CMAF specifies the use of two Common Encryption schemes to reach all device types. One is the widely deployed 'cenc' scheme using AES-128 counter (CTR) mode, and the other is 'cbcs' using AES-128 cipher block chaining (CBC) mode with a constant initialization vector for each Track and an encryption pattern that encrypts 10% of the video slice data to reduce decryption processing and battery consumption in mobile devices.
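The 10% figure for 'cbcs' can be illustrated with a small sketch of pattern encryption. It assumes a repeating pattern of one encrypted 16-byte AES block followed by nine clear blocks, which matches the 10% stated above. The function names and the simplified view of the protected region (it ignores the clear NAL and slice headers) are illustrative assumptions, and no actual encryption is performed.

```python
BLOCK = 16  # AES block size in bytes


def pattern_blocks(protected_len, crypt_blocks=1, skip_blocks=9):
    """Return indexes of the 16-byte blocks a crypt:skip pattern would encrypt.

    Assumption: a trailing partial block is left in the clear.
    """
    full_blocks = protected_len // BLOCK
    period = crypt_blocks + skip_blocks
    return [i for i in range(full_blocks) if i % period < crypt_blocks]


def encrypted_fraction(protected_len, crypt_blocks=1, skip_blocks=9):
    """Fraction of the protected bytes covered by encrypted blocks."""
    if protected_len == 0:
        return 0.0
    count = len(pattern_blocks(protected_len, crypt_blocks, skip_blocks))
    return count * BLOCK / protected_len


if __name__ == "__main__":
    length = 20 * BLOCK  # hypothetical 320-byte protected region
    print(pattern_blocks(length))
    print(encrypted_fraction(length))
```

Because only every tenth block passes through the cipher, a decryptor on a mobile device does roughly one tenth of the AES work that full-sample encryption would need.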

Program delivery options are more varied and dynamic than ever, and user operations interacting with dynamically gathered Media Objects provide the flexibility required for applications such as streaming interactive omnidirectional and virtual reality video. Video players and streaming manifest formats are outside of CMAF's scope; however, CMAF specifies what is required to describe and decode a CMAF Presentation consisting of CMAF Media Objects. A sequence of CMAF Fragments encoded from a single media stream forms a CMAF Track. Every CMAF Track has an associated CMAF Header, which conveys profile and parameter information to the player so it can initialize decoding and optimize rendering. CMAF specifies how CMAF Tracks in a CMAF Presentation are start aligned and synchronized on the Presentation’s timeline.

A CMAF Presentation can contain alternative Tracks with different content or encoding, such as alternative audio languages or codecs. Alternative Tracks of one media type conforming to the same Presentation are called a CMAF Selection Set. Alternative Tracks that are different encodings of the same content and are constrained to be seamlessly switchable in a single decoder are called a CMAF Switching Set. A CMAF Presentation can be rendered by selecting one Track from each Selection Set and synchronizing rendering to the shared timeline defined by CMAF. A CMAF Presentation that offers multiple video Tracks in a Switching Set can be rendered by selecting the next CMAF Fragment from a lower bitrate Track in order to avoid buffer underrun when the video bitrate is higher than the network bitrate, or from a higher bitrate Track when the network bitrate exceeds the previous Fragment bitrate.
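The switching behavior described above can be sketched as a simple player-side rule. Player logic is outside CMAF's scope, so everything here is an assumption made for illustration: the `Track` class, the `choose_track` function, the bitrates, and the 0.8 safety factor are hypothetical and are not defined by the standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Track:
    """Hypothetical stand-in for one CMAF Track in a Switching Set."""
    name: str
    bitrate_bps: int


def choose_track(switching_set, throughput_bps, safety=0.8):
    """Pick the Track for the next Fragment.

    Returns the highest-bitrate Track that fits within a fraction of the
    measured throughput, or the lowest-bitrate Track if none fits.
    """
    budget = throughput_bps * safety
    ordered = sorted(switching_set, key=lambda t: t.bitrate_bps)
    fitting = [t for t in ordered if t.bitrate_bps <= budget]
    return fitting[-1] if fitting else ordered[0]


if __name__ == "__main__":
    video = [
        Track("video-1M", 1_000_000),
        Track("video-3M", 3_000_000),
        Track("video-6M", 6_000_000),
    ]
    # Re-evaluate before each Fragment request as throughput changes.
    for measured in (10_000_000, 5_000_000, 500_000):
        print(measured, choose_track(video, measured).name)
```

A rule like this is only safe because the Tracks of a Switching Set are constrained to be seamlessly switchable in one decoder, so the player can change Tracks at a Fragment boundary without reinitializing playback.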

The CMAF media format and a CMAF Presentation are independent of the method(s) used for delivery, and are compatible with most adaptive streaming formats. The CMAF container format is derived from the ISO Base Media File Format (ISOBMFF) and is compatible with MMT, DASH, and HLS adaptive streaming. Supported use cases include adaptive bitrate OTT streaming; broadcast, multicast, and hybrid network live streaming; and ad insertion and signaling. CMAF specifies several CMAF Addressable Media Objects for storage and delivery of CMAF Tracks and Fragments. These include a CMAF Track File, which contains a CMAF Header and Fragments stored as a single-track ISOBMFF file, and a CMAF Segment, which can contain one or more CMAF Fragments. A CMAF Fragment can also contain one or more CMAF Chunks. Short-duration CMAF Chunks (e.g., a few video frames) can be progressively delivered to reduce encoder output delay for low-latency live streaming. CMAF is equally effective for use cases such as downloading single files and server-side or client-side ad insertion.
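Because CMAF objects are ISOBMFF structures, they can be inspected with a generic box walker. The sketch below relies only on the basic ISOBMFF box layout: a 32-bit big-endian size, a four-character type, and an optional 64-bit size when the 32-bit field equals 1. The helper names `make_box` and `iter_boxes` and the sample bytes are illustrative assumptions, not real media data or part of any library.

```python
import struct


def make_box(box_type, payload):
    """Build a minimal box: 32-bit size, 4-byte type, then the payload."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


def iter_boxes(data):
    """Yield (type, offset, size) for each top-level ISOBMFF box in data."""
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:
            # A 64-bit "largesize" field follows the type.
            if offset + 16 > len(data):
                break
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:
            # The box extends to the end of the data.
            size = len(data) - offset
        if size < header:
            raise ValueError("invalid box size at offset %d" % offset)
        yield box_type.decode("ascii", "replace"), offset, size
        offset += size


if __name__ == "__main__":
    # A movie fragment box followed by a media data box, with dummy payloads.
    sample = make_box(b"moof", b"\x00" * 8) + make_box(b"mdat", b"abc")
    for name, offset, size in iter_boxes(sample):
        print(name, offset, size)
```

Run against real content, a walker like this would be expected to show a file type box and movie box at the start of a CMAF Header, and repeating movie fragment and media data box pairs in the Fragments and Chunks that follow.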

Looking ahead, new CMAF Media Profiles are being added to support new codecs and interoperability points. CMAF enables efficient encoding, storage, and delivery of digital video, which is key to scaling operations to support the rapid growth of video streaming over the Internet. Streaming live events with personalized ad insertion and interactivity is expected to deliver better user experiences than broadcast TV at broadcast scale, while improving reliability and interoperability for consumers on all types of devices.

Special thanks to Kilroy Hughes, who contributed to this report.