by Philip Merrill - September 2016
The flexibility of MPEG 3D Audio called for a new approach to dynamic range and loudness control, able to adjust to the standard's wide range of scenarios. Beyond meeting that need, MPEG-D Dynamic Range Control (DRC) illustrates the freedom and power of metadata, disengaging this key information from reliance on any one audio compression technology.
Unlike digital audio compression, which reduces the amount of data needed to represent a signal, dynamic range compression and loudness control shape the signal's waveform directly. Sound has fluctuating loudness, and time-varying gains, which can be represented as metadata, can use thousands of values per second to accomplish dynamic range compression. Sound samples can be treated as combinations of underlying frequencies stacked together in a sum. Those frequency bands can be separately addressed and modified for a desired dynamic range or loudness, and they can also be independently controlled for each audio channel. Content that can demand independent treatment includes dialog or narration, in contrast to soundtrack music, and also special effects noises. Examples of environments where DRC can enhance the listening experience include a car, a subway, or a mall, where there is a high noise floor. There, DRC boosts the lowest-level audio segments so that all of the music can be heard. Loudness control can alleviate the problem of advertising blaring louder than program content, which has resulted in new regulatory requirements.
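The idea of time-varying gains carried separately from the audio can be illustrated with a minimal Python sketch. It is not the MPEG-D DRC algorithm or its bitstream syntax: the function names, the frame size, the threshold of -30 dBFS and the 2:1 ratio are all assumptions chosen for illustration, and a real system would smooth the gains between frames rather than stepping them.

```python
import math

def frame_levels_db(samples, frame_size):
    """RMS level of each frame in dBFS, taking full scale as 1.0."""
    levels = []
    for start in range(0, len(samples), frame_size):
        frame = samples[start:start + frame_size]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        levels.append(20.0 * math.log10(max(rms, 1e-9)))  # floor avoids log(0)
    return levels

def drc_gains_db(levels_db, threshold_db=-30.0, ratio=2.0):
    """Hypothetical static characteristic: boost frames below the threshold.

    The returned list plays the role of gain metadata: it can be stored or
    transmitted alongside the audio without altering the audio itself.
    """
    gains = []
    for level in levels_db:
        if level < threshold_db:
            # Close (1 - 1/ratio) of the distance between level and threshold.
            gains.append((threshold_db - level) * (1.0 - 1.0 / ratio))
        else:
            gains.append(0.0)
    return gains

def apply_gains(samples, gains_db, frame_size):
    """Apply one gain per frame to the samples (no smoothing, for brevity)."""
    out = []
    for i, x in enumerate(samples):
        linear = 10.0 ** (gains_db[i // frame_size] / 20.0)
        out.append(x * linear)
    return out
```

In this sketch a quiet frame at -60 dBFS receives a 15 dB boost while a loud frame is left untouched, which is the "boost the lowest-level segments" behavior described above. Because the gains are computed once and applied later, a player remains free to ignore them or to pick a different gain sequence.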
DRC controls these adjustments by means of metadata. These gain-control values can be applied in various ways, for example by employing dynamic range compression especially suited for certain listening scenarios or else by searching for a best match from a range of options. The decoupling of the metadata from audio coding technologies is also liberating in terms of transport, and DRC information can be used with any ISO Base Media File format audio stream.
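Searching for a best match from a range of options might look roughly like the following Python sketch. The `DrcSet` class, the scenario tags and the first-match priority rule are hypothetical and are not taken from the standard's syntax; they only show how a player could choose among several gain sets shipped with the same audio.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DrcSet:
    """One set of gain metadata, tagged with the scenarios it suits (hypothetical)."""
    name: str
    effects: frozenset

def select_drc_set(available, requested_effects):
    """Return the set matching the highest-priority requested effect, else None.

    `requested_effects` is ordered from most to least preferred.
    """
    for effect in requested_effects:
        for drc_set in available:
            if effect in drc_set.effects:
                return drc_set
    return None

# Usage: the content ships two sets; the player prefers a noisy-environment
# profile, then a late-night one, then general compression.
available = [
    DrcSet("general", frozenset({"general"})),
    DrcSet("night", frozenset({"night"})),
]
chosen = select_drc_set(available, ["noisy", "night", "general"])
```

Here no set is tagged for noisy environments, so the selection falls through to the late-night set. Returning `None` when nothing matches leaves the decision to play the audio unmodified with the caller.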
Another feature in DRC's toolkit is its integrated consideration of whether advanced peak and clipping control should be used. The need for this is inherent when MPEG-H 3D Audio combines many audio channels into a downmix, one example being a binaural headphone downmix. If adjustments that increase loudness are made while these channel components are in independent form, there is a risk that their sum will generate amplitudes that exceed the maximum. Unlike a spike bursting through the wrapping of a package, what goes out of range disappears, and what is left behind (the package's edges of torn wrapping) is generally considered painful to hear. Some engineers have suggested digital audio saturation and clipping should win a worst artifact award, so DRC's advanced peak and clipping control remains at the ready to correct for this should it occur.
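The downmix risk can be made concrete with a short Python sketch. It sums channels that are each within range, checks whether the sum exceeds full scale, and scales the result down if so. The whole-signal scaling used here is a deliberate simplification and an assumption of this example, not the peak and clipping control defined by MPEG-D DRC; the function names and the ceiling of 1.0 are likewise illustrative.

```python
def downmix(channels, weights):
    """Weighted sum of equal-length channels into a single channel."""
    length = len(channels[0])
    return [
        sum(w * ch[i] for ch, w in zip(channels, weights))
        for i in range(length)
    ]

def peak(samples):
    """Largest absolute sample value."""
    return max(abs(x) for x in samples)

def hard_clip(samples, ceiling=1.0):
    """What happens without protection: out-of-range values are cut off."""
    return [max(-ceiling, min(ceiling, x)) for x in samples]

def protect_from_clipping(samples, ceiling=1.0):
    """Scale the whole signal so its peak fits under the ceiling.

    Returns the scaled samples and the linear gain that was applied.
    """
    p = peak(samples)
    if p <= ceiling:
        return list(samples), 1.0
    gain = ceiling / p
    return [x * gain for x in samples], gain

# Two channels that are individually in range but overshoot when summed.
left = [0.5, 0.8, -0.9]
right = [0.4, 0.7, -0.6]
mix = downmix([left, right], [1.0, 1.0])
safe, gain = protect_from_clipping(mix)
```

The mix peaks at about 1.5, so hard clipping would flatten two of the three samples to the same magnitude and discard their shape, while the protected version keeps the waveform intact at roughly two thirds of its level.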
A program's author or publisher enjoys the initial freedom to sculpt the soundscape for artistic effect or the audience's pleasure. While multiplatform consumption is now commonplace, many commercial titles only provide a single mix, potentially rendering audio assets inaudible as they are consumed in various environments. Tablet and smartphone viewing are now popular but most mixes are optimized for home theater in a quiet environment, so DRC can adjust frequency bands or assets to compensate for noisy environments. Less commonly considered but built into MPEG-H 3D Audio is how an individual's unique head affects the headphone listening experience. Also, individual hearing loss degrades audibility along different frequency bands based on differential loss of function along the hair cells in the cochlea. Range adjustments can compensate for this based on each user's unique hearing function by configuring a complementary metadata profile for the user.
Spatial audio scenes used in many games, in virtual reality, and provided for in MPEG-H 3D Audio suggest imaginative uses that are still in the early stages of exploration. Compensating and correcting for unique environments and user preferences should optimize users' pleasure and engagement and can be directed at specified frequency ranges, channels or assets. While virtual play in first-person shooter games might spark social disapproval, everyone can agree the experience will be improved by adjusting the sound of shots to a user's personal preference. In family home theater environments, parents should be able to tell their teenagers to "turn that down" for the sounds they actually find annoying instead of requiring a crude adjustment to overall loudness.
At the quiet extreme, where privacy and peacefulness are at a premium, DRC adjustments offer deeper engagement for late-night listening. Both productivity and pleasure can be optimized, addressing the listener's needs as well as those of people nearby trying to sleep. A person meditating to a favorite album of new age music will be able to better optimize their consciousness if bells, flutes, and drums "sound right" to them in their unique setting.
Users up and down the value chain now live in a world that presents more choices than can be made in a lifetime. The freedom of configurable options can be confusing, but MPEG's role is to provide substantial solutions to real needs. Reference software is under development so program creators, developers, and manufacturers will be able to better assess the desirability of building DRC capability into their projects and products. By taking advantage of the freedom of metadata, MPEG's DRC solution is transport-friendly in terms of bitrate. Although MPEG-H 3D Audio fully integrates MPEG-D DRC, this particular solution is designed to be universally applicable so that the benefits of dynamic range and loudness control can someday be taken for granted as part of daily convenience and its available choices.