The Moving Picture Experts Group

3D Audio Tests Confirm "Excellent" Quality Achieved

by Philip Merrill - March 2017

The January 2017 report on the MPEG-H 3D Audio Verification Test was welcome: at the end of a standardization process, it is reassuring when fresh tests confirm the effectiveness one believed had been designed into the technology. With a broadcast orientation overall, Low Complexity Profile encoded audio signals were measured against the ITU-R High-Quality Emission level in Test 1 (Ultra-HD Broadcast) and, for three other operating ranges in Tests 2-4, against the subjective "golden ears" MUSHRA scale.

Scope & results: Testing incorporated results from seven sites and 288 listeners. The Ultra-HD Broadcast test presented 22.2 or 7.1+4H channel sound encoded at 768 kb/s. The other three tests, which addressed higher- and lower-bitrate varieties of broadcast delivery as well as mobile use on headphones, all scored "Excellent" on the MUSHRA scale.

Test 3 High Efficiency Broadcast: At 48 kb/s for two-channel stereo, Test 3 sets a significant lower bound for the listening tests. High efficiency as a goal means low quality only in the sense of having less information to work with, so the challenge is to match the acute properties of human hearing and still be good enough for golden ears. The reference and its hidden repetition in the MUSHRA test provide a nice example, because they screen out respondents who do not, or cannot, pay attention to what they are listening to. If one cannot match the reference to its repetition, one is not listening with golden ears. The Test 3 material scaled up to 8 channels in a 5.1+2H configuration at 256 kb/s. The 8-channel and HOA material was presented at 144, 192, and 256 kb/s, 6-channel 5.1 material at 128, 144, and 180 kb/s, and 2-channel stereo at 48, 64, and 80 kb/s, with the stereo items drawn from popular music songs and the sounds of a hockey game. This grid exemplifies the granularity of measurement and also the challenging testing circumstances of low bitrate. Achieving an "Excellent" score arguably means more when it has to be accomplished with less.
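The screening role of the repeated reference can be sketched in a few lines. The thresholds below follow the post-screening guidance in ITU-R BS.1534-3 (exclude a listener who scores the hidden reference below 90 on more than 15% of items); whether the verification test applied exactly this rule is an assumption, and the function names and sample scores are invented for illustration.

```python
def keep_listener(hidden_ref_scores, min_score=90.0, max_fail_fraction=0.15):
    """Decide whether a listener passes hidden-reference post-screening.

    hidden_ref_scores: the scores (0-100) this listener gave to the hidden
    copy of the reference, one per test item.
    Thresholds default to the BS.1534-3 guidance; treat them as assumptions.
    """
    if not hidden_ref_scores:
        raise ValueError("need at least one hidden-reference score")
    failures = sum(1 for score in hidden_ref_scores if score < min_score)
    return failures / len(hidden_ref_scores) <= max_fail_fraction


def screen_panel(panel):
    """Return only the listeners whose hidden-reference scores pass."""
    return {name: scores for name, scores in panel.items()
            if keep_listener(scores)}


if __name__ == "__main__":
    # Invented data: each list holds one listener's hidden-reference scores.
    panel = {
        "listener_a": [100, 95, 100, 92],   # always recognises the reference
        "listener_b": [100, 60, 100, 70],   # misses it on half of the items
    }
    print(sorted(screen_panel(panel)))
```

Only scores from listeners who pass this check would then be aggregated into the per-condition means.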

Tests 2 & 4 HD Broadcast or A/V Streaming and Mobile: It is of some interest that Tests 2 and 4 used the same listening material, although the end result of Test 4 Mobile (headphones) can certainly be considered to have fewer channels, but only once the signal has finally been reduced to two channels. The bitrates used were 256, 384, and 512 kb/s. Higher Order Ambisonics items tested immersiveness against recorded signals for a cappella singing, guitars, female voice with orchestra and piano, and a drama. Various musical, natural, sports and cinematic events were presented in 5.1+2H and 7.1+4H. Test 2, regarded as HD Broadcast or A/V Streaming quality, was presented through loudspeakers, which is quite a contrast in terms of delivery to listening on headphones. For the Mobile use case, the binaural renderer presents the same files to headphone listeners, encoded at 384 kb/s. This identity of content, whether in a home theater setting or listening privately on headphones, delivers Excellent immersive quality across a wide range of settings and device types.

More on Test 1 Ultra-HD Broadcast: Returning to Test 1, the Ultra-HD Broadcast model coded at 768 kb/s was measured against the ITU-R High-Quality Emission broadcast rating and achieved it, which is essentially to say that the audio was of CD quality. The bitrate is high enough for top quality while being low enough to fit within fairly conventional ranges for delivery. The loudspeaker configuration is on the high end with either 7.1+4H or 22.2, which is to say either 12 or 24 independent loudspeakers. The dozen test items exemplified the more progressive end of what 3D Audio can do, including combinations of objects and HOA, and a car race with commentary in three different languages that switch among themselves while the item also manages three audio objects. Once again there is a variety of musical, natural and theatrical settings such as Funk, Swan Lake, a dragon cave fighting scene with score, ambience with birds, and rain with steps.
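The loudspeaker counts quoted above follow directly from the layout labels, and dividing the bitrate by that count gives a rough feel for how little data each channel gets. The sketch below is illustrative arithmetic only: the helper names are hypothetical, and the even per-channel split is an assumption made for comparison, since a real encoder does not allocate bits equally across channels.

```python
def channel_count(layout: str) -> int:
    """Count loudspeaker channels in a label such as '7.1+4H' or '22.2'.

    'M.L' means M main channels plus L low-frequency channels; an optional
    '+NH' suffix adds N height channels.
    """
    total = 0
    for part in layout.split("+"):
        if part.endswith("H"):
            total += int(part[:-1])            # height channels
        else:
            main, _, lfe = part.partition(".")
            total += int(main) + (int(lfe) if lfe else 0)
    return total


def kbps_per_channel(total_kbps: float, layout: str) -> float:
    """Average bitrate per channel, assuming an even split (illustration only)."""
    return total_kbps / channel_count(layout)


if __name__ == "__main__":
    # Test 1 operating point from the article: 768 kb/s in either layout.
    for layout in ("7.1+4H", "22.2"):
        print(layout, channel_count(layout), kbps_per_channel(768, layout))
```

On that even-split assumption, 768 kb/s averages 64 kb/s per channel for 7.1+4H and 32 kb/s per channel for 22.2.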

While it is gratifying to see the qualities built in from 3D Audio's Requirements properly substantiated in the standard and in the tested items, these technologies exist so that people coming to the field fresh can innovate, taking for granted that the format will support their unpredictable creative choices. As network advances such as 5G and ATSC 3.0 roll out, creativity in mainstream media will expand to explore both 3D possibilities and increasingly dynamic interactive experiences.