The Moving Picture Experts Group

ARAF Brings the Scene to Life

by Philip Merrill - August 2016

The Augmented Reality Application Format (ARAF) contains and streams the resource information that brings AR scenes to life, packaging augmentations that overlay images from our daily life with virtual add-ons. Supported by ARAF-browser software as well as online education on modifying the applicable XML, this formal model makes it possible to author AR applications today, relying on its interoperable description of scenes and assets and its management of packetized transport. In fact, the first online Coursera course on developing ARAF applications begins in September 2016.

A tourist can look around, take a photo, learn more about local landmarks, see where they are from above, and meanwhile chat about all this with a friend using a tablet in a bedroom miles away. That friend might also be running textual and visual searches and browsing through additional information, while chatting casually and sharing digital media back and forth. While the metadata-mediated details recognized by the ARAF-browser function in the background, these users can enjoy pointing at and selecting items through visually appealing, easy-to-use interfaces, happily ignorant of the underlying mechanisms.

ARAF was tasked to support a wide array of use cases including different media types, different source locations for media, the need for real-time streaming capability, and support for both 2D and 3D rendered experiences. To do this, MPEG's integration team assembled BIFS (BInary Format for Scenes, which contains a VRML subset), MPEG-V, MPEG-U and CDVS. In other words, many of the pieces needed to satisfy ARAF's use cases and requirements were already in place, following the general approach of integrating previous work that typifies MPEG-A efforts. Based on the way ARAF and other MPEG-A projects have come together, there is some confidence that various other challenges could be met by reassembling pieces MPEG has already standardized. In a recent industry survey, MPEG explored interest in other AR/VR integration paths that might be helpful.

As for BIFS and VRML, ARAF disappoints any pessimists who believed they had heard those acronyms for the last time. Virtual reality modeling is not new and has conventionally been a major component of PC gaming, or what used to be called "videogames." Arguably, smartphone displays and processors have lowered manufacturing costs and sparked increased interest. VR, or less-than-immersive mixed reality productions, have already become a conventional part of promotional media for movies and television. BIFS' original superset of VRML has been extended by many upgrades to MPEG-4's 3D modeling, for example support for scalable complexity mesh coding in 2011. MPEG-4 has also been updated for interoperability with other 3D modeling standards such as the X3D and Collada XML schemas. BIFS' original design associates content with its signal-stream transport, which carries over nicely into the contemporary streaming-intensive environment, and it also offers fully integrated audio support.
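The essence of a BIFS/VRML-style scene description is a graph of nodes whose transforms compose down the tree, so that an annotation authored relative to a landmark stays attached to it. The following toy Python sketch illustrates that idea only; the node names and fields are invented for illustration and are not the actual BIFS node set or its binary encoding.

```python
# Toy scene-graph sketch in the spirit of BIFS/VRML: nodes carry a
# translation and children, and positions accumulate down the tree.
# Names and fields are illustrative, not the real BIFS node types.
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Node:
    name: str
    translation: Tuple[float, float, float] = (0.0, 0.0, 0.0)
    children: List["Node"] = field(default_factory=list)

    def world_positions(self, origin=(0.0, 0.0, 0.0)):
        """Yield (name, absolute position) by accumulating translations."""
        pos = tuple(o + t for o, t in zip(origin, self.translation))
        yield self.name, pos
        for child in self.children:
            yield from child.world_positions(pos)

# A landmark with an annotation label authored 1.5 units above it.
scene = Node("root", children=[
    Node("landmark", translation=(2.0, 0.0, 0.0), children=[
        Node("label", translation=(0.0, 1.5, 0.0)),
    ]),
])

for name, pos in scene.world_positions():
    print(name, pos)
```

Moving the landmark node would move the label with it, which is the property that makes scene graphs convenient for anchoring virtual add-ons to real-world referents.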

MPEG-V formalizes the description of sensors and actuators in AR/VR, which are an essential part of interactive applications and/or provide real-time information such as environmental or medical data. Metaphorically, the distinction resembles the two types of nerve cells: one incoming and one outgoing, either receiving or sending a signal. In applications the possibilities multiply, since sensors and actuators can be used together or independently, to enhance digitally mediated experiences and to enable new types of experiences. If users go beyond the one-world-at-a-time approach, then transitioning between virtual worlds requires an interoperable description such as MPEG-V, for example to maintain a personal avatar or a profile's distinctive characteristics across multiple apps. MPEG-V has also been designed to integrate with MPEG-U interface descriptions, and it has been updated to work with GPS and biosensors.
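The incoming/outgoing split can be sketched in code. MPEG-V itself defines these concepts as XML schema types, not Python classes; the class and field names below are purely illustrative, showing a sensed reading (world to application) being mapped onto an actuation command (application back out to a device).

```python
# Conceptual sketch of the sensor/actuator duality that MPEG-V formalizes.
# All names here are illustrative assumptions, not the MPEG-V schema.
from dataclasses import dataclass

@dataclass
class SensedInformation:      # incoming: world -> application
    sensor_id: str
    value: float
    unit: str

@dataclass
class ActuationCommand:       # outgoing: application -> device
    actuator_id: str
    intensity: float          # normalized 0.0..1.0

def brightness_from_ambient(reading: SensedInformation) -> ActuationCommand:
    """Map an ambient-light reading onto a display-brightness command."""
    # Assume the sensor reports lux in 0..1000; clamp and normalize.
    level = max(0.0, min(reading.value, 1000.0)) / 1000.0
    return ActuationCommand(actuator_id="display-backlight", intensity=level)

cmd = brightness_from_ambient(SensedInformation("ambient-light-1", 250.0, "lux"))
print(cmd.intensity)  # 0.25
```

An interoperable description matters precisely at this boundary: two applications that agree on how sensed information and actuation commands are described can exchange them without agreeing on implementation.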

MPEG-U supports user interactivity in a manner informed by years of MPEG-7 and MPEG-21 work, so the user interface that sends user metadata is adaptable and supports an appropriately wide range of user-interaction scenarios. The metadata management task lets users configure wide ranges of settings that are stored and then sent in reply to requests: for example, how search results should be displayed, or using the same application while going from living room to garage to automobile seat, with the display adapting from flat-screen television to smartphone to the car dashboard's digital network interface. MPEG-U information governance supports software widgets interacting with remote widgets; new types of sensor data, such as hand gestures and patterns; and coordination between user interactions with a widget and subsequent actions, for example using a hand gesture to signal "rewind" to a video player.
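That last coordination step, a recognized gesture triggering a widget action, can be sketched as a simple dispatch table. This is an illustrative toy only: real MPEG-U widgets exchange structured messages, and the gesture names and widget methods below are assumptions made up for the example.

```python
# Toy sketch of binding recognized gestures to widget actions, in the
# spirit of MPEG-U widget coordination. Names are illustrative only.
from typing import Callable, Dict, List

class VideoPlayerWidget:
    def __init__(self) -> None:
        self.log: List[str] = []   # record of actions the widget performed

    def rewind(self) -> None:
        self.log.append("rewind")

    def pause(self) -> None:
        self.log.append("pause")

def make_dispatcher(player: VideoPlayerWidget) -> Dict[str, Callable[[], None]]:
    """A gesture -> action binding that a UI description would declare."""
    return {"swipe-left": player.rewind, "palm-open": player.pause}

player = VideoPlayerWidget()
dispatch = make_dispatcher(player)
for gesture in ["swipe-left", "palm-open"]:
    dispatch[gesture]()
print(player.log)  # ['rewind', 'pause']
```

The point of standardizing this binding as data rather than code is that the same gesture vocabulary can drive a television widget, a smartphone widget, or a dashboard widget without rewriting the application.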

CDVS should be uniquely practical for new ARAF experiences because it is designed to perform with very low complexity. That benefit can be implemented at lower cost, enabling scaled-up support for visual match-and-detect comparisons against massive image libraries. Today this is done in-house by major industry players, including search engines, using both patented and "secret sauce" technologies. CDVS provides an interoperable toolkit that is ready for the job and can scale up, for example querying multiple search engines at the same time while a tourist takes snapshots and walks along historic streets. CDVS experts advocate that pixels are the new hyperlink, because of their power to feed searches, and that Internet searching has become a digital sixth sense, one that the increasing use of massive image libraries will help to evolve.
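The low-complexity idea can be illustrated with a toy matcher: compact binary descriptors compared by Hamming distance, which is cheap enough to run against a large library. Everything below is a made-up miniature; real CDVS descriptors are compressed local features extracted from images, not single 16-bit codes.

```python
# Illustrative sketch of low-complexity match-and-detect of the kind CDVS
# enables: binary descriptors compared by Hamming distance. The library
# entries and descriptor values are hypothetical, invented for the example.
from typing import Optional

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fixed-width binary descriptors."""
    return bin(a ^ b).count("1")

# A tiny "image library": landmark label -> 16-bit descriptor.
LIBRARY = {
    "cathedral": 0b1011001110001101,
    "town-hall": 0b0100110001110010,
    "fountain":  0b1011000010001111,
}

def best_match(query: int, threshold: int = 4) -> Optional[str]:
    """Return the closest library entry, or None if nothing is near enough
    (the detect half of match-and-detect)."""
    label, dist = min(((k, hamming(query, v)) for k, v in LIBRARY.items()),
                     key=lambda kv: kv[1])
    return label if dist <= threshold else None

snapshot = 0b1011001110001111   # query descriptor from a tourist's snapshot
print(best_match(snapshot))     # cathedral
```

Because each comparison is a single XOR and a popcount, the same loop scales to millions of library entries, which is the property that makes interoperable visual search practical on modest hardware.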

Thriller action films often feature interactive multimedia maps with annotations, windows and dashboards. At the level of assets and their formalization, this is not so different from the many tourists navigating a city's unfamiliar streets via apps. The description of what is available and in view, or synthetic, or moving and composited as additional information, must all be delivered and processed effectively. There are many ways to do this, but there is also ARAF's standardized way, easily accessible with existing ARAF-browser software and usable for application development now.
