The Moving Picture Experts Group

The Metadata-First Approach Of Compact Descriptors For Visual Analysis

by Philip Merrill, July 2019

MPEG’s Compact Descriptors for Visual Analysis (CDVA) standard enables efficient and interoperable use of video feature descriptors, supporting applications as varied as live sporting events and location-based augmented reality. CDVA is designed to be interoperable with Compact Descriptors for Visual Search (CDVS): it takes what works for still images and adds new analysis capabilities for video, such as automated reference-frame selection from a segment's series of still pictures, which exploits their similarity along the timeline.

Although descriptors can be generated automatically and stored anywhere, CDVA enables a new model in which feature descriptors are derived at the location of the camera, where the video data is already available in uncompressed form. This turns the conventional compression-first model on its head: that model starts with capture, followed by compression, transmission, decompression and finally analysis. Because analysis happens where the video is captured, the compact descriptors can be transmitted and aggregated centrally as proxies for the full video, which can be stored wherever advantageous on the network. Like the still-image descriptors of CDVS, these proxies are ultracompact and designed to enable rapid, massive searches. This quick-turnaround analysis technology can support a wide range of new applications, for example time-critical automotive uses or navigating social video shot at live events in near real time.
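The idea of searching centrally aggregated proxies instead of the video itself can be illustrated with a minimal Python sketch. It is not the CDVA descriptor format or matching procedure: the `Proxy` record, its field names, the 8-bit toy descriptors and the use of Hamming distance are all assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proxy:
    """Hypothetical record sent from a camera instead of the full video."""
    camera_id: str   # assumed identifier of the capturing device
    segment_id: int  # assumed index of the described video segment
    descriptor: int  # toy binary descriptor packed into an integer


def hamming(a: int, b: int) -> int:
    """Count the bits that differ between two packed binary descriptors."""
    return bin(a ^ b).count("1")


def search(index: list[Proxy], query: int, top_k: int = 3) -> list[tuple[int, Proxy]]:
    """Rank the stored proxies by Hamming distance to the query descriptor."""
    scored = [(hamming(p.descriptor, query), p) for p in index]
    scored.sort(key=lambda pair: pair[0])  # smallest distance first
    return scored[:top_k]


# A central index built only from descriptors; the video stays elsewhere.
index = [
    Proxy("cam-a", 0, 0b10110010),
    Proxy("cam-a", 1, 0b10110110),
    Proxy("cam-b", 0, 0b01001101),
]

results = search(index, 0b10110111, top_k=2)
for distance, proxy in results:
    print(distance, proxy.camera_id, proxy.segment_id)
```

Only the small descriptor records cross the network in this sketch; a matching segment's full video would be fetched from wherever it is stored once the search has narrowed the candidates.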

CDVA uses the CDVS global descriptor for quick initial searches. Keyframes are selected automatically within CDVA video segments to center the description. The standout new capability of CDVA is its use of neural-network-based descriptors. Aggregating descriptors generated at many camera locations allows a networked universe of captured raw video to be indexed, searched and repurposed.
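Keyframe selection that exploits similarity along the timeline can be sketched in Python as follows. This is an assumed, simplified heuristic and not the selection procedure defined by the standard: the two-dimensional toy descriptors, the Euclidean distance and the `threshold` parameter are all hypothetical.

```python
import math


def l2(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_keyframes(frame_descriptors, threshold):
    """Return indices of frames that differ enough from the last keyframe.

    The first frame is always a keyframe. A later frame becomes a keyframe
    only when its descriptor is farther than `threshold` from the most
    recently selected keyframe, so runs of similar frames are described once.
    """
    if not frame_descriptors:
        return []
    keyframes = [0]
    for i in range(1, len(frame_descriptors)):
        last = frame_descriptors[keyframes[-1]]
        if l2(frame_descriptors[i], last) > threshold:
            keyframes.append(i)
    return keyframes


# Toy per-frame descriptors: three similar frames, two similar frames, one outlier.
frames = [
    [0.0, 0.0],
    [0.1, 0.0],
    [0.1, 0.1],
    [1.0, 1.0],
    [1.1, 1.0],
    [3.0, 0.0],
]

keyframes = select_keyframes(frames, threshold=0.5)
print(keyframes)
```

In this toy run six frames reduce to three keyframes, which is the kind of saving that temporal redundancy makes possible before any descriptor is transmitted.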

The reference software provides the complete functionality needed to extract and match CDVA descriptors and to perform retrieval. It is provided as a multi-platform C++ implementation and is complemented by Docker containers to facilitate deployment. For conformance testing of proprietary implementations of the standard, a dataset is available under a Creative Commons license.

As with other MPEG standards, CDVA's greatest contribution in its area is the interoperability that encourages stakeholders to engage in joint data operations. While CDVA is fit for purpose for a broadcaster managing its archives, it is also designed for even more massive search and analysis challenges. The video of all broadcasters could also be pooled and searched, based solely on the transmission of its efficiently sized metadata. Neural-network-enabled analysis has great potential, enabling rapid and extensive analysis as massive video archives and mixed-reality lifestyles continue to grow.