The Moving Picture Experts Group

CDVS: Fast Searches for Massive Image Libraries

by Philip Merrill - October 2016

Compact Descriptors for Visual Search (CDVS), a new high gear for rapid searches based on image content, emerged from related image-description work developed over many years in MPEG-7. Specific to CDVS, there was industry interest in standardizing a low-complexity toolkit to index large libraries of images. Speed as well as simplicity was essential to enable a tool that could scale up to manage very large collections efficiently. Private-sector firms have not rushed to join their image libraries together for interoperable search across distributed sets of images, but MPEG's development and integration work, together with the CDVS reference software, ensure that such searches are now feasible at acceptable cost and could become a reality. Beyond the thousands of new images uploaded to the internet every second, mobile computing is also changing the way people interact with their visible environment, and CDVS is well suited to emerging uses in Augmented Reality (AR) and mobile AR. Its potential is more far-reaching, however, and ultimately unknown, because image search practices remain Balkanized, without interoperable reach into the world's expanding image libraries.

The CDVS descriptor comes in four different sizes, ranging from a maximum of 16 KB down to 512 bytes per image. It features an embedded global descriptor optimized for an initial pass that generates a shorter list of candidate images, which can then be compared using the additional detail provided in the full descriptor. The CDVS description of each image is extracted from pixel data. Points of interest detected in an image reveal both their individual properties and their arrangement relative to each other; these calculations are then condensed into the compact description. Image content such as urban landmarks or industrial machine parts has a distinctive consistency in the relative positions of its points of interest, even when photographed from different angles or in varying light. These and other image properties are captured by the math, enabling rapid image comparisons in which the compact descriptor serves as a proxy for each image being matched.
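The two-pass idea described above, a cheap global comparison to shortlist candidates followed by a more detailed comparison on the shortlist only, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the ToyDescriptor class, the packed-bit global signature compared by Hamming distance, and the sets of integers standing in for local features are hypothetical and do not reproduce the CDVS bitstream or the reference software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyDescriptor:
    """Hypothetical stand-in for a compact descriptor (not the CDVS format)."""
    image_id: str
    global_bits: int            # coarse whole-image signature packed into an int
    local_features: frozenset   # toy quantized "points of interest"


def hamming(a: int, b: int) -> int:
    """Count the bits that differ between two signatures."""
    return bin(a ^ b).count("1")


def search(query, index, shortlist_size=2):
    """Return (score, image_id) pairs for the shortlisted images, best first."""
    # Pass 1: scan the whole index using only the cheap global signature.
    shortlist = sorted(
        index, key=lambda d: hamming(query.global_bits, d.global_bits)
    )[:shortlist_size]
    # Pass 2: costlier comparison on the shortlist only; here the score is
    # simply the number of local features shared with the query.
    scored = [
        (len(query.local_features & d.local_features), d.image_id)
        for d in shortlist
    ]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return scored


# Made-up example data, for illustration only.
index = [
    ToyDescriptor("landmark_a", 0b10110010, frozenset({1, 4, 7, 9})),
    ToyDescriptor("landmark_b", 0b10110110, frozenset({2, 4, 8})),
    ToyDescriptor("machine_part", 0b01001101, frozenset({3, 5, 6})),
]
query = ToyDescriptor("query", 0b10110011, frozenset({1, 4, 9, 11}))

results = search(query, index)
print(results)
```

In this toy data the global pass discards machine_part before any local features are examined, which is the point of the shortlist: the expensive comparison runs on a small fraction of the library.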

The CDVS reference software has succeeded in extensive testing. Image libraries were broken down into categories based on subject. Historic landmarks were recognized and augmented by annotations. Applicable repair manual information was triggered to augment a technician's interaction with equipment. CDVS is designed to scale up to support inspection of massive libraries that would be too time-consuming to analyze by a more conventional approach. It does this without textual metadata, but real-world applications have the option to supplement its pixel-derived comparisons with whatever textual metadata helps them meet their development goals efficiently.

In these tests, CDVS enabled one million images to be searched for matches in 2.5 seconds, given an index of compact descriptors already extracted from the large collection. Calculating the compact descriptor for an image has been estimated to take 0.2 seconds on average. Consumers might happily confine their searches to the limited scope of walled-garden search engines, but many professionals are tasked with more demanding missions to find suitable imagery among gigabytes of stills and/or video. CDVS techniques can be applied to all frames of a video, and Compact Descriptors for Video Analysis are also being standardized by MPEG, extending the CDVS approach to better handle video frames that depict partially redundant content.
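To put the figures quoted above in perspective, the following back-of-the-envelope Python sketch works out what they imply per image and for a whole index. It rests on assumptions that are not in the article: that search time is spread evenly over the images, that extraction runs sequentially on a single worker, that 1 KB means 1024 bytes, and that the index holds nothing but the descriptors themselves.

```python
# Figures taken from the article.
IMAGES = 1_000_000
SEARCH_SECONDS = 2.5
EXTRACT_SECONDS_PER_IMAGE = 0.2
MIN_DESCRIPTOR_BYTES = 512
MAX_DESCRIPTOR_BYTES = 16 * 1024  # assumption: 1 KB = 1024 bytes

# Average search cost per indexed image, in microseconds.
per_image_search_us = SEARCH_SECONDS / IMAGES * 1e6

# Time to build the index if descriptors are extracted one after another.
index_build_hours = IMAGES * EXTRACT_SECONDS_PER_IMAGE / 3600

# Raw descriptor storage at the smallest and largest descriptor sizes.
min_index_mb = IMAGES * MIN_DESCRIPTOR_BYTES / 1e6
max_index_gb = IMAGES * MAX_DESCRIPTOR_BYTES / 1e9

print(f"search cost per image: {per_image_search_us:.2f} microseconds")
print(f"sequential index build: {index_build_hours:.1f} hours")
print(f"index size: {min_index_mb:.0f} MB to {max_index_gb:.1f} GB")
```

Under these assumptions the search costs about 2.5 microseconds per image, while building the index for one million images would take roughly 56 hours of sequential extraction. That gap suggests why the descriptors are extracted ahead of time and stored in an index rather than computed at query time.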

CDVS development broke away from a use-case emphasis on broadcast or cable television delivery models, but professionals involved in video production and post-production commonly face the challenges that CDVS is designed to address. Even without commercial image libraries supporting interoperable searches across multiple libraries, broadcast and production staff generally have access to large in-house libraries with multiple databases. Better searching enables these image curators to do their jobs more effectively. In a relaxed timeframe, this should result in better choices of what imagery will support news and storytelling. Under tight deadlines, having CDVS search capability could be a lifesaver. The same standardized approach scales up to consumer applications, and CDVS plays an exciting role in the Augmented Reality Application Format, with envisioned use cases that include tourism, shopping, and AR-based gaming.

The word "vision" is often borrowed to describe a host of cognitive imaginings, but the potential to better organize large photo collections promises improved and visible results. As new developments continue to explore mobility, AR, and VR in actual practice, CDVS' high gear for rapid searching will expose us to new sights and new ways to share what we see.