CDVA defines video descriptors for search and retrieval applications, specifically for visual content matching in video. Visual content matching covers views of large and small objects and scenes, and is robust to partial occlusions as well as to changes in viewpoint, camera parameters, and lighting conditions. The objects of interest may be planar or non-planar, rigid or partially rigid, and textured or partially textured; identification of people and faces is excluded.
There are two basic modes of operation of the visual search: pairwise matching and retrieval. Pairwise matching automatically determines whether two video segments depict the same object or scene. Retrieval searches a large video database and returns the subset of videos that depict the same objects or scenes as the query video or image.
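The two modes can be sketched as follows. This is an illustrative sketch only, not the CXM API: the function names, the descriptor representation (plain feature sets), and the Jaccard similarity measure are all assumptions chosen for brevity, standing in for the actual CDVA descriptors and matching functions.

```python
def match_score(desc_a: set, desc_b: set) -> float:
    """Similarity between two segment descriptors.

    Jaccard overlap of feature sets, a stand-in for the
    actual CDVA descriptor comparison.
    """
    if not desc_a or not desc_b:
        return 0.0
    return len(desc_a & desc_b) / len(desc_a | desc_b)


def pairwise_match(desc_a: set, desc_b: set, threshold: float = 0.5) -> bool:
    """Pairwise matching: decide whether two video segments
    depict the same object or scene."""
    return match_score(desc_a, desc_b) >= threshold


def retrieve(query_desc: set, database: dict, threshold: float = 0.5) -> list:
    """Retrieval: return (video_id, score) pairs from the database
    whose descriptors match the query, best matches first."""
    scored = [(vid, match_score(query_desc, d)) for vid, d in database.items()]
    matches = [(vid, s) for vid, s in scored if s >= threshold]
    return sorted(matches, key=lambda x: x[1], reverse=True)
```

For example, `retrieve({1, 2, 3, 4}, {"v1": {1, 2, 3}, "v2": {7, 8}})` keeps only `"v1"`, since `"v2"` shares no features with the query.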
The CDVA Experimental Model (CXM) implements the operations needed to solve these two tasks. This document describes the general architecture and component modules of CXM2, which will be released in two software versions:
• CXM2.0 includes the temporal encoding of CDVS global and local descriptors, and is released together with this document.
• CXM2.1 adds extraction, matching, and retrieval of deep-feature (NIP) descriptors. It is expected to be released in September 2017.