CDVA defines video descriptors for search and retrieval applications, specifically for visual content matching in video. Visual content matching covers matching views of large and small objects and scenes, and it is robust to partial occlusions as well as to changes in viewpoint, camera parameters, and lighting conditions. The objects of interest comprise planar or non-planar, rigid or partially rigid, and textured or partially textured objects; the identification of people and faces is excluded.
Visual search has two basic modes of operation: pairwise matching and retrieval. Pairwise matching automatically determines whether two video segments depict the same object or scene. Retrieval searches a large database of videos and returns the subset that depicts the same objects or scenes as those contained in the query video or image.
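To make the difference between the two modes concrete, here is a minimal sketch of their interfaces. It rests on simplifying assumptions that are not part of CDVA: each video segment (or query image) is summarized by one fixed-length vector, two descriptors are compared with cosine similarity, and a match is declared above a hypothetical threshold. The names `pairwise_match`, `retrieve`, and `threshold` are illustrative only and do not correspond to CXM functions or parameters.

```python
import math
from typing import Dict, List, Sequence, Tuple

# ASSUMPTION: a descriptor is a plain fixed-length vector of floats.
# This is a stand-in, not the actual CDVA descriptor format.
Descriptor = Sequence[float]


def cosine_similarity(a: Descriptor, b: Descriptor) -> float:
    """Cosine similarity of two equal-length descriptors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def pairwise_match(a: Descriptor, b: Descriptor, threshold: float = 0.8) -> bool:
    """Pairwise matching: decide whether two segments depict the same object or scene.

    The threshold value is hypothetical.
    """
    return cosine_similarity(a, b) >= threshold


def retrieve(query: Descriptor,
             database: Dict[str, Descriptor],
             threshold: float = 0.8) -> List[Tuple[str, float]]:
    """Retrieval: return the database entries that match the query, best match first."""
    scored = [(key, cosine_similarity(query, desc)) for key, desc in database.items()]
    hits = [(key, score) for key, score in scored if score >= threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy database of three hypothetical segment descriptors.
    db = {"clip_a": [1.0, 0.0, 0.0], "clip_b": [0.7, 0.3, 0.0], "clip_c": [0.0, 1.0, 0.0]}
    query = [1.0, 0.05, 0.0]
    print(pairwise_match(query, db["clip_c"]))
    print(retrieve(query, db))
```

Pairwise matching returns a yes/no decision for one pair, while retrieval applies the same comparison against every database entry and returns a ranked subset. A practical system would replace the linear scan with an index and the single vector with richer descriptors.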
The CDVA Experimental Model (CXM) implements the operations needed to perform these two tasks. This document describes the general architecture and the component modules of the first version of the CXM under consideration.