MPEG-7 Reference Software
MPEG doc#: N 7544
Date: October 2005
MPEG-7 Part-6 reference SW: Experimentation Model (XM)
The MPEG-7 reference software operates on and generates conformant MPEG-7 bit streams. It provides one specific implementation that behaves in a conformant manner. Other implementations that conform to ISO/IEC 15938 are possible, and they need not use the algorithms or the programming techniques of the reference software.
The software contained in this part is known as experimentation software (XM) and is divided into five categories:
a) Binary format for MPEG-7 (BiM). This software converts DDL (XML) based descriptions to the Binary format of MPEG-7 and vice versa.
b) DDL parser and DDL validation parser.
c) Visual descriptors. The software creates standard visual descriptions from associated (visual) media content. The techniques used for extracting descriptors are informative, and the quality and complexity of these extraction tools have not been optimized.
d) Audio descriptors. The software creates standard descriptions from associated (audio) media content. The techniques used for extracting descriptors are informative, and the quality and complexity of these extraction tools have not been optimized.
e) Multimedia description schemes. The software modules provide standard descriptions of Multimedia Description Schemes.
XM Software Architecture
The elements composing the MPEG-7 Reference Software are characterized by their functionality and by their interfaces. They can be configured according to what are referred to as "Key Applications". From a functional point of view, three types can be distinguished:
- "Extraction Applications" (a description database is built from a media database)
- "Search and Retrieval Applications" (a description is compared with the descriptions in a database to find the one with the lowest distance)
- "Transcoding Applications" (a media database is converted into another media database based on its description)
The media database contains media files in formats that the AV decoders support as input. The database itself is described by a text file that contains one media filename per line. All additional input and output filenames can be derived from this media filename.
The XM supports the following AV decoders:
Still image decoders: ImageMagick (Ver.4.*-5.* linked as external library, not included in the XM reference software distribution)
MPEG-1, MPEG-2 video decoders: (XM directory: Decoders/MPEG2Dec)
MPEG-1 video motion vector extractor: (XM directory: Decoders/MPEG2Dec) (It can extract images and motion vectors)
3D Objects: (XM directory: Media) (It reads a 3D object for 3D shape descriptors)
Key Points: (XM directory: Media) (It reads in a list of key points from a file).
Figure 1 - Schematic diagram of an "Extraction Application" using the XM reference software modules. In the block diagram, boxes represent procedural parts and circles represent data structures.
Figure 2 - Schematic diagram of a "Search and Retrieval Application" using the XM reference software modules. In the block diagram, boxes represent procedural parts and circles represent data structures.
Figure 3 - Schematic diagram of a "Transcoding Application" using the XM reference software modules. In the block diagram, boxes represent procedural parts and circles represent data structures.
Decoded media data is held in the internal XM representation of the raw media data (one class with different structures depending on the media content type).
Extraction tools are specific extraction methods defined for each Descriptor and Description Scheme. All of these source files are available in the ExtractionUtilities XM directory. The implementation of the extraction tools is not normative, but each tool must produce a valid description. The extraction tools extract the descriptions from media data. Because media data can be very large, the extraction is performed on temporal units of the media; for example, if the media is a video, the extraction is done frame by frame.
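The frame-by-frame processing described above can be illustrated with a small incremental extractor. This is a sketch under stated assumptions: the Frame type, the MeanLumaExtractor class, and the start/extractFrame/finish method names are hypothetical, and the mean sample value computed here is a toy feature, not an MPEG-7 descriptor.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical frame type holding 8-bit grayscale samples.
struct Frame {
    std::vector<std::uint8_t> samples;
};

// Toy extractor (not an MPEG-7 descriptor): keeps only running totals,
// so memory use stays constant no matter how many frames the video has.
class MeanLumaExtractor {
public:
    // Resets the running totals before a new media file is processed.
    void start() { sum_ = 0; count_ = 0; }

    // Consumes one frame and updates the running totals.
    void extractFrame(const Frame& f) {
        for (std::uint8_t s : f.samples) sum_ += s;
        count_ += f.samples.size();
    }

    // Returns the mean sample value over all frames seen so far (0 if none).
    double finish() const {
        return count_ == 0 ? 0.0
                           : static_cast<double>(sum_) / static_cast<double>(count_);
    }

private:
    std::uint64_t sum_ = 0;
    std::size_t count_ = 0;
};
```

The point of the design is that the decoder can hand over one frame at a time and discard it afterwards, so the whole video never has to be in memory.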
Descriptors (Ds) and Description Schemes (DSs)
These modules implement the data structures of the normative Descriptors and Description Schemes. Each low-level video Descriptor uses a dedicated C++ class. These classes provide methods to access the elements of the normative descriptions. The GenericDS class, by contrast, does not implement the data structure in a dedicated way; it is an interface to the XML parser library, which manages the memory for the tree structure of the instantiated D or DS.
Coding Schemes (CSs)
Coding Schemes are specific coding and decoding methods defined for individual Descriptors and Description Schemes. If an individual coding scheme is available, it represents a normative part of the standard. Coding schemes are available for the visual descriptors, to encode a description into its binary representation or to decode it from that representation. Coding schemes are not available for Ds and DSs that are implemented using the GenericDS class.
Search & Matching Tools
Matching tools are specific search or matching methods defined for each Descriptor and Description Scheme. The implementation of the matching tools is not normative; it depends on the intended application of the description. The matching tools can be used in two different ways: to compute distances between descriptions for the purpose of indexing, and to search in the descriptions based on a query for the purpose of transcoding.
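The distance-based use of a matching tool can be sketched as below. Everything here is an assumption made for illustration: the Description alias stands in for a real description, the L1 (sum of absolute differences) distance is just one possible measure since the matching method is not normative, and the function names l1Distance and bestMatch are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical stand-in for a description: a plain feature vector.
using Description = std::vector<double>;

// L1 distance between two descriptions; infinity if their sizes differ.
double l1Distance(const Description& a, const Description& b) {
    if (a.size() != b.size()) return std::numeric_limits<double>::infinity();
    double d = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) d += std::fabs(a[i] - b[i]);
    return d;
}

// Returns the index of the database entry with the lowest distance to the
// query, or -1 if no entry has a finite distance (including an empty database).
long bestMatch(const Description& query, const std::vector<Description>& db) {
    long best = -1;
    double bestDist = std::numeric_limits<double>::infinity();
    for (std::size_t i = 0; i < db.size(); ++i) {
        const double d = l1Distance(query, db[i]);
        if (d < bestDist) {
            bestDist = d;
            best = static_cast<long>(i);
        }
    }
    return best;
}
```

A Search and Retrieval application would typically sort all entries by distance rather than keep only the best one; the single-winner form is used here to keep the sketch short.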
The transcoding procedural blocks are part of the functionality of specific application modules. They are not represented by dedicated module classes in the XM software and need to be integrated in the XM when a specific transcoding application is implemented.
Applications are expressed by classes that combine the modules of a Descriptor or a Description Scheme, including the modules of their sub-Ds and sub-DSs. The resulting class implements one of the three key applications shown in Figures 1, 2, and 3. Applications that create a database of the descriptor or description scheme under test (DUT / DSUT), which are of the Extraction Application type, are called Server Applications. Applications that use the DSUT database (Search & Retrieval and Transcoding) are called Client Applications.
The components of the reference software that correspond to a descriptor or description scheme are implemented using a specific interface mechanism. In addition to its private and public functions, each class has an individual interface class, which exposes the public methods of the class itself. This is done to increase the reusability of the code. For example, by making all destructors private, it is possible to force a dedicated way of instantiating objects of the class. Thus, when the code is reused, the way of destroying the object is fixed. Furthermore, the interface class has a pure virtual representation in its InterfaceABC class (ABC = Abstract Base Class), which is always used to access the elements of the classes mentioned in the previous sections.
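The interface mechanism described above can be sketched roughly as follows. All class and method names (ExampleDescriptor, ExampleDescriptorInterface, ExampleDescriptorInterfaceABC, GetInterface, destroy) are hypothetical and chosen only to mirror the naming pattern in the text; the actual XM signatures may differ.

```cpp
class ExampleDescriptor;  // forward declaration

// Abstract base class: the pure virtual view that client code works against.
class ExampleDescriptorInterfaceABC {
public:
    virtual ~ExampleDescriptorInterfaceABC() = default;
    virtual int GetValue() const = 0;
    virtual void SetValue(int v) = 0;
};

// Concrete interface: forwards each call to the public methods of its descriptor.
class ExampleDescriptorInterface : public ExampleDescriptorInterfaceABC {
public:
    explicit ExampleDescriptorInterface(ExampleDescriptor* d) : d_(d) {}
    int GetValue() const override;
    void SetValue(int v) override;

private:
    ExampleDescriptor* d_;
};

class ExampleDescriptor {
public:
    ExampleDescriptor() : iface_(this) {}
    ExampleDescriptorInterfaceABC* GetInterface() { return &iface_; }
    int GetValue() const { return value_; }
    void SetValue(int v) { value_ = v; }
    // The only way to end the object's lifetime (assumed name).
    void destroy() { delete this; }

private:
    // Private destructor: stack objects and a plain `delete` do not compile,
    // so every user must allocate with `new` and release through destroy().
    ~ExampleDescriptor() = default;
    ExampleDescriptorInterface iface_;
    int value_ = 0;
};

inline int ExampleDescriptorInterface::GetValue() const { return d_->GetValue(); }
inline void ExampleDescriptorInterface::SetValue(int v) { d_->SetValue(v); }
```

Client code would call `new ExampleDescriptor()`, obtain the ABC pointer through GetInterface(), work only through that pointer, and finally call destroy() on the descriptor.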
For the reuse of classes, two mechanisms are implemented. When a descriptor is implemented with a dedicated C++ class (i.e., for Visual descriptors), not only can the data structure with its methods be reused, but also the description data itself (e.g., multiple visual color descriptions can share the same Color Space description). In such cases a reference counting mechanism is implemented. This is not required for the coding scheme classes, the extraction classes, and the search classes, so these classes do not have a reference counting mechanism of their own. Instead, they use the reference counting mechanism of the descriptor class to manage the memory of the description data.
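Sharing description data through reference counting can be sketched as below. This is a minimal illustration under assumptions: the classes ColorSpace and ColorDescription and the methods addref, release, and refCount are hypothetical names, and the sketch ignores thread safety.

```cpp
// Hypothetical shared description: deletes itself when the last user releases it.
class ColorSpace {
public:
    void addref() { ++refCount_; }
    void release() {
        if (--refCount_ == 0) delete this;
    }
    long refCount() const { return refCount_; }

private:
    ~ColorSpace() = default;  // private: lifetime is controlled only by release()
    long refCount_ = 0;
};

// Hypothetical user of the shared data. It holds no counter of its own and
// relies on the counter of the description it points to.
class ColorDescription {
public:
    explicit ColorDescription(ColorSpace* cs) : cs_(cs) { cs_->addref(); }
    ~ColorDescription() { cs_->release(); }

    // Copying is disabled to keep the counting rules of the sketch simple.
    ColorDescription(const ColorDescription&) = delete;
    ColorDescription& operator=(const ColorDescription&) = delete;

    ColorSpace* colorSpace() const { return cs_; }

private:
    ColorSpace* cs_;
};
```

Two ColorDescription objects constructed from the same ColorSpace pointer share one instance, and the ColorSpace is freed when the last of them is destroyed.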