INTERNATIONAL ORGANISATION FOR STANDARDISATION
ORGANISATION INTERNATIONALE DE NORMALISATION
ISO/IEC JTC 1/SC 29/WG 11
CODING OF MOVING PICTURES AND AUDIO

ISO/IEC JTC 1/SC 29/WG 11 N7708
Nice, FR – October 2005
Source: Leonardo Chiariglione
Title: Description of MPEG-7 Audio Low Level Descriptors
Status: Approved
1. Introduction
The MPEG-7 Audio standard contains description tools for describing audio content. The extraction of the low-level descriptors is normative and is based on the audio signal itself. With the help of low-level descriptors it is possible to search and filter audio content with regard to, for example, spectrum, harmony, timbre and melody.
2. Low Level Tools
The low-level audio descriptors (LLDs) are useful for describing audio. There are seventeen temporal and spectral parameters, which can be divided into the following groups:
An LLD can be instantiated either as a single value for an audio segment or as a series of sampled values. MPEG-7 Audio therefore defines two LLD types. AudioLLDScalarType is used for scalar values such as power or fundamental frequency; AudioLLDVectorType is used for vector values such as spectra. Any descriptor derived from one of these two types can be instantiated.
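As a rough illustration of these instantiation modes, the sketch below computes a scalar quantity (mean-square power) once for a whole segment and once per frame, and a vector quantity (a naive DFT magnitude spectrum) per frame. The function names, the test signal and the frame length of 8 samples are assumptions made for this example; it is not the normative extraction of any MPEG-7 descriptor.

```python
import cmath
import math


def mean_square(samples):
    """Scalar value: mean-square power of a block of samples."""
    return sum(x * x for x in samples) / len(samples)


def frames(samples, frame_len):
    """Split samples into consecutive non-overlapping frames (tail dropped)."""
    n_frames = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]


def magnitude_spectrum(frame):
    """Vector value: magnitudes of a naive DFT for bins 0..N/2."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


# Hypothetical test signal: 32 samples of a sinusoid with a period of 8 samples.
signal = [math.sin(2 * math.pi * t / 8) for t in range(32)]

# One scalar for the whole segment.
single_value = mean_square(signal)
# A series of scalars, one per frame.
scalar_series = [mean_square(f) for f in frames(signal, 8)]
# A series of vectors, one spectrum per frame.
vector_series = [magnitude_spectrum(f) for f in frames(signal, 8)]
```

Here `single_value` and `scalar_series` correspond to the scalar case, while `vector_series` holds one vector per frame, as a spectrum-like descriptor would.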
The sampled values can be further manipulated using ScalableSeries. A ScalableSeries allows the data to be downsampled and can store various kinds of summaries, such as minimum, maximum, mean, variance…
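The idea of scaling a series can be sketched as follows: each group of consecutive samples is replaced by a set of summary values. The function name `summarize`, the `ratio` parameter, the dictionary layout and the use of the population variance are assumptions chosen for this illustration; they do not reproduce the MPEG-7 ScalableSeries schema.

```python
def summarize(series, ratio):
    """Downsample `series` by `ratio`, keeping min, max, mean and variance
    of each group of `ratio` consecutive samples (incomplete tail dropped)."""
    summaries = []
    for start in range(0, len(series) - ratio + 1, ratio):
        group = series[start:start + ratio]
        mean = sum(group) / ratio
        # Population variance of the group (an assumption of this sketch).
        variance = sum((x - mean) ** 2 for x in group) / ratio
        summaries.append({
            "min": min(group),
            "max": max(group),
            "mean": mean,
            "variance": variance,
        })
    return summaries


# Made-up scalar series, e.g. one power value per hop.
power = [1.0, 3.0, 2.0, 2.0, 5.0, 7.0, 9.0]
scaled = summarize(power, 2)
```

With a ratio of 2, the seven input values yield three summaries, and the unpaired last value is dropped in this sketch.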
All low-level audio descriptors are based on either AudioLLDScalarType or AudioLLDVectorType. These types define a default sampling period (the hopSize) of 10 ms. All descriptors should use this hopSize or an integer multiple of it. The value of 10 ms was chosen to maximize compatibility with common audio sampling frequencies.
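The arithmetic behind this choice can be checked directly: at a 10 ms hop, a sampling rate yields a whole number of samples per hop exactly when the rate is a multiple of 100 Hz. The sketch below uses exact rational arithmetic; the function name `samples_per_hop` and the list of sampling rates are illustrative assumptions, not taken from the standard.

```python
from fractions import Fraction

HOP_SECONDS = Fraction(10, 1000)  # default hopSize of 10 ms


def samples_per_hop(sample_rate_hz, multiple=1):
    """Exact number of samples covered by `multiple` hops of 10 ms."""
    return sample_rate_hz * HOP_SECONDS * multiple


# Illustrative selection of sampling rates (an assumption of this sketch).
for rate in (8000, 16000, 22050, 32000, 44100, 48000):
    n = samples_per_hop(rate)
    note = "whole" if n.denominator == 1 else "fractional"
    print(f"{rate} Hz: {n} samples per hop ({note})")
```

For instance, 44.1 kHz gives 441 samples per hop and 48 kHz gives 480. A rate such as 22.05 kHz gives 220.5 samples, and an integer multiple of the hop (here two hops, 20 ms) restores a whole number.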
The next figure shows the class hierarchy for MPEG-7 low-level audio descriptors.

Figure 1: class hierarchy of MPEG-7 Audio Low Level Descriptors
3. Applications
It has often been said that MPEG-7 will make the web more searchable for multimedia content than it is for text today. The same would apply to making large content archives accessible to the public, or to enabling people to identify content to buy. The information used for content retrieval may also be used by agents for the selection and filtering of broadcast or "push" material. Additionally, the metadata may be used for more advanced access to the underlying data, by enabling automatic or semi-automatic multimedia presentation or editing.
4. References