What is the MPEG-4 model of an audio-visual object?

In the MPEG-4 model, audio-visual (AV) objects have both a spatial and a temporal extent. Temporally, all AV objects have a single dimension: time. Each AV object has a local coordinate system in which it has a fixed spatio-temporal location and scale. An AV object is positioned in a scene by specifying one or more coordinate transformations from its local coordinate system into a common global coordinate system, also called the scene coordinate system. In a BIFS scene, an AV object is usually represented by a single BIFS node or by a sub-tree of the BIFS scene graph.
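
The idea of chaining coordinate transformations from an object's local coordinate system into the scene coordinate system can be illustrated with a minimal Python sketch. This is not BIFS syntax or any MPEG-4 API. The Transform2D class and the local_to_scene function are hypothetical names introduced for illustration only, and the sketch assumes a 2D object with transformations limited to scaling and translation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transform2D:
    """Hypothetical 2D transform: scale about the origin, then translate."""
    sx: float = 1.0
    sy: float = 1.0
    tx: float = 0.0
    ty: float = 0.0

    def apply(self, point):
        x, y = point
        return (self.sx * x + self.tx, self.sy * y + self.ty)


def local_to_scene(point, transforms):
    """Map a point from local to scene coordinates.

    Transforms are listed innermost first, i.e. from the one closest
    to the object up to the one closest to the scene root.
    """
    for t in transforms:
        point = t.apply(point)
    return point


# A point at (1, 1) in the object's local coordinate system.
# Innermost transform: scale by 2. Outermost transform: translate by (10, 5).
chain = [Transform2D(sx=2.0, sy=2.0), Transform2D(tx=10.0, ty=5.0)]
scene_point = local_to_scene((1.0, 1.0), chain)
print(scene_point)  # (12.0, 7.0)
```

In a scene graph, each entry in the chain would roughly correspond to a grouping node on the path from the object up to the scene root, which is why the object itself can keep a fixed location and scale in its own coordinate system.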