To capture physical environments, human beings have sensors such as eyes and ears. Whereas the ears are omnidirectional acoustic sensors, the eye is a directional visual sensor: it collects light rays from one direction (determined by head and eyeball position and rotation) and focuses at a specific distance (via the lens). The visual dynamic range in the real world is so high that it has to be limited by the iris to avoid overloading. By processing the sensor signals from both ears or both eyes, additional information can be derived, such as the direction towards a sound source or the distance of an object obtained by stereoscopic visual analysis.
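As a simple illustration of stereoscopic distance estimation (the notation below is the conventional one from stereo geometry and is not defined elsewhere in this report): for two eyes or cameras with focal length f separated by a baseline B, a point that appears with disparity d between the two images lies approximately at depth

    Z = (f · B) / d

so nearby objects produce large disparities and distant objects small ones.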
Traditional digital audiovisual representations stay close to the information captured by the human audiovisual system. However, this information is only a sparse sampling of the real world at the position of the human being, limited in quality and quantity by the human sensors.
With the advent of better sensors, better displays and higher computing capabilities, the generation and use of virtual environments are becoming more and more attractive (for games, business applications and teaching). These virtual worlds can be completely computer generated (CGI), captured and processed from the real world and presented to the human being (virtual reality), or overlaid onto the real world (augmented reality). To allow free movement in these virtual environments, sound and light information at all positions and from all directions is necessary in order to present audiovisual samples to the human audiovisual system.
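One common way to formalise this requirement for the visual part (the function below is the well-known plenoptic function from the light-field literature, introduced here only as an illustration) is to describe the light arriving at every position from every direction:

    L(x, y, z, θ, φ, λ, t)

i.e. the radiance at position (x, y, z), from direction (θ, φ), at wavelength λ and time t; the acoustic counterpart is the sound field sampled at every position over time. Capturing or synthesising an approximation of such functions is what allows audiovisual samples to be presented for arbitrary viewer positions and orientations.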
In this report we describe the state of the art in capturing and generating audiovisual information and its reuse in virtual environments and for improved immersive audiovisual experiences. It shall identify workflows, technologies and