INTERNATIONAL ORGANISATION FOR STANDARDISATION
ORGANISATION INTERNATIONALE DE NORMALISATION
ISO/IEC JTC1/SC29/WG11
CODING OF MOVING PICTURES AND AUDIO
ISO/IEC JTC1/SC29/WG11
MPEG2008/N9784
May 2008, Archamps, France
Title: Introduction to 3D Video
Source: Video
Status: Approved
3D Video (3DV) is a standard intended to serve a variety of 3D displays. It is the first phase of FTV (free-viewpoint TV), a new framework that includes a coded representation of multiview video and depth information to support the generation of high-quality intermediate views at the receiver. This enables free-viewpoint functionality and view generation for auto-multiscopic displays.
Figure 1 shows an example of an FTV system that transmits multiview video with depth information. The content may be produced in a number of ways, e.g., with a multi-camera setup, depth cameras, or 2D-to-3D conversion processes. At the receiver, depth-image-based rendering can be performed to project the signal onto various types of displays.

Figure 1. Example of an FTV system and data format
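The receiver-side rendering step can be illustrated with a minimal one-dimensional sketch of depth-image-based rendering. It assumes rectified, parallel cameras, so that warping reduces to a purely horizontal shift, and it assumes depth has already been converted to per-pixel disparity (in pixels) toward a neighbouring camera one baseline to the right. The function names `warp_scanline` and `synthesize` and the simple "nearer view first, fill holes from the other" rule are hypothetical illustrations, not part of any 3DV specification.

```python
import math
from typing import List, Optional

Pixel = Optional[int]  # None marks a hole (area disoccluded in this view)


def warp_scanline(colors: List[int], disparity: List[float], alpha: float) -> List[Pixel]:
    """Forward-warp one rectified scanline to a virtual camera at offset alpha.

    disparity[x] is the horizontal shift, in pixels, of pixel x between this
    view and a hypothetical neighbouring camera one baseline to the right
    (alpha = 1). Negative alpha moves the virtual camera to the left.
    """
    width = len(colors)
    out: List[Pixel] = [None] * width
    nearest = [-math.inf] * width  # z-buffer: larger disparity = closer to camera
    for x in range(width):
        xt = math.floor(x - alpha * disparity[x] + 0.5)  # round half up
        if 0 <= xt < width and disparity[x] > nearest[xt]:
            nearest[xt] = disparity[x]
            out[xt] = colors[x]
    return out


def synthesize(left: List[int], disp_left: List[float],
               right: List[int], disp_right: List[float],
               alpha: float) -> List[Pixel]:
    """Render a view at alpha in [0, 1] between two rectified input views.

    Both inputs are warped to the virtual position; the nearer input is used
    first and its holes are filled from the other one.
    """
    from_left = warp_scanline(left, disp_left, alpha)
    from_right = warp_scanline(right, disp_right, alpha - 1.0)
    primary, secondary = (from_left, from_right) if alpha <= 0.5 else (from_right, from_left)
    return [p if p is not None else s for p, s in zip(primary, secondary)]


if __name__ == "__main__":
    # Background (value 1, disparity 0) with a foreground object (value 9, disparity 2).
    left = [1, 1, 1, 9, 9, 1, 1, 1]
    disp_left = [0, 0, 0, 2, 2, 0, 0, 0]
    right = [1, 9, 9, 1, 1, 1, 1, 1]
    disp_right = [0, 2, 2, 0, 0, 0, 0, 0]
    print(warp_scanline(left, disp_left, 1.0))                  # [1, 9, 9, None, None, 1, 1, 1]
    print(synthesize(left, disp_left, right, disp_right, 0.5))  # [1, 1, 9, 9, 1, 1, 1, 1]
```

Warping the left view alone leaves holes where background hidden behind the foreground object becomes visible; the second view supplies those pixels. Holes not covered by any transmitted view would remain and, in a real renderer, would need inpainting.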
The first focus (phase) of FTV standardization is 3DV (3D Video), i.e., video for 3D displays. The displays targeted here present N views (e.g., N = 9) to the user simultaneously. For efficiency reasons, only a smaller number K of views (K = 1, ..., 3) shall be transmitted, and depth data shall additionally be provided for each of those K views. At the receiver side, the N views to be displayed are generated from the K transmitted views and their depth by depth-image-based rendering (DIBR). This is illustrated in Figure 2.

Figure 2. Example of generating 9 output views (N = 9) out of 3 input views with depth (K = 3)
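The N-from-K scheme can be made concrete by working out, for each output view, which pair of transmitted views it lies between and how far along that pair it sits. The sketch below assumes a hypothetical layout (not stated in this document) in which the K input cameras are equally spaced and the N output views are spread evenly over the same baseline, with the outermost views coinciding. Each resulting `(left, alpha)` pair tells a renderer which two neighbouring input views to warp and at what fractional position. The case K = 1 (a single view plus depth, which requires extrapolation) is not covered.

```python
import math
from fractions import Fraction
from typing import List, Tuple


def view_plan(n_out: int, k_in: int) -> List[Tuple[int, Fraction]]:
    """Map each of n_out output views to (left input index, alpha in [0, 1]).

    Hypothetical layout: the k_in input cameras are equally spaced, and the
    n_out output views are equally spaced over the same baseline, with the
    outermost output views coinciding with the outermost input cameras.
    """
    if n_out < 2 or k_in < 2:
        raise ValueError("this sketch needs n_out >= 2 and k_in >= 2")
    plan = []
    for i in range(n_out):
        pos = Fraction(i * (k_in - 1), n_out - 1)  # in units of input-camera spacing
        left = min(math.floor(pos), k_in - 2)      # last segment includes its right end
        plan.append((left, pos - left))
    return plan


if __name__ == "__main__":
    for view, (left, alpha) in enumerate(view_plan(9, 3)):
        print(f"output view {view}: between inputs {left} and {left + 1}, alpha = {alpha}")
```

Exact fractions are used so that output views landing exactly on an input camera (here views 0, 4 and 8) are recognised as alpha = 0 or 1 without floating-point noise; those views can be copied rather than rendered.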
This application scenario imposes specific constraints, such as narrow-angle acquisition (< 20 degrees). In addition, for cost reasons, there should be no need for geometric rectification at the receiver side: if any rectification is needed at all, it should already be performed on the input views at the encoder side.
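Whether a camera rig satisfies the narrow-angle constraint can be checked with basic trigonometry. The sketch below uses a hypothetical definition of the acquisition angle, namely the angle subtended by the outermost cameras of a straight, equally spaced rig at a scene point on the rig's centre axis; the function name and the example rig dimensions are illustrative only.

```python
import math


def acquisition_angle_deg(num_cameras: int, spacing_m: float, distance_m: float) -> float:
    """Angle subtended by the outermost cameras of a linear rig at a centred scene point.

    Hypothetical definition of the acquisition angle, assuming equally spaced
    cameras on a straight line and a scene point on the rig's centre axis.
    """
    span_m = (num_cameras - 1) * spacing_m
    return math.degrees(2.0 * math.atan(span_m / (2.0 * distance_m)))


if __name__ == "__main__":
    for spacing in (0.05, 0.20):
        angle = acquisition_angle_deg(9, spacing, 2.0)
        print(f"9 cameras, {spacing:.2f} m apart, scene at 2 m: {angle:.1f} deg, "
              f"narrow-angle: {angle < 20.0}")
```

With the scene 2 m away, a 9-camera rig with 5 cm spacing stays well inside 20 degrees, whereas 20 cm spacing exceeds it.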
Some multiview displays, for example, are based on an LCD screen with a sheet of transparent lenses in front of it (see Figure 3). This lens sheet directs different views to each eye, so that a viewer sees two different views and thus has a stereoscopic viewing experience. The stereoscopic capabilities of these multiview displays are limited by the resolution of the LCD screen (currently 1920×1080), since its pixels are shared among all N views. For example, for a 9-view system in which the cone of 9 views spans 10 degrees (cone angle, CA), objects can appear in front of or behind the screen only within ±10% of the screen width (object range, OR). Both OR and CA will improve over time, at a pace determined by economics, as the pixel count of LCD screens goes up.

Figure 3. Example of lenticular auto-stereoscopic display requiring 9 views (N = 9)
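The resolution cost and the object range can be made concrete with simple arithmetic. This sketch assumes, as a simplification, that the panel's pixels are split evenly among the N views (actual lenticular layouts distribute subpixels in more complex patterns) and that OR is expressed as a fraction of the physical screen width; the 3840×2160 panel and the 1 m screen are hypothetical examples.

```python
def pixels_per_view(panel_width: int, panel_height: int, n_views: int) -> float:
    """Average number of panel pixels available to each view (even split assumed)."""
    return panel_width * panel_height / n_views


def object_range_m(screen_width_m: float, or_fraction: float) -> float:
    """Maximum depth in front of or behind the screen, for OR given as a fraction of width."""
    return screen_width_m * or_fraction


if __name__ == "__main__":
    print(pixels_per_view(1920, 1080, 9))  # 230400.0, e.g. 640 x 360 per view
    print(pixels_per_view(3840, 2160, 9))  # 921600.0, four times as many
    print(object_range_m(1.0, 0.10))       # 0.1, i.e. +/- 10 cm on a 1 m wide screen
```

Each of the 9 views thus receives only about one ninth of a 1920×1080 panel, which is why a higher pixel count is the main lever for improving OR and CA.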
Other types of stereo displays are also now appearing on the market in large numbers. The ability to generate output views at arbitrary positions at the receiver is attractive even in the case of N = 2 (i.e., a simple stereo display). If, for example, the material has been produced for a large cinema screen, using that stereo signal (2 fixed views) directly on a relatively small home 3D display will yield a very different stereoscopic viewing experience (e.g., a strongly reduced depth effect). With a 3DV signal as illustrated in Figure 2, a new stereo pair that is optimized for the given 3D display can be generated.
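The reduced depth effect follows from viewing geometry. A disparity stored as a fraction of image width becomes a physical on-screen parallax p that scales with screen width, and by similar triangles a point with parallax p, viewed from distance V with eye separation e, is perceived at distance Z = V·e / (e − p). The sketch below applies this to hypothetical numbers (a 10 m cinema screen viewed from 15 m versus a 1 m home screen viewed from 3 m, 65 mm eye separation); the `scale` parameter is a hypothetical retargeting factor standing in for a receiver that renders a new stereo pair with larger disparities.

```python
def screen_parallax_mm(disparity_fraction: float, screen_width_mm: float,
                       scale: float = 1.0) -> float:
    """Physical on-screen parallax for a disparity given as a fraction of image width.

    `scale` is a hypothetical retargeting factor applied to all disparities,
    e.g. when a receiver renders a new stereo pair with a wider virtual baseline.
    """
    return scale * disparity_fraction * screen_width_mm


def perceived_distance_mm(parallax_mm: float, viewing_distance_mm: float,
                          eye_separation_mm: float = 65.0) -> float:
    """Distance from the viewer at which a point with the given parallax is perceived.

    Positive (uncrossed) parallax places the point behind the screen,
    negative (crossed) parallax in front of it: Z = V * e / (e - p).
    """
    if parallax_mm >= eye_separation_mm:
        raise ValueError("parallax >= eye separation: the eyes would have to diverge")
    return viewing_distance_mm * eye_separation_mm / (eye_separation_mm - parallax_mm)


if __name__ == "__main__":
    disparity = 0.005  # same content: 0.5 % of image width, uncrossed
    cinema = perceived_distance_mm(screen_parallax_mm(disparity, 10_000), 15_000)
    home = perceived_distance_mm(screen_parallax_mm(disparity, 1_000), 3_000)
    print(cinema - 15_000)  # about 50000 mm behind a 10 m cinema screen
    print(home - 3_000)     # about 250 mm behind a 1 m home screen
    retargeted = perceived_distance_mm(screen_parallax_mm(disparity, 1_000, scale=4.0), 3_000)
    print(retargeted - 3_000)  # about 1333 mm behind the home screen
```

The same stored disparity places a point 50 m behind the cinema screen but only 25 cm behind the home screen; scaling the disparities when synthesizing a new stereo pair restores part of the depth effect.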