Dolby Publishes 3D Spec

As promised at NAB earlier this year, Dolby Laboratories has now released an open specification for broadcast 3D delivery, describing how 3D images can be encoded and carried using frame-compatible techniques through a conventional 2D broadcast infrastructure. As stated in their materials, the Dolby 3D Frame Compatible Open Standard is fully compatible with enhancement layer approaches, enabling extensibility to full-resolution 3D in the future.

Dolby’s specification incorporates the following recommendations: (1) side-by-side decimation and packing should be adopted as the method of implementing frame-compatible 3D systems, for both progressive and interlaced systems; (2) a complementary left-eye/right-eye decimation structure should be adopted, as this provides higher resolution at the screen plane in the case of 3D images, and it is also capable of providing full-resolution 2D images in the 3D SbS format without the need for receiver mode switching; and (3) if only one format is used for decimation and packing, it is not necessary to provide detailed signaling for these parameters down to the display. The image shows what Dolby means by complementary decimation.

In arriving at the spec (which carries the weight of an industry recommendation), Dolby studied the relative performance of the side-by-side (SbS), over/under (O/U), and checkerboard ("Quincunx") presentation formats. In brief, the spec recommends the use of the SbS 3D presentation for maximum compatibility with 2D systems as well as high performance for 3D encoding.

To arrive at a frame-compatible format, images must be down-sampled so that two frames (i.e., the left and right images) fit into the timing and bandwidth of one. Dolby's tests showed that the compression efficiency for SbS (with MPEG-4 AVC/H.264 coding) was very similar to that of O/U. However, because a significant amount of legacy material is still produced in interlaced format, SbS is preferred: it requires only horizontal down-sampling, whereas O/U requires vertical down-sampling, which is more artifact-prone when interlace is used.
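The two packing geometries can be illustrated with a minimal Python sketch. It is not taken from Dolby's document: the function names are hypothetical, the "images" are tiny lists of labelled samples, and the decimation simply drops every other column or row with no pre-filtering. It assumes an interlaced frame is stored with its two fields interleaved line by line, so even rows belong to one field and odd rows to the other.

```python
def pack_side_by_side(left, right):
    """Keep every other column of each view and place the halves side by side."""
    return [l_row[0::2] + r_row[0::2] for l_row, r_row in zip(left, right)]


def pack_over_under(left, right):
    """Keep every other row of each view and stack left above right."""
    return left[0::2] + right[0::2]


# Two tiny 4x4 views; each sample is labelled (view, row, column).
left = [[("L", r, c) for c in range(4)] for r in range(4)]
right = [[("R", r, c) for c in range(4)] for r in range(4)]

sbs = pack_side_by_side(left, right)
ou = pack_over_under(left, right)

# Which source rows survive in each packed frame?
sbs_rows = sorted({r for row in sbs for (_, r, _) in row})
ou_rows = sorted({r for row in ou for (_, r, _) in row})
print("SbS keeps source rows:", sbs_rows)
print("O/U keeps source rows:", ou_rows)
```

Both packed frames are the same size as one source view. In this naive sketch SbS retains every source row, and therefore both interlaced fields, while O/U keeps only the even rows, which under the interleaved-field assumption means an entire field is discarded. A real encoder would filter rather than drop lines, but the sketch shows why vertical down-sampling interacts with interlace and horizontal down-sampling does not.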

Dolby also considered the two classes of horizontal decimation available for the left- and right-eye images: "common" decimation, where the resultant left and right images are formed by taking the same-parity pixels (e.g., all even) from the original left and right images, and "complementary" decimation, where the resultant left and right images are formed by taking the alternate-parity pixels (e.g., left-even, right-odd) from the respective originals. They also studied three situations: 3D half-resolution performance in SbS, 2D images encoded as SbS, and the efficiency of enhancement layer coding; for all of these, they found that complementary decimation works best.
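The difference between the two decimation classes is easy to see on a single row of samples. The following Python sketch is illustrative only: `decimate`, `pack_sbs`, and `columns_covered` are hypothetical names, the samples are labelled strings rather than pixel values, and no pre-filter is applied.

```python
def decimate(row, parity):
    """Keep the samples whose column index has the given parity (0 = even, 1 = odd)."""
    return row[parity::2]


def pack_sbs(left, right, complementary=True):
    """Pack two views side by side using common or complementary decimation."""
    right_parity = 1 if complementary else 0  # the left view always keeps even columns here
    return [decimate(l_row, 0) + decimate(r_row, right_parity)
            for l_row, r_row in zip(left, right)]


def columns_covered(packed_row):
    """Return the sorted set of source column indices present in a packed row."""
    return sorted({int(label[1:]) for label in packed_row})


# One-row views, eight samples wide; labels are view letter plus column index.
left = [[f"L{c}" for c in range(8)]]
right = [[f"R{c}" for c in range(8)]]

common_row = pack_sbs(left, right, complementary=False)[0]
complementary_row = pack_sbs(left, right, complementary=True)[0]

print("common:       ", common_row, "covers columns", columns_covered(common_row))
print("complementary:", complementary_row, "covers columns", columns_covered(complementary_row))
```

With common decimation both eyes sample the same four column positions, so the other four are never transmitted. With complementary decimation the two eyes together cover all eight positions, which is the property the comparisons above rely on.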

When decimating images (or any signal, for that matter), some pre-filtering is required in order to minimize aliasing artifacts. However, because the human visual system (HVS) will fuse the left and right images, complementary decimation will provide the highest resolution (within the pre-filtering used) at the vergence point of 3D images. When 2D images are transmitted, Dolby says that it is preferable not to switch to conventional 2D processing, as many display systems cannot switch cleanly within a frame time. Therefore, Dolby recommends that 2D content be encoded as 3D SbS with complementary decimation; in fact, no pre-filtering is theoretically needed, as the HVS will fuse the images into a single, full-resolution alias-free image. This does, however, require signaling to the receiver to switch off the interpolation reconstruction filter; as this signaling is not present in current displays, some amount of pre-filtering may be desirable.
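The claim that unfiltered 2D content loses nothing under complementary decimation can be checked numerically. The sketch below is a stand-in only, with hypothetical function names: it re-interleaves the two halves of the packed frame to show that every original sample is still present. It does not model the human visual system or any real display, and a receiver that applies its interpolation reconstruction filter to each half would not produce this result.

```python
def pack_2d_as_sbs(image):
    """Encode a 2D image as SbS: even columns in the left half, odd columns in the right."""
    return [row[0::2] + row[1::2] for row in image]


def interleave_halves(frame):
    """Rebuild full-width rows by alternating samples from the left and right halves."""
    rebuilt = []
    for row in frame:
        half = len(row) // 2  # assumes an even row width
        merged = [None] * len(row)
        merged[0::2] = row[:half]  # left half supplies the even columns
        merged[1::2] = row[half:]  # right half supplies the odd columns
        rebuilt.append(merged)
    return rebuilt


# A small 2x8 test image with distinct sample values.
image = [[10 * r + c for c in range(8)] for r in range(2)]

packed = pack_2d_as_sbs(image)
restored = interleave_halves(packed)
print("packed row 0:  ", packed[0])
print("lossless round trip:", restored == image)
```

Because the 2D source is identical for both eyes, the even and odd columns together are exactly the original image, so the round trip is exact when no pre-filter is applied.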

Dolby's tests also showed a significant overall gain in coding (enhancement) efficiency for 2D content, and for those portions of 3D content at the screen plane. Side-by-side formatting with complementary decimation therefore appears to solve a number of problems in 3D content transmission, and has the added benefit of reducing decoding hardware complexity in receivers, which should result in smaller memory requirements and less additional A/V sync delay. At the same time, because compression for wireless transmission may not achieve the signal quality afforded by disc media, it would be interesting to see additional data on how lossy (perhaps even heavy) compression affects the 3D image reconstruction (decoding) process.

This "specification" is really just a Dolby industry recommendation, since specifications are issued by international standards bodies such as SMPTE or ISO. Dolby has made no announcement about submitting it to a standards-setting body, so we must assume they have not.

Is there any Dolby IP in this? According to Dolby, "Dolby Laboratories, Inc. does not require consideration for the use of the information regarding base layer contained in this document. If Dolby Laboratories, Inc. develops an enhancement layer technology (see section 5 of this document), then a license to use such technology will be required."

As for who will follow this Dolby recommendation, the company's standing in the Hollywood production environment could give it considerable clout with content distributors such as cable and satellite operators. 3D in ATSC broadcast is a different issue, and it may be years before any significant standard is developed.

By Aldo Cugnini, Display Daily