NYIT’s Impact on High-Quality 3D Video Distribution
The New York Institute of Technology (NYIT) is known for its pioneering role in computer graphics and for the large number of alumni who eventually settled at Pixar and other Silicon Valley graphics companies. However, NYIT was also the birthplace of some novel processing techniques that could be at the heart of future 3D video storage and distribution.
Back in the late 1980s, a number of their researchers were investigating ways to squeeze more video information into the available bandwidth. To do this, they wanted to understand how humans perceive video detail; with that knowledge, they might be able to reduce the amount of information in a video stream without affecting the perceived image quality.
It turns out that the human visual system (the eye, combined with the brain) perceives video detail quite differently depending on the orientation of that detail. If there is fine detail in the horizontal direction, the human visual system is very good at detecting it, and the same is true for fine detail in the vertical direction. However, if the detail is off-axis, in the diagonal direction, our sensitivity drops dramatically.
The chart shown below (on the left) illustrates that relationship. The entire chart represents all of the spatial frequencies (the “detail” in different directions) contained in a video. The yellow line shows our sensitivity to detail in each of those directions. At the top right portion of the chart, where the detail is primarily diagonal, you can see that our sensitivity falls off sharply.
Perhaps evolution had a hand in this: if you are hunting for your dinner, or a carnivore is hunting you, better image perception in the horizontal and vertical directions might be a distinct advantage in picking out something moving along the horizon or up in a tree. Whatever the reason, the impact of this relationship is significant: at a reasonable distance from the screen, a viewer’s overall perception of video quality is based on less than half of the detail actually stored in the video. Consequently, if we can remove some of the diagonal detail without reducing the horizontal and vertical detail, we can reduce the bandwidth requirements without affecting the viewer’s perception of the video quality.
This “HVS” technique, so called because it is based on the characteristics of the Human Visual System, consists of first removing the spatial frequencies in the diagonal direction with a 2-dimensional low-pass filter (known as a diamond filter, after the shape of its passband). Once this is complete, we can comfortably decimate the image in a quincunx pattern (basically, a checkerboard pattern) without affecting the remaining information.
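To make those two steps concrete, here is a minimal sketch in Python (using NumPy and SciPy). It assumes a single grayscale frame stored as a 2-D array; the tiny 3x3 kernel is only a crude stand-in for a properly designed diamond filter, and the function name is mine, not part of any published implementation.

```python
import numpy as np
from scipy.ndimage import convolve

def hvs_filter_and_decimate(frame: np.ndarray) -> np.ndarray:
    """Crude diamond low-pass filter followed by quincunx decimation."""
    # This plus-shaped kernel nulls the diagonal corner of the spectrum
    # while keeping much of the horizontal/vertical detail; a real
    # diamond filter would be considerably longer and sharper.
    kernel = np.array([[0, 1, 0],
                       [1, 4, 1],
                       [0, 1, 0]], dtype=float)
    kernel /= kernel.sum()
    filtered = convolve(frame.astype(float), kernel, mode='nearest')

    # Quincunx (checkerboard) decimation: keep only the samples where
    # row + column is even, i.e. exactly half of the pixels.
    rows, cols = np.indices(frame.shape)
    keep = (rows + cols) % 2 == 0
    return np.where(keep, filtered, 0.0)

frame = np.random.rand(1080, 1920)            # stand-in for a luma plane
sparse = hvs_filter_and_decimate(frame)
print(np.count_nonzero(sparse) / frame.size)  # ~0.5: half the samples survive
```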
The effect of this technique is graphically illustrated in the chart shown above (on the right); the HVS filter removes many of the diagonal spatial frequencies that humans just don’t see very well. In fact, we are able to remove fully half of the video data without significantly impacting the perceived image quality. This frees up enough bandwidth to fit a second video stream in the same amount of space, which is exactly what we need to store and transmit a 3D pair of high-def videos.
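Continuing the sketch above, the freed-up checkerboard positions can carry the other eye’s view. The helper below (again, hypothetical names, not a published API) interleaves two already-filtered views so that the stereo pair occupies a single frame’s worth of samples.

```python
import numpy as np

def quincunx_pack(left: np.ndarray, right: np.ndarray) -> np.ndarray:
    """Interleave two diamond-filtered views on a checkerboard."""
    rows, cols = np.indices(left.shape)
    checker = (rows + cols) % 2 == 0
    # Left eye occupies the "white" squares, right eye the "black" squares.
    return np.where(checker, left, right)

left = np.random.rand(1080, 1920)    # stand-in for the filtered left-eye frame
right = np.random.rand(1080, 1920)   # stand-in for the filtered right-eye frame
packed = quincunx_pack(left, right)
print(packed.shape)                  # (1080, 1920): two views in one frame's bandwidth
```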
I have personally demonstrated this technique to a variety of video experts, most recently at the fmx/09 conference in Germany. I displayed two copies of the same high-def image side by side, with one of the two HVS-filtered to remove the diagonal detail. Even on a ~5 meter screen at a viewing distance of 1-2 screen heights, video experts were unable to see any significant difference between the two images. And when asked to guess which one had been filtered, half of the participants guessed wrong! Considering that most viewers watch their HDTVs at a distance of 3-6 screen heights, this distribution format will not affect what consumers actually see.
This is a great advantage for video providers that want to aggressively pursue the 3D market but can’t afford to upgrade their infrastructure or increase their bandwidth. For example, a PC-download website could encode 3D movies using this technique and then supply viewers with existing 3D playback software to decode and display the movie on a wide range of 3D display technologies. Cable/satellite operators may already have the capability to implement this decoder in their existing hardware, with only a firmware update. If so, they could test-market 3D movies to early adopters using their existing pay-per-view channel infrastructure. This could be a relatively simple way to extract a premium for 3D movie content.
This is not just theory: some of the leading companies in 3D encoding and distribution have publicly discussed and demonstrated similar, but proprietary, techniques for their home distribution solutions.
Finally, using the characteristics of the Human Visual System to intelligently compress video data is not uncommon and certainly not unique to 3D. In 1947, RCA demonstrated a color television that implemented Alda Bedford’s principle for chroma subsampling. Chroma subsampling takes advantage of the human eye’s poor ability to perceive color detail, and its various “flavors” are pervasive throughout today’s video applications. For example, 4:2:2 chroma subsampling removes 50% of the color information, while 4:2:0 subsampling removes 75%. These standards are commonly used in all home systems, including Blu-ray! But these are perfectly reasonable design trades: by throwing away information that the eye can’t detect, more bandwidth is left to minimize compression artifacts, add special features to the disc or, in this particular case, carry the second view of a 3D pair in the video stream.
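As a quick back-of-the-envelope check of those percentages, the snippet below simply counts chroma samples per frame for each scheme (the 1920x1080 frame size is chosen only for illustration).

```python
width, height = 1920, 1080
full_chroma = 2 * width * height               # 4:4:4: Cb and Cr at full resolution

chroma_422 = 2 * (width // 2) * height         # 4:2:2: chroma halved horizontally
chroma_420 = 2 * (width // 2) * (height // 2)  # 4:2:0: chroma halved both ways

print(1 - chroma_422 / full_chroma)  # 0.5  -> 50% of the color information removed
print(1 - chroma_420 / full_chroma)  # 0.75 -> 75% removed
```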
So what is needed to implement this technique? Well, you need a good diamond filter to minimize any artifacts, a versatile video processing program to pack the data properly, and a good way to play back the stereoscopic video on your particular 3D display. I’ll leave these, and other issues, for a future post.
For a detailed discussion of this topic direct from the NYIT researchers themselves, check out the SMPTE paper by Robert L. Dhein and Irwin C. Abrahams, “Using the 2-D Spectrum to Compress Television Bandwidth,” presented at the 132nd SMPTE Technical Conference, October 12-17, 1990.
By Keith Elliott, Beyond the Screen’s Edge