3-D Cameras for Cellphones

Researchers at the Massachusetts Institute of Technology (MIT) have developed a system that uses specially designed algorithms to produce a detailed 3-D image with just a cheap photodetector and the processing power found in a smartphone.

Like other sophisticated depth-sensing devices, the MIT system, called CoDAC, uses the “time of flight” of light particles to gauge depth: A pulse of infrared laser light is fired at a scene, and the camera measures the time it takes the light to return from objects at different distances.
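The time-of-flight principle reduces to one line of arithmetic. The sketch below is only an illustration: the function name and the sample round-trip times are made up for the example, not taken from the researchers' system.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s


def depth_from_round_trip(t_seconds: float) -> float:
    """Distance to a surface given the pulse's round-trip time."""
    # The pulse travels out and back, so only half the path is depth.
    return C * t_seconds / 2.0


# Hypothetical round-trip times for three surfaces, in nanoseconds.
for t_ns in (2.0, 6.67, 13.3):
    depth_m = depth_from_round_trip(t_ns * 1e-9)
    print(f"{t_ns:5.2f} ns -> {depth_m:.2f} m")
```

A surface about a meter away returns the pulse in under 7 nanoseconds, which is why timing precision matters so much in these systems.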

Traditional time-of-flight systems use one of two approaches to build up a “depth map” of a scene. LIDAR (for LIght Detection And Ranging) uses a scanning laser beam that fires a series of pulses, each corresponding to a point in a grid, and separately measures their time of return. But that makes data acquisition slower, and it requires a mechanical system to continually redirect the laser.

The alternative, employed by so-called time-of-flight cameras, is to illuminate the whole scene with laser pulses and use a bank of sensors to register the returned light. But sensors able to distinguish small groups of light particles — photons — are expensive: A typical time-of-flight camera costs thousands of dollars.

The MIT researchers’ system, by contrast, uses only a single light detector — a one-pixel camera. But by using some clever mathematical tricks, it can get away with firing the laser a limited number of times.

The first trick is a common one in the field of compressed sensing: The light emitted by the laser passes through a series of randomly generated patterns of light and dark squares, like irregular checkerboards. Remarkably, this provides enough information that algorithms can reconstruct a two-dimensional visual image from the light intensities measured by a single pixel.

In experiments, the researchers found that the number of laser flashes — and, roughly, the number of checkerboard patterns — that they needed to build an adequate depth map was about 5 percent of the number of pixels in the final image. A LIDAR system, by contrast, would need to send out a separate laser pulse for every pixel.
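To make the single-pixel idea concrete, here is a rough simulation of the measurement step only, assuming NumPy is available. The 32-by-32 image size, the rectangular scene, and all variable names are hypothetical. Reconstructing the image from these readings requires a sparse-recovery solver, which is not shown here.

```python
import numpy as np

rng = np.random.default_rng(0)

side = 32                              # hypothetical image size: 32 x 32 pixels
n_pixels = side * side
n_patterns = round(0.05 * n_pixels)    # about 5 percent of the pixel count

# Hypothetical scene reflectivity: dark background with one bright rectangle.
scene = np.zeros((side, side))
scene[8:20, 10:24] = 1.0

# Each pattern is a random checkerboard of light (1) and dark (0) squares.
patterns = rng.integers(0, 2, size=(n_patterns, side, side))

# The single detector sees only the total light returned under each pattern,
# so every laser flash yields one number.
measurements = np.array([(p * scene).sum() for p in patterns])

print(f"pixels: {n_pixels}, patterns and flashes: {n_patterns}")
print("first five readings:", measurements[:5])
```

In this toy setup, 51 flashes stand in for the 1,024 separate pulses a point-by-point scan of the same grid would need.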

To add the crucial third dimension to the depth map, the researchers use another technique, called parametric signal processing. Essentially, they assume that all of the surfaces in the scene, however they’re oriented toward the camera, are flat planes. Although that’s not strictly true, the mathematics of light bouncing off flat planes is much simpler than that of light bouncing off curved surfaces. The researchers’ parametric algorithm finds the flat-plane model that best fits the information about returning light, creating a very accurate depth map from a minimum of visual information.
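The appeal of a flat-plane assumption is that a plane has only three parameters, so a few noisy samples pin it down well. The sketch below illustrates that with an ordinary least-squares plane fit, assuming NumPy. It is a stand-in, not the researchers' parametric algorithm, whose details the article does not give. The plane coefficients, noise level, and sample count are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical ground-truth plane: z = a*x + b*y + c (meters).
a_true, b_true, c_true = 0.30, -0.10, 1.50

# Sample 200 points on the plane and add 1 mm of measurement noise.
x = rng.uniform(-0.5, 0.5, 200)
y = rng.uniform(-0.5, 0.5, 200)
z = a_true * x + b_true * y + c_true + rng.normal(0.0, 0.001, 200)

# Least-squares fit: find [a, b, c] so that [x y 1] @ [a, b, c] is close to z.
design = np.column_stack([x, y, np.ones_like(x)])
(a_fit, b_fit, c_fit), *_ = np.linalg.lstsq(design, z, rcond=None)

print(f"fitted plane: z = {a_fit:.3f}*x + {b_fit:.3f}*y + {c_fit:.3f}")
```

Because all 200 samples constrain the same three numbers, the noise averages out and the fitted depth is more precise than any single sample.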


Indeed, the algorithm lets the researchers get away with relatively crude hardware. Their system measures the time of flight of photons using a cheap photodetector and an ordinary analog-to-digital converter — an off-the-shelf component already found in all cellphones. The sensor takes about 0.7 nanoseconds to register a change to its input.

That’s enough time for light to travel 21 centimeters, says Vivek Goyal of MIT’s Research Laboratory of Electronics. “So for an interval of depth of 10 and a half centimeters (I’m dividing by two because light has to go back and forth), all the information is getting blurred together.”

Because of the parametric algorithm, however, the researchers’ system can distinguish objects that are only two millimeters apart in depth. “It doesn’t look like you could possibly get so much information out of this signal when it’s blurred together,” Goyal says.
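The figures quoted here follow directly from the speed of light. The short calculation below reproduces them and adds one derived number, the round-trip timing difference that corresponds to a two-millimeter depth step. The variable names are illustrative.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s

sensor_response_s = 0.7e-9   # detector response time quoted in the article

# Distance light covers while the sensor is still responding.
light_path_m = C * sensor_response_s
# Halve it because the pulse goes out and back.
blur_depth_m = light_path_m / 2.0

# Round-trip time difference between two surfaces 2 mm apart in depth.
depth_step_m = 0.002
time_step_s = 2.0 * depth_step_m / C

print(f"light path during response: {light_path_m * 100:.1f} cm")
print(f"depth blurred together:     {blur_depth_m * 100:.1f} cm")
print(f"timing for a 2 mm step:     {time_step_s * 1e12:.1f} ps")
```

A two-millimeter step corresponds to roughly 13 picoseconds of round-trip time, about 50 times shorter than the sensor's 0.7-nanosecond response.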

The researchers’ algorithm is also simple enough to run on the type of processor ordinarily found in a smartphone. To interpret the data provided by Microsoft’s Kinect depth sensor, by contrast, the Xbox requires the extra processing power of a graphics-processing unit, or GPU, a powerful special-purpose piece of hardware.

“This is a brand-new way of acquiring depth information,” says Yue M. Lu, an assistant professor of electrical engineering at Harvard University. “It’s a very clever way of getting this information.” One obstacle to deployment of the system in a handheld device, Lu speculates, could be the difficulty of emitting light pulses of adequate intensity without draining the battery.

But the light intensity required to get accurate depth readings is proportional to the distance of the objects in the scene, Goyal explains, and the applications most likely to be useful on a portable device, such as gestural interfaces, deal with nearby objects. Moreover, he says, the researchers’ system makes an initial estimate of objects’ distance and adjusts the intensity of subsequent light pulses accordingly.
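One way to picture that adjustment is a simple rule that scales pulse intensity with the estimated distance and caps it at a hardware limit. This is a guess at the general shape of such a rule, not the researchers' method. The function name, the scaling constant, the cap, and the units are all hypothetical.

```python
def pulse_intensity(estimated_distance_m: float,
                    intensity_per_meter: float = 1.0,
                    max_intensity: float = 5.0) -> float:
    """Scale pulse intensity linearly with estimated distance, up to a cap."""
    if estimated_distance_m < 0:
        raise ValueError("distance must be non-negative")
    # Proportional rule, as described in the article, limited by a
    # hypothetical hardware maximum.
    return min(intensity_per_meter * estimated_distance_m, max_intensity)


# A hand 0.4 m away needs far less light than a wall 4 m away.
for d in (0.4, 1.0, 4.0, 10.0):
    print(f"{d:4.1f} m -> intensity {pulse_intensity(d):.2f} (arbitrary units)")
```

Under this rule, a gesture made at arm's length would draw about a tenth of the light needed for an object four meters away.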

Telecoms company Qualcomm has awarded the research team one of its $100,000 Innovation Fellowship grants to continue the research.

By Larry Hardesty, Massachusetts Institute of Technology