To understand how time-of-flight (TOF) cameras acquire a depth image, the Wikipedia page on the topic is a good place to start. The basic idea is to measure the round-trip time (RTT) of the photons emitted by the sensor and reflected back. With the speed of light at roughly 3x10^8 m/s, the sensor would need to resolve a time difference of about 6.7 picoseconds to measure a 1 millimeter difference in depth (1mm is the best precision the original Kinect could achieve). This high precision requirement, i.e. high frequency RF, makes the chip and circuitry design more challenging and costly. The surprise for me is that 3DV Systems (and Canesta) seemed to have found a way to lower the cost drastically: 3DV planned to release an RGB-depth sensor, called ZCam, for under $100. (But before 3DV could sell it, the company was bought by Microsoft.)
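The arithmetic behind that timing figure is simple: depth d relates to round-trip time t by d = c * t / 2, so a depth difference of delta_d corresponds to a time difference of 2 * delta_d / c. A quick sketch:

```python
# Round-trip timing precision needed for a given depth resolution.
# Since depth d = c * t / 2, a depth difference delta_d corresponds
# to a round-trip time difference delta_t = 2 * delta_d / c.

C = 3.0e8  # speed of light in m/s (rounded, as in the text)

def timing_precision(depth_resolution_m):
    """Round-trip time difference (seconds) for a depth difference (meters)."""
    return 2.0 * depth_resolution_m / C

dt = timing_precision(1e-3)  # 1 mm depth resolution
print(f"{dt * 1e12:.2f} ps")  # about 6.67 ps
```

Measuring a few picoseconds directly is impractical, which is why practical TOF chips instead measure the phase shift of a modulated light signal; the required precision is the same either way.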
Let's venture deeper into the hardware. Thanks to WIRED Magazine's exclusive report, we can see the performance and internals of the new Kinect sensor.
The picture of the circuit board shows the three crucial components, the RGB camera, the infrared (IR) sensor, and the IR illuminator, arranged not unlike the original Kinect, except that the IR sensor and illuminator are no longer exposed.
|The external of the new Kinect sensor unveiled with Xbox One|
|Photo of the internal of a new Kinect from WIRED with labels added by me|
|Photo of an original Kinect sensor with components labeled.|
Thus the IR sensor and RGB camera are still separate (also confirmed by the different fields of view when switching between the two streams in the WIRED video: http://youtu.be/Hi5kMNfgDS4?t=5m27s). If the sensor could capture both RGB and IR simultaneously from the same sensor (or switch quickly between the two, say at 60Hz or 30Hz), texture mapping alignment in KinectFusion-type applications would improve.
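To see why separate sensors make alignment harder: a depth pixel must be back-projected to 3D in the IR camera frame, transformed by the IR-to-RGB extrinsics, and reprojected into the RGB image. A minimal pinhole-model sketch, with made-up illustrative intrinsics and baseline (not actual Kinect calibration data):

```python
import numpy as np

def backproject(u, v, z, fx, fy, cx, cy):
    """Depth pixel (u, v) with depth z -> 3D point in the camera frame."""
    return np.array([(u - cx) * z / fx, (v - cy) * z / fy, z])

def project(p, fx, fy, cx, cy):
    """3D point in the camera frame -> pixel coordinates."""
    return np.array([fx * p[0] / p[2] + cx, fy * p[1] / p[2] + cy])

# Illustrative intrinsics, shared by both cameras for simplicity.
fx = fy = 525.0
cx, cy = 320.0, 240.0

# Illustrative extrinsics: RGB camera 25 mm to the side of the IR
# sensor, with parallel optical axes.
R = np.eye(3)
t = np.array([0.025, 0.0, 0.0])

p_ir = backproject(400, 300, 2.0, fx, fy, cx, cy)  # depth pixel at 2 m
p_rgb = R @ p_ir + t
u_rgb, v_rgb = project(p_rgb, fx, fy, cx, cy)
print(u_rgb, v_rgb)  # the same 3D point lands on a shifted RGB pixel
```

Even this idealized setup shifts the pixel by several columns, and the shift depends on depth, so a fixed offset cannot align the two streams; that is why calibrated per-pixel registration is needed.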
In case you wonder, the reason I argue that the IR illuminator is the big rectangular block to the left of the IR sensor (instead of the other way around) is that the active IR image shows shadows cast by the IR illumination on the left, and the screen we see is a mirror image: http://youtu.be/Hi5kMNfgDS4?t=5m8s. This shows that the IR illuminator sits to the left of the IR sensor (which, by the way, should have a lens like the exposed RGB camera).
Compared to the original Kinect, the most noticeable improvement in the new Kinect is the almost shadow-free depth image (see video http://youtu.be/Hi5kMNfgDS4?t=37s), thanks to the closer placement of the IR sensor and illuminator. In fact, TOF technology allows more flexibility in illuminator placement and design (and thus a better depth image). For comparison, google "kinect shadow" and take a look at the images.
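The geometry behind those shadows: with an illuminator offset from the sensor by a baseline b, a foreground edge at depth z1 blocks the light from reaching part of the background at depth z2, so the sensor sees an unlit strip with no depth data. By similar triangles, the strip's width on the background plane is roughly w = b * (z2 - z1) / z1. The baselines below are illustrative, not measured Kinect values:

```python
# Width of the unlit "shadow" strip in a depth image, caused by the
# baseline between illuminator and sensor. Shrinking the baseline, as
# the new Kinect's layout does, shrinks the shadow proportionally.

def shadow_width(baseline_m, z_foreground_m, z_background_m):
    """Shadow strip width (m) on the background plane, by similar triangles."""
    return baseline_m * (z_background_m - z_foreground_m) / z_foreground_m

# Illustrative numbers: foreground object at 1 m, background at 2 m.
print(shadow_width(0.075, 1.0, 2.0))  # 7.5 cm strip with a 75 mm baseline
print(shadow_width(0.015, 1.0, 2.0))  # 1.5 cm strip with a 15 mm baseline
```

Since the width scales linearly with the baseline, moving the illuminator right next to the sensor is enough to make the shadows nearly disappear.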
Looking forward, the natural question is: when will depth sensing technology go mobile? Canesta seems to have had a prototype that could fit into the form factor of a phone: https://www.youtube.com/watch?v=5_PVx1NbUZQ, and that video is already two years old.
The trail that led to this new Kinect sensor design is clear in retrospect. Microsoft acquired 3DV Systems and Canesta a few years ago, both of which had worked extensively on TOF technologies. The acquisitions obviously clear up some patent concerns for Microsoft (bye, PrimeSense...). The downside is that we, as developers and consumers, might not see a more open-source-friendly alternative with similar technology anytime soon. And we will have to rely on Microsoft to release a good SDK, and live with Windows when using the new Kinect in commercial applications.