Google has published an 18-page study fully detailing the synthetic depth-of-field technology that makes its single-camera Portrait Mode possible. Google introduced its evolved Portrait Mode feature on the Pixel 2 and Pixel 2 XL, though neither smartphone model has the dual-camera hardware typically required to produce this effect.

The in-depth paper shows a degree of openness unusual for the smartphone and camera industries. Smartphones with a single camera produce images where everything is generally in focus. Dual-camera phones paired with a stereo algorithm get around this limitation by matching points in images from both cameras to determine depth within the captured scene. Having acquired that depth data, some pixels can be selectively blurred to produce the shallow DOF effect, Google explained in a blog post last year.
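The depth-from-stereo relationship described above can be sketched with the standard triangulation formula, where disparity (the shift of a matched point between the two views) is inversely proportional to distance. The function name and the numbers below are illustrative, not taken from Google's paper:

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_mm):
    """Triangulate depth (in mm) from the pixel disparity of a point
    matched between two cameras separated by `baseline_mm`."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline_mm / disparity_px

# A point that shifts 20 px between two cameras 10 mm apart,
# imaged at a 2800 px focal length, sits 1.4 m away.
print(depth_from_disparity(20, 2800, 10))  # 1400.0
```

The key consequence for Portrait Mode is visible in the formula: nearer objects produce larger disparities, so once points are matched, every pixel can be assigned a depth and blurred accordingly.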

Achieving this same effect using only a single camera is difficult. Some mobile camera apps attempt to simulate a shallow DOF by using semantic segmentation to separate an image's pixels into two layers, isolating the foreground and then blurring the remaining pixels. The lack of depth data, however, means the software doesn't know how much blur to apply to any arbitrary object in the scene. The results can often be lackluster or unrealistic, without the gradual optical blur expected of objects receding into the distance.
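A toy example makes the limitation concrete: with only a foreground mask and no depth map, everything outside the mask gets one uniform blur, regardless of how far away it actually is. All names here are hypothetical, and the "image" is a tiny array rather than a real photo:

```python
import numpy as np

def box_blur(img, k=3):
    """Naive box blur via edge-padded neighborhood averaging."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img, dtype=float)
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            out[y, x] = padded[y:y + k, x:x + k].mean()
    return out

def segmentation_portrait(img, fg_mask):
    """Keep masked foreground pixels sharp; blur everything else
    uniformly -- no depth data, so no depth-dependent blur."""
    return np.where(fg_mask, img, box_blur(img))

img = np.arange(25, dtype=float).reshape(5, 5)
mask = np.zeros((5, 5), dtype=bool)
mask[1:4, 1:4] = True  # pretend the center is the detected subject
result = segmentation_portrait(img, mask)
```

Here a background pixel one meter behind the subject and one ten meters behind it receive identical blur, which is exactly the unrealistic falloff the article describes.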

That's where Google's "authentic defocus" technology comes in. The Pixel 2 smartphones utilize the semantic segmentation method for images taken with the front-facing camera, but they also use a stereo algorithm for images taken with the rear camera... despite there only being a single lens. Google provided an overview of how it achieves that on its AI blog in October.


Put simply, Google repurposes the dual-pixel autofocus hardware increasingly used in mobile cameras for fast AF. Each pixel on the sensor is split into two photodiodes; the left- and right-looking (or up- and down-looking) photodiodes effectively capture two perspectives of the scene with a stereo baseline of roughly 1mm. A burst of images is aligned and averaged to reduce noise, and a stereo algorithm computes a depth map from the two perspectives. This simulates the data that would be provided by two physical cameras mounted side by side, letting Google's software determine the depth of every point within the captured scene.
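The pipeline above can be sketched in one dimension: average a noisy burst to recover a clean signal, then search over candidate shifts for the one that best aligns the two views. This crude sum-of-squared-differences matcher is a stand-in for Google's actual stereo algorithm, and the 3-pixel shift is exaggerated; a real ~1mm dual-pixel baseline yields far smaller disparities:

```python
import numpy as np

rng = np.random.default_rng(0)
scene = np.sin(np.linspace(0, 6 * np.pi, 200))  # 1-D stand-in for a scene
true_shift = 3                                  # exaggerated disparity
left = scene
right = np.roll(scene, true_shift)              # the second "view"

# Step 1: average an aligned burst of noisy frames to reduce noise.
burst = [right + rng.normal(0, 0.05, right.shape) for _ in range(8)]
right_avg = np.mean(burst, axis=0)

# Step 2: estimate disparity by minimizing sum-of-squared differences
# over candidate shifts (a toy stand-in for a real stereo matcher).
def estimate_shift(a, b, max_shift=6):
    errors = [np.sum((np.roll(b, -s)[max_shift:-max_shift]
                      - a[max_shift:-max_shift]) ** 2)
              for s in range(-max_shift, max_shift + 1)]
    return int(np.argmin(errors)) - max_shift

print(estimate_shift(left, right_avg))  # recovers the 3-pixel shift
```

In the real system this matching happens per region rather than globally, producing a dense depth map; the burst averaging in step 1 is what makes the tiny dual-pixel disparities measurable above the noise floor.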

There's a lot more to Google's approach, which even has some advantages over traditional optics: for example, it forces a larger depth of field around the focal plane to ensure a sharp subject, something impossible to achieve optically. The study also points out advantages over using a second camera, including a smaller imaging module, reduced power consumption, and lower cost.
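That widened in-focus band can be illustrated as a blur profile over depth: zero blur inside an artificial "sharp zone" straddling the focal plane, then blur growing with distance beyond it. The profile shape and all numbers below are illustrative, not Google's actual function:

```python
def blur_radius(depth_m, focus_m=1.5, sharp_zone_m=0.3, strength=4.0):
    """Blur radius grows with distance from the focal plane, but stays
    zero inside a widened in-focus band -- unlike a real lens, whose
    blur begins immediately off the focal plane."""
    offset = abs(depth_m - focus_m)
    if offset <= sharp_zone_m / 2:
        return 0.0  # entire subject kept sharp
    return strength * (offset - sharp_zone_m / 2)

print(blur_radius(1.5))  # 0.0 (at the focal plane)
print(blur_radius(1.6))  # 0.0 (still inside the sharp zone)
print(blur_radius(3.0))  # 5.4 (background gets progressively stronger blur)
```

A physical lens cannot do this: optical blur is dictated by aperture and distance, so a subject's nose and ears cannot both be perfectly sharp at f/1.8, whereas a synthetic profile can simply flatten the blur to zero across the whole subject.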

Read the full PDF here.

Via: Cornell University Library