With the launch of the Google Pixel 3, smartphone cameras have taken yet another leap in capability. I had the opportunity to sit down with Isaac Reynolds, Product Manager for Camera on Pixel, and Marc Levoy, Distinguished Engineer and Computational Photography Lead at Google, to learn more about the technology behind the new camera in the Pixel 3.

One of the first things you might notice about the Pixel 3 is the single rear camera. At a time when we're seeing companies add dual, triple, even quad-camera setups, one main camera seems at first an odd choice.

But after speaking to Marc and Isaac I think that the Pixel camera team is taking the correct approach – at least for now. Any technology that makes a single camera better will make multiple cameras in future models that much better, and we've seen in the past that a single camera approach can outperform a dual camera approach in Portrait Mode, particularly when the telephoto camera module has a smaller sensor and slower lens, or lacks reliable autofocus.

Let's take a closer look at some of the Pixel 3's core technologies.

1. Super Res Zoom

Last year the Pixel 2 showed us what was possible with burst photography. HDR+ was its secret sauce, and it worked by constantly buffering nine frames in memory. When you press the shutter, the camera essentially goes back in time to those last nine frames [1], breaks each of them up into thousands of 'tiles', aligns them all, and then averages them.

Breaking each image into small tiles allows for advanced alignment even when the photographer or subject introduces movement. Blurred elements in some shots can be discarded, or subjects that have moved from frame to frame can be realigned. Averaging simulates the effects of shooting with a larger sensor by 'evening out' noise. And going back in time to the last 9 frames captured right before you hit the shutter button means there's zero shutter lag.
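To make the align-and-average idea concrete, here is a minimal sketch in Python. It is not Google's implementation: it assumes small grayscale frames stored as lists of rows, a brute-force search over whole-pixel offsets, and a hypothetical `reject_above` threshold for discarding tiles that never find a good match. All function names are made up for illustration.

```python
from statistics import mean

def tile_at(frame, top, left, size):
    """Return a size x size tile of the frame as a list of rows."""
    return [row[left:left + size] for row in frame[top:top + size]]

def sad(tile_a, tile_b):
    """Sum of absolute differences between two equally sized tiles."""
    return sum(abs(a - b)
               for row_a, row_b in zip(tile_a, tile_b)
               for a, b in zip(row_a, row_b))

def best_offset(reference, frame, top, left, size, radius):
    """Find the (cost, dy, dx) within +/-radius that best matches the reference tile."""
    ref_tile = tile_at(reference, top, left, size)
    height, width = len(frame), len(frame[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            t, l = top + dy, left + dx
            if t < 0 or l < 0 or t + size > height or l + size > width:
                continue  # candidate tile would fall outside the frame
            cost = sad(ref_tile, tile_at(frame, t, l, size))
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    return best

def align_and_merge(frames, size=4, radius=2, reject_above=None):
    """Average each tile of frames[0] with its best match in every other frame."""
    reference = frames[0]
    height, width = len(reference), len(reference[0])
    merged = [row[:] for row in reference]
    for top in range(0, height - size + 1, size):
        for left in range(0, width - size + 1, size):
            stack = [tile_at(reference, top, left, size)]
            for frame in frames[1:]:
                cost, dy, dx = best_offset(reference, frame, top, left, size, radius)
                if reject_above is not None and cost > reject_above:
                    continue  # poor match (e.g. blur or occlusion): skip this tile
                stack.append(tile_at(frame, top + dy, left + dx, size))
            for y in range(size):
                for x in range(size):
                    merged[top + y][left + x] = mean(t[y][x] for t in stack)
    return merged
```

Because every tile is matched independently, a subject that moved between frames can still be lined up locally, and a tile with no acceptable match simply contributes nothing to the average.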

As with the Pixel 2, HDR+ allows the Pixel 3 to render sharp, low noise images even in high contrast situations. Photo: Google

This year, the Pixel 3 pushes all this further. It uses HDR+ burst photography to buffer up to 15 images [2], and then employs super-resolution techniques to increase the resolution of the image beyond what the sensor and lens combination would traditionally achieve [3]. Subtle shifts from handheld shake and optical image stabilization (OIS) allow scene detail to be localized with sub-pixel precision, since shifts are unlikely to be exact multiples of a pixel.

In fact, I was told the shifts are carefully controlled by the optical image stabilization system. "We can demonstrate the way the optical image stabilization moves very slightly," remarked Marc Levoy. Precise sub-pixel shifts are not necessary at the sensor level though; instead, OIS is used to distribute many scene samples uniformly across a pixel, and then the images are aligned to sub-pixel precision in software.
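A one-dimensional toy example shows why sub-pixel shifts add resolution. This is only a sketch under simplifying assumptions: the per-frame offsets are taken as already known from alignment, samples are dropped into the nearest cell of a finer grid, and the function name `accumulate` is hypothetical.

```python
import math

def accumulate(frames, offsets, scale):
    """Place 1-D samples from several frames onto a grid `scale` times finer.

    frames:  equally long lists of samples, one list per frame.
    offsets: each frame's shift in input pixels, assumed known from alignment.
    Returns the averaged fine grid; cells that received no sample are None.
    """
    size = len(frames[0]) * scale
    sums = [0.0] * size
    counts = [0] * size
    for samples, offset in zip(frames, offsets):
        for index, value in enumerate(samples):
            cell = math.floor((index + offset) * scale + 0.5)  # nearest fine cell
            if 0 <= cell < size:
                sums[cell] += value
                counts[cell] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]
```

A single frame leaves every other fine-grid cell empty, whereas a second frame shifted by half a pixel fills the gaps. With many frames at scattered offsets, each fine cell receives several samples that can be averaged.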

But Google – and Peyman Milanfar's research team working on this particular feature – didn't stop there. "We get a red, green, and blue filter behind every pixel just because of the way we shake the lens, so there's no more need to demosaic," explains Marc. If you have enough samples, you can expect any scene element to have fallen on a red, green, and blue pixel. After alignment, then, you have R, G, and B information for any given scene element, which removes the need to demosaic. That itself leads to an increase in resolution (since you don't have to interpolate spatial data from neighboring pixels), and a decrease in noise, since the math required for demosaicing is itself a source of noise. The benefits are similar to what you get when shooting pixel shift modes on dedicated cameras.
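The 'every color behind every pixel' idea can be checked with a few lines of Python. This sketch assumes an RGGB Bayer layout and, for simplicity, whole-pixel shifts between frames (the real shifts are sub-pixel and not perfectly regular). The names `filter_color`, `colors_seen` and `fully_sampled` are hypothetical.

```python
# Assumed RGGB Bayer layout: the 2x2 pattern repeats across the sensor.
BAYER = (("R", "G"),
         ("G", "B"))

def filter_color(row, col):
    """Color filter sitting over the sensor pixel at (row, col)."""
    return BAYER[row % 2][col % 2]

def colors_seen(scene_row, scene_col, offsets):
    """Colors that sample one scene point across frames shifted by (dy, dx)."""
    return {filter_color(scene_row + dy, scene_col + dx) for dy, dx in offsets}

def fully_sampled(height, width, offsets):
    """True if every scene point was seen through R, G and B filters."""
    return all(colors_seen(r, c, offsets) == {"R", "G", "B"}
               for r in range(height)
               for c in range(width))
```

Under these assumptions a single frame never gives full color at any point, while four frames offset by `(0, 0)`, `(0, 1)`, `(1, 0)` and `(1, 1)` do, which is the same pattern that four-shot pixel shift modes on dedicated cameras rely on.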

Image comparison: normal wide-angle (28mm equiv.) vs. Super Res Zoom

There's a small catch to all this – at least for now. Super Res only activates at 1.2x zoom or more, not in the default 'zoomed out' 28mm equivalent mode. As expected, the lower your level of zoom, the more impressed you'll be with the resulting Super Res images, and naturally the resolving power of the lens will be a limitation. But the claim, according to Isaac Reynolds, is that you can get "digital zoom roughly competitive with a 2x optical zoom", and it all happens right on the phone.

The results I was shown at Google appeared to be more impressive than the example we were provided above, no doubt at least in part due to the extreme zoom of our example here. We'll reserve judgement until we've had a chance to test the feature for ourselves.

Would the Pixel 3 benefit from a second rear camera? For certain scenarios – still landscapes for example – probably. But having more cameras doesn't always mean better capabilities. Quite often 'second' cameras have worse low light performance due to a smaller sensor and slower lens, as well as poor autofocus due to the lack of, or fewer, phase-detect pixels. One huge advantage of Pixel's Portrait Mode is that its autofocus doesn't differ from normal wide-angle shooting: dual pixel AF combined with HDR+ and pixel-binning yields incredible low light performance, even with fast moving erratic subjects.

2. Computational Raw

The Pixel 3 introduces 'computational Raw' capture in the default camera app. Isaac stressed that when Google decided to enable Raw in its Pixel cameras, they wanted to do it right, taking advantage of the phone's computational power.

"There's one key difference relative to the rest of the industry. Our DNG is the result of aligning and merging [up to 15] multiple frames... which makes it look more like the result of a DSLR" explains Marc. There's no exaggeration here: we know very well that image quality tends to scale with sensor size thanks to a greater amount of total light collected per exposure, which reduces the impact of the most dominant source of noise in images: photon shot, or statistical, noise.

The Pixel cameras can effectively make up for their small sensor sizes by capturing more total light through multiple exposures, while aligning moving objects from frame to frame so they can still be averaged to decrease noise. That means better low light performance and higher dynamic range than what you'd expect from such a small sensor.

Shooting Raw allows you to take advantage of that extra range by pulling back blown highlights and raising shadows otherwise clipped to black in the JPEG. You also get full freedom over white balance in post, thanks to the fact that there's no scaling of the color channels before the Raw file is written. Even better news? HDR+ independently merges red, green and blue channels, which means the Raws are true Raws - un-demosaiced.

Pixel 3 introduces in-camera computational Raw capture.

Such 'merged' Raw files represent a major threat to traditional cameras. The math alone suggests that, solely based on sensor size, 15 averaged frames from the Pixel 3 sensor should compete with APS-C sized sensors in terms of noise levels. There are more factors at play, including fill factor, quantum efficiency and microlens design, but needless to say we're very excited to get the Pixel 3 into our studio scene and compare it with dedicated cameras in Raw mode, where the effects of the JPEG engine can be decoupled from raw performance.
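The back-of-the-envelope math can be written out as follows. The sensor dimensions are my own assumptions, not figures from Google: roughly 5.6 x 4.2mm for a 1/2.55"-type phone sensor and 23.5 x 15.6mm for a typical APS-C sensor. The sketch also assumes shot noise dominates and that every frame in the burst is usable.

```python
import math

# Assumed sensor dimensions in millimetres (not stated by Google).
PHONE_SENSOR_MM = (5.6, 4.2)     # roughly a 1/2.55"-type sensor
APS_C_SENSOR_MM = (23.5, 15.6)   # a common APS-C size

def sensor_area(dimensions_mm):
    """Sensor area in square millimetres."""
    width, height = dimensions_mm
    return width * height

def light_ratio(small_mm, large_mm, frames):
    """Total light gathered by `frames` exposures on the small sensor,
    relative to one exposure on the large sensor (same exposure settings)."""
    return frames * sensor_area(small_mm) / sensor_area(large_mm)

def shot_noise_snr_ratio(small_mm, large_mm, frames):
    """Shot-noise-limited SNR scales with the square root of total light."""
    return math.sqrt(light_ratio(small_mm, large_mm, frames))
```

With these assumed dimensions, a single phone frame collects well under a tenth of the light of an APS-C exposure, while 15 merged frames come out within a few percent of it.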

While solutions do exist for combining multiple Raws from traditional cameras with alignment into a single output DNG, having an integrated solution in a smartphone that takes advantage of Google's frankly class-leading tile-based align and merge - with no ghosting artifacts even with moving objects in the frame - is incredibly exciting. This feature should prove highly beneficial to enthusiast photographers. And what's more - Raws are automatically uploaded to Google Photos, so you don't have to worry about transferring them as you do with traditional cameras.

3. Synthetic Fill Flash

'Synthetic Fill Flash' adds a glow to human subjects, as if a reflector were held out in front of them. Photo: Google

Often a photographer will use a reflector to light the faces of backlit subjects. Pixel 3 does this computationally. The same machine-learning based segmentation algorithm that the Pixel camera uses in Portrait Mode is used to identify human subjects and add a warm glow to them.

If you've used the front facing camera on the Pixel 2 for Portrait Mode selfies, you've probably noticed how well it detects and masks human subjects using only segmentation. By using that same segmentation method for synthetic fill flash, the Pixel 3 is able to relight human subjects very effectively, with believable results that don't confuse and relight other objects in the frame.
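Google didn't describe the relighting step in detail, so the following is only an illustration of how a segmentation mask can confine a brightening effect to people. The gain and warmth values are arbitrary, and the function name `fill_flash` is hypothetical.

```python
def fill_flash(image, mask, gain=1.25, warmth=(1.05, 1.0, 0.95)):
    """Brighten and slightly warm the masked pixels.

    image: rows of (r, g, b) tuples with values in 0..1.
    mask:  rows of weights in 0..1, where 1 means 'definitely a person'.
    gain and warmth are arbitrary illustrative values.
    """
    result = []
    for image_row, mask_row in zip(image, mask):
        out_row = []
        for pixel, weight in zip(image_row, mask_row):
            # Fully 'lit' version of the pixel, clipped to the valid range.
            lit = tuple(min(1.0, c * gain * w) for c, w in zip(pixel, warmth))
            # Blend by mask weight so soft mask edges give soft transitions.
            out_row.append(tuple(c * (1.0 - weight) + l * weight
                                 for c, l in zip(pixel, lit)))
        result.append(out_row)
    return result
```

Pixels with a mask weight of zero pass through untouched, which is why objects next to the subject are not relit.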

Interestingly, the same segmentation methods used to identify human subjects are also used for front-facing video image stabilization, which is great news for vloggers. If you're vlogging, you typically want yourself, not the background, to be stabilized. That's impossible with typical gyro-based optical image stabilization. The Pixel 3 analyzes each frame of the video feed and uses digital stabilization to steady you in the frame. There's a small crop penalty to enabling this mode, but it allows for very steady video of the person holding the camera.
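The crop penalty follows directly from how subject-locked digital stabilization works: the output is a window that slides around inside the full frame to keep the subject in place. A minimal sketch, assuming the subject's center has already been found in each frame; `crop_origin` and its parameters are hypothetical names.

```python
def crop_origin(subject_center, anchor, frame_size, crop_size):
    """Top-left corner of the crop window for one video frame.

    subject_center: (x, y) of the detected subject in the full frame.
    anchor:         (x, y) where the subject should sit inside the crop.
    The window is clamped so it never leaves the full frame.
    """
    subject_x, subject_y = subject_center
    anchor_x, anchor_y = anchor
    frame_w, frame_h = frame_size
    crop_w, crop_h = crop_size
    left = min(max(subject_x - anchor_x, 0), frame_w - crop_w)
    top = min(max(subject_y - anchor_y, 0), frame_h - crop_h)
    return left, top
```

For example, with a 1920x1080 frame and a 1728x972 crop, the window has 192 pixels of horizontal and 108 pixels of vertical travel to absorb movement. A larger crop penalty would buy more travel.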

4. Learning-based Portrait Mode

The Pixel 2 had one of the best Portrait Modes we've tested despite having only one lens. This was due to its clever use of split pixels to sample a stereo pair of images behind the lens, combined with machine-learning based segmentation to distinguish human from non-human objects in the scene (for an in-depth explanation, watch my video here). Furthermore, dual pixel AF meant robust performance even with moving subjects in low light - great for constantly moving toddlers. The Pixel 3 brings some significant improvements despite lacking a second lens.

According to computational lead Marc Levoy, "Where we used to compute stereo from the dual pixels, we now use a learning-based pipeline. It still utilizes the dual pixels, but it's not a conventional algorithm, it's learning based." Google essentially built a 'frankenphone' rig consisting of five Pixel 3 phones that could be fired simultaneously to build high quality depth maps from structure from motion and multi-view stereo. These 'ground truth' maps were used to train a neural network with depth maps generated from the single Pixel 3 phone in the middle of this rig. There were a number of advantages to this approach: the widely separated phones provided large baselines for more accurate depth estimation, there was less chance of occluded objects going undetected, and parallax in multiple directions allowed Google to avoid the aperture problem (where detail along the axis of stereo disparity essentially has no measured disparity).

What this means is improved results: more uniformly defocused backgrounds and fewer depth map errors. Have a look at the improved results with complex objects, where many approaches are unable to reliably blur backgrounds 'seen through' holes in foreground objects:

Learned result. Background objects, especially those seen through the toy, are consistently blurred. Objects around the peripheries of the image are also more consistently blurred.
Learned depth map. Note how objects in the background (blue) aren't confused as being closer to the foreground (yellow) as they are in the heat map below.
Stereo-only result. Background objects, especially those seen through the toy, aren't consistently blurred.
Stereo-only depth map from dual pixels. Note how some elements in the background appear to be closer to the foreground than they really are.

Interestingly, this learning-based approach also yields better results with mid-distance shots where a person is further away. Typically, the further away your subject is, the smaller the difference in stereo disparity between your subject and background, making accurate depth maps difficult to compute given the small 1mm baseline of the split pixels. Take a look at the Portrait Mode comparison below, with the new algorithm on the left vs. the old on the right.
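A simple pinhole stereo model shows how quickly the usable signal shrinks with distance. Only the 1mm baseline comes from the discussion above; the 4.4mm focal length and 1.4 micron pixel pitch are my assumptions, and real dual pixel disparity is measured relative to the focus plane, so treat this as a rough sketch. The function names are hypothetical.

```python
def disparity_px(distance_mm, focal_mm=4.4, baseline_mm=1.0, pitch_mm=0.0014):
    """Stereo disparity in pixels for a point at the given distance.

    Pinhole model: disparity = focal length * baseline / distance,
    converted to pixels by dividing by the pixel pitch.
    focal_mm and pitch_mm are assumed values, not published figures.
    """
    return focal_mm * baseline_mm / (distance_mm * pitch_mm)

def disparity_gap_px(subject_mm, background_mm, **optics):
    """How many pixels of disparity separate the subject from the background."""
    return disparity_px(subject_mm, **optics) - disparity_px(background_mm, **optics)
```

Under these assumptions, a subject at 1m with a background 2m behind it is separated by roughly 2 pixels of disparity. Move the subject to 4m with the background still 2m behind, and the gap falls to about a quarter of a pixel, which is very little for a conventional stereo algorithm to work with.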

Learned result. The background is uniformly defocused, and the ground shows a smooth, gradual blur.
Stereo-only result. Note the sharp railing in the background, and the harsh transition from in-focus to out-of-focus in the ground.

5. Night Sight

Rather than simply rely on long exposures for low light photography, 'Night Sight' utilizes HDR+ burst mode photography to take usable photos in very dark situations. Previously, the Pixel 2 would never drop below 1/15s shutter speed, simply because it needed faster shutter speeds to maintain that 9-frame buffer with zero shutter lag. That does mean that even the Pixel 2 could, in very low light, effectively sample 0.6 seconds (9 x 1/15s), but sometimes that's not even enough to get a usable photo in extremely dark situations.

The Pixel 3 now has a 'Night Sight' mode which sacrifices the zero shutter lag and expects you to hold the camera steady after you've pressed the shutter button. When you do so, the camera will merge up to 15 frames, each with shutter speeds as low as, say, 1/3s, to get you an image equivalent to a 5 second exposure, but without the motion blur that would inevitably result from such a long exposure.

Put simply: even though there might be subject or handheld movement over the entire 5s span of the 15 frame burst, many of the 1/3s 'snapshots' of that burst are likely to still be sharp, albeit possibly displaced relative to one another. The tile-based alignment of Google's 'robust merge' technology, however, can handle inter-frame movement by aligning objects that have moved and discarding tiles of any frame that have too much motion blur.
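The exposure arithmetic is easy to verify, including what happens when blurred frames are thrown away. This sketch simplifies by discarding whole frames rather than individual tiles, and `effective_exposure` is a hypothetical helper, not part of any camera API.

```python
from fractions import Fraction

def effective_exposure(shutter_s, blur_flags):
    """Total exposure time contributed by the frames that were kept.

    shutter_s:  shutter time of each frame in seconds.
    blur_flags: one boolean per frame, True if the frame was too blurred to use.
    """
    kept = sum(1 for blurred in blur_flags if not blurred)
    return kept * shutter_s
```

Nine frames at 1/15s give the Pixel 2's 0.6s, and 15 frames at 1/3s give Night Sight's 5s. If, say, three of those 15 frames are rejected for motion blur, the result is equivalent to a 4s exposure instead.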

Have a look at the results below, which also show you the benefit of the wider-angle, second front-facing 'groupie' camera:

Image comparison: normal front-camera 'selfie' vs. Night Sight 'groupie' with wide-angle front-facing lens

Furthermore, Night Sight mode takes a machine-learning based approach to auto white balance. It's often very difficult to determine the dominant light source in such dark environments, so Google has opted to use learning-based AWB to yield natural looking images.

Final thoughts: simpler photography

The philosophy behind the Pixel camera - and for that matter the philosophy behind many smartphone cameras today - is one-button photography. A seamless experience without the need to activate various modes or features.

This is possible thanks to the computational approaches these devices embrace. The Pixel camera and software are designed to give you pleasing results without requiring you to think much about camera settings. Synthetic fill flash activates automatically with backlit human subjects, and Super Res Zoom automatically kicks in as you zoom.

Motion Photos turns on automatically when the camera detects interesting activity, and Top Shot now uses AI to automatically suggest the best photo of the bunch, even if it's a moment that occurred before you pressed the shutter button. Autofocus typically focuses on human subjects very reliably, but when you need to specify your subject, just tap on it and 'Motion Autofocus' will continue to track and focus on it. Perfect for your toddler or pet.

At their best, these technologies allow you to focus on the moment, perhaps even enjoy it, and sometimes even help you to capture memories you might have otherwise missed.

We'll be putting the Pixel 3 through its paces soon, so stay tuned. In the meantime, let us know in the comments below what your favorite features are, and what you'd like to see tested.

[1] In good light, these last 9 frames typically span the last 150ms before you pressed the shutter button. In very low light, they can span up to the last 0.6s.

[2] We were only told 'say, maybe 15 images' in conversation about the number of images in the buffer for Super Res Zoom and Night Sight. It may be more, it could be less, but we were at least told that it is more than 9 frames. One thing to keep in mind is that even if you have a 15-frame buffer, not all frames are guaranteed to be usable. For example, if in Night Sight one or more of these frames have too much subject motion blur, they're discarded.

[3] You can achieve a similar super-resolution effect manually with traditional cameras, and we describe the process here.