Google yesterday announced the Pixel 4 and Pixel 4 XL, updates to the popular line of Pixel smartphones.

We had the opportunity recently to sit down with Marc Levoy, Distinguished Engineer and Computational Photography Lead at Google, and Isaac Reynolds, Product Manager for Camera on Pixel, to dive deep into the imaging improvements brought to the lineup by the Pixel 4.

Note that we do not yet have access to a production-quality Pixel 4. As such, many of the sample images in this article were provided by Google.

More zoom

The Pixel 4 features a main camera module with a 27mm equivalent F1.7 lens, employing a 12MP 1/2.55" type CMOS sensor. New is a second 'zoomed-in' camera module with a 48mm equivalent, F2.4 lens paired with a slightly smaller 16MP sensor. Both modules are optically stabilized. Google tells us the net result is 1x-3x zoom that is on par with a true 1x-3x optical zoom, and pleasing results all the way out to 4x-6x magnification factors. No doubt the extra resolution of the zoomed-in unit helps with those higher zoom ratios.

Have a look at what the combination of two lenses and super-res zoom gets you with these 1x to 8x full-resolution samples from Google.

Marc emphasized that pinching and zooming to pre-compose your zoomed-in shot is far better than cropping after the fact. I'm speculating here, but I imagine much of this has to do with the ability of super-resolution techniques to generate imagery of higher resolution than any one frame. A 1x super-res zoom image (which you get by shooting 1x Night Sight) still only generates a 12MP image; cropping and upscaling from there is unlikely to get you as good results as feeding crops to the super-res pipeline for it to align and assemble on a higher resolution grid before it outputs a 12MP final image.

We're told that Google is not using the 'field-of-view fusion' technique Huawei uses on its latest phones where, for example, a 3x photo gets its central region from the 5x unit and its peripheries from upscaling (using super-resolution) the 1x capture. But given Google's choice of lenses, its decision makes sense: from our own testing with the Pixel 3, super-res zoom is more than capable of handling zoom factors between 1x and 1.8x, the latter being the magnification factor of Google's zoomed-in lens.

Dual exposure controls with 'Live HDR+'

The results of HDR+, the burst-mode multi-frame averaging and tonemapping behind every photograph on Pixel devices, are compelling, retaining detail in highlights and shadows in a usually pleasing, believable manner. But it's computationally intensive to show the end result in the 'viewfinder' in real time as you're composing. This year, Google has opted to use machine learning to approximate HDR+ results in real time, leading to a much better viewfinder experience (see footnote 1). Google calls this 'Live HDR+'. It's essentially a WYSIWYG implementation that should give photographers more confidence in the end result, and possibly leave them feeling less of a need to adjust the overall exposure manually.

"If we have an intrinsically HDR camera, we should have HDR controls for it" - Marc Levoy

On the other hand, if you do have an approximate live view of the HDR+ result, wouldn't it be nice if you could adjust it in real-time? That's exactly what the new 'dual exposure controls' allow for. Tap on the screen to bring up two separate exposure sliders. The brightness slider, indicated by a white circle with a sun icon, adjusts the overall exposure, and therefore brightness, of the image. The shadows slider essentially adjusts the tonemap, so you can adjust shadow and midtone visibility and detail to suit your taste.
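To make the distinction between the two sliders concrete, here is a minimal sketch of the idea, not Google's implementation. It assumes the brightness slider behaves like an exposure gain on linear data and the shadows slider behaves like the exponent of a simple tone curve; the names `brightness_ev` and `shadow_gamma` are hypothetical stand-ins for the two controls.

```python
def apply_dual_exposure(linear, brightness_ev=0.0, shadow_gamma=0.5):
    """Toy model of two independent controls (assumed, not Google's pipeline).

    brightness_ev: exposure shift in stops, applied to linear values.
    shadow_gamma:  exponent of the tone curve; lower values lift shadows,
                   higher values push them toward black.
    """
    gain = 2.0 ** brightness_ev
    result = []
    for value in linear:
        exposed = min(max(value * gain, 0.0), 1.0)  # clip to the displayable range
        result.append(exposed ** shadow_gamma)      # tone curve applied after exposure
    return result


# Linear scene luminance, from deep shadow to a highlight at display white.
scene = [0.01, 0.05, 0.25, 1.0]

default = apply_dual_exposure(scene)
darker = apply_dual_exposure(scene, brightness_ev=-1.0)    # brightness slider lowered
silhouette = apply_dual_exposure(scene, shadow_gamma=1.5)  # shadows slider lowered
```

In this toy model, lowering the brightness control darkens every tone, while lowering the shadows control crushes the darkest tones and leaves the brightest highlight where it was, which is the behavior you'd want for a silhouette.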

Image sequence: (1) default HDR+ result; (2) brightness slider (top left) lowered to darken overall exposure; (3) shadows slider (top center) lowered to create silhouettes; (4) final result.

Dual exposure controls are a clever way to operate an 'HDR' camera, as it allows the user to adjust both the overall exposure and the final tonemap in one or two swift steps. Sometimes HDR and tonemapping algorithms can go a bit far (as in this iPhone XS example here), and in such situations photographers will appreciate having some control placed back in their hands.

And while you might think this may be easy to do after-the-fact, we've often found it quite difficult to use the simple editing tools on smartphones to push down the shadows we want darkened after tonemapping has already brightened them. There's a simple reason for that: the 'shadows' or 'blacks' sliders in photo editing tools may or may not target the same range of tones the tonemapping algorithms did when initially processing the photo.

Improved Night Sight

Google's Night Sight is widely regarded as an industry benchmark. We consistently recommend it not just for low light photography but for all types of photography, because its super-resolution pipeline yields higher resolution results with less aliasing and fewer moiré artifacts. Night Sight is what allowed the Pixel 3 to catch up to 1"-type and Four Thirds image quality, both in terms of detail and noise performance in low light, as you can see here (all cameras shot with equivalent focal plane exposure). So how could Google improve on that?

Well, let's start with the observation that some reviewers of the new iPhone 11 remarked that its night mode had surpassed the Pixel 3's. While that's not entirely true, as I covered in my in-depth look at the respective night modes, we have found that at very low light levels the Pixel 3 does fall behind. That mostly comes down to exposure limits: per-frame handheld exposures in our shooting with the Pixel 3 were limited to ~1/3s to minimize blur caused by handshake. Meanwhile, the tripod-based mode only allowed shutter speeds up to 1s. Handheld and tripod-based shots were limited to 15 and 6 total frames, respectively, to avoid user fatigue. That meant the longest total exposures you could ever take were limited to 5-6s (15 x 1/3s handheld, or 6 x 1s on a tripod).

Pixel 4 extends the per-frame exposure, when no motion is detected, to at least 16 seconds, with up to 15 frames per shot. At 16 seconds per frame, that's a total of 4 minutes of exposure (16s x 15 frames = 240s), which is what allows the Pixel 4 to capture the Milky Way:

Remarkable is the lack of user input: just set the phone up against a rock to stabilize it, and press one button. That's it. It's important to note you couldn't get this result with one long exposure, either with the Pixel phone or a dedicated camera, because it would result in star trails. So how does the Pixel 4 get around this limitation?

The same technique that enables high quality imagery from a small sensor: burst photography. First, the camera picks a shutter speed short enough to ensure no star trails. Next, it takes many frames at this shutter speed and aligns them. Since alignment is tile-based, it can handle the moving stars due to the rotation of the sky just as the standard HDR+ algorithm handles motion in scenes. Normally, such alignment is very tricky for photographers shooting night skies with non-celestial, static objects in the frame, since aligning the stars would cause misalignment in the foreground static objects, and vice versa.

Improved Night Sight will not only benefit starry skyscapes, but all types of photography requiring long exposures

But Google's robust tile-based merge can handle displacement of objects from frame to frame of up to ~8% of the frame width (see footnote 2). Think of it as tile-based alignment where each frame is broken up into roughly 12,000 tiles, with each tile individually aligned to the base frame. That's why the Pixel 4 has no trouble treating stars in the sky differently from static foreground objects.
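The tile-based alignment described above can be illustrated with a toy example. This is a minimal sketch under our own assumptions, not the HDR+ implementation: it uses a tiny 16x16 binary 'frame', 8x8 tiles, a brute-force search over small offsets, and a sum-of-absolute-differences cost. All function names are hypothetical.

```python
def make_frame(height, width, bright_pixels):
    """Build a black frame with the given (row, col) pixels set to 1."""
    frame = [[0] * width for _ in range(height)]
    for y, x in bright_pixels:
        frame[y][x] = 1
    return frame


def tile_cost(base, alt, top, left, size, dy, dx):
    """Sum of absolute differences between a base tile and the alternate
    frame sampled at offset (dy, dx); out-of-frame samples count as 0."""
    height, width = len(alt), len(alt[0])
    cost = 0
    for y in range(top, top + size):
        for x in range(left, left + size):
            ay, ax = y + dy, x + dx
            inside = 0 <= ay < height and 0 <= ax < width
            cost += abs(base[y][x] - (alt[ay][ax] if inside else 0))
    return cost


def align_tiles(base, alt, size=8, radius=3):
    """Return the best (dy, dx) offset for every tile of the base frame."""
    offsets = [(dy, dx) for dy in range(-radius, radius + 1)
               for dx in range(-radius, radius + 1)]
    offsets.sort(key=lambda o: abs(o[0]) + abs(o[1]))  # ties go to the smallest move
    result = {}
    for top in range(0, len(base), size):
        for left in range(0, len(base[0]), size):
            result[(top, left)] = min(
                offsets, key=lambda o: tile_cost(base, alt, top, left, size, *o))
    return result


foreground = [(11, 3), (12, 4), (12, 10), (13, 12)]  # static rocks, bottom half
stars_base = [(2, 2), (3, 11)]                       # sky, top half
stars_later = [(3, 4), (4, 13)]                      # same stars, moved by (1, 2)

base_frame = make_frame(16, 16, stars_base + foreground)
later_frame = make_frame(16, 16, stars_later + foreground)

tile_offsets = align_tiles(base_frame, later_frame)
# Sky tiles (top row) recover the star motion; foreground tiles stay put.
```

Because each tile gets its own offset, the two sky tiles report a shift of (1, 2) while the two foreground tiles report (0, 0), which is why a single global alignment is not needed.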

Another issue with such long total exposures is hot pixels. These pixels can become 'stuck' at high luminance values as exposure times increase. The new Night Sight uses clever algorithms to emulate hot pixel suppression, to ensure you don't have bright pixels scattered throughout your dark sky shot.

DSLR-like bokeh

This is potentially a big deal, and perhaps underplayed, but the Google Pixel 4 will render bokeh, particularly out-of-focus highlights, closer to what we'd expect from traditional cameras and optics. Until now, while Pixel phones did render proper disc-shaped blur for out of focus areas as real lenses do (as opposed to a simple Gaussian blur), blurred backgrounds simply didn't have the impact they tend to have with traditional cameras, where out-of-focus highlights pop out of the image in gorgeous, bright, disc-shaped circles as they do in these comparative iPhone 11 examples here and also here.

The new bokeh rendition on the Pixel 4 takes things a step closer to traditional optics, while avoiding the 'cheap' technique some of its competitors use where bright circular discs are simply 'stamped' in to the image (compare the inconsistently 'stamped' bokeh balls in this Samsung S10+ image here next to the un-stamped, more accurate Pixel 3 image here). Have a look below at the improvements over the Pixel 3; internal comparisons graciously provided to me via Google.

Daytime bokeh

Nighttime bokeh

The impactful, bright, disc-shaped rendering of out-of-focus highlights is due to the processing of the blur at a Raw level, where linearity ensures that Google's algorithms know just how bright those out-of-focus highlights are relative to their surroundings.

Previously, applying the blur to 8-bit tonemapped images resulted in less pronounced out-of-focus highlights, since HDR tonemapping usually compresses the difference in luminosity between these bright highlights and other tones in the scene. That meant that out-of-focus 'bokeh balls' weren't as bright or separated from the rest of the scene as they would be with traditional cameras. But Google's new approach of applying the blur at the Raw stage allows it to more realistically approximate what happens optically with conventional optics.
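A small numerical sketch shows why the order of operations matters. This is an illustration under our own assumptions, not Google's pipeline: a one-dimensional 'scene' with a single bright highlight, a box blur standing in for the disc-shaped kernel, and a plain gamma curve standing in for HDR tonemapping.

```python
def tonemap(value, gamma=2.2):
    """Simple compressive curve (assumed stand-in for HDR tonemapping)."""
    return value ** (1.0 / gamma)


def box_blur(signal, radius):
    """Average each sample with its neighbors within `radius`."""
    blurred = []
    for i in range(len(signal)):
        lo, hi = max(0, i - radius), min(len(signal), i + radius + 1)
        window = signal[lo:hi]
        blurred.append(sum(window) / len(window))
    return blurred


# Linear scene: dim background with one highlight 50x brighter.
linear = [0.02] * 11
linear[5] = 1.0

# Blur on linear (Raw-like) data, then tonemap for display.
raw_first = [tonemap(v) for v in box_blur(linear, 2)]

# Tonemap first, then blur the already-compressed values.
tonemapped_first = box_blur([tonemap(v) for v in linear], 2)
```

With these numbers, the blurred highlight peaks at roughly 0.50 when the blur is applied to linear data, versus roughly 0.34 when it is applied after tonemapping, while the background ends up the same in both cases. The linear-domain 'bokeh ball' stands out more, which matches the explanation above.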

One thing I wonder about: if the blur is applied at the Raw stage, will we get Raw portrait mode images in a software update down-the-line?

Portrait mode improvements

Portrait mode has been improved in other ways apart from simply better bokeh, as outlined above. But before we begin I want to clarify something up front: the term 'fake bokeh' as our readers and many reviewers like to call blur modes on recent phones is not accurate. The best computational imaging devices, from smartphones to Lytro cameras (remember them?), can actually simulate blur true to what you'd expect from traditional optical devices. Just look at the gradual blur in this Pixel 2 shot here. The Pixel phones (and iPhones as well as other phones) generate actual depth maps, gradually blurring objects from near to far. This isn't a simple case of 'if area detected as background, add blurriness'.

The Google Pixel 3 generated a depth map from its split photodiodes, which have a stereo baseline of ~1mm, and augmented it using machine learning. Google trained a neural network using depth maps generated by its dual pixel array (stereo disparity only) as input, and 'ground truth' results generated by a 'franken-rig' that used 5 Pixel cameras to create more accurate depth maps than simple split pixels, or even two cameras, could. That allowed Google's Portrait mode to understand depth from things like defocus cues (out-of-focus objects are probably further away than in-focus ones) and semantic cues (smaller objects are probably further away than larger ones).

Deriving stereo disparity from two perpendicular baselines affords the Pixel 4 much more accurate depth maps

The Pixel 4's additional zoomed-in lens now gives Google more stereo data to work with, and Google has been clever in its arrangement: if you're holding the phone upright, the two lenses give you horizontal (left-right) stereo disparity, while the split pixels on the main camera sensor give you vertical (up-down) stereo disparity. Having stereo data along two perpendicular axes avoids artifacts related to the 'aperture problem', where detail along the axis of stereo disparity essentially has no measured disparity.

Try this: hold a pen up in front of you horizontally, close to your eyes, and blink to switch between your left and right eye. The pen doesn't look very different as you switch eyes, does it? Now rotate it so it's pointing up in front of you, again close to the center of your face, and do the same experiment. You'll see the now vertically-oriented pen moving dramatically left and right as you switch eyes.
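The pen experiment can be reproduced numerically. The following is a toy sketch under our own assumptions (an 8x8 binary image, a purely horizontal baseline, wrap-around shifting and a sum-of-absolute-differences matching cost); the function names are hypothetical.

```python
def shift_right(image, dx):
    """Shift each row right by dx pixels, wrapping around (toy boundary handling)."""
    width = len(image[0])
    return [[row[(x - dx) % width] for x in range(width)] for row in image]


def matching_costs(left, right, max_shift):
    """Cost of explaining `right` as `left` shifted by 0..max_shift pixels."""
    costs = []
    for d in range(max_shift + 1):
        candidate = shift_right(left, d)
        costs.append(sum(abs(a - b)
                         for row_a, row_b in zip(candidate, right)
                         for a, b in zip(row_a, row_b)))
    return costs


size = 8
horizontal_bar = [[1 if y in (3, 4) else 0 for x in range(size)] for y in range(size)]
vertical_bar = [[1 if x in (3, 4) else 0 for x in range(size)] for y in range(size)]

true_disparity = 2  # the second view sees everything shifted 2 pixels sideways

costs_h = matching_costs(horizontal_bar, shift_right(horizontal_bar, true_disparity), 3)
costs_v = matching_costs(vertical_bar, shift_right(vertical_bar, true_disparity), 3)
# costs_h is flat: every candidate disparity fits equally well.
# costs_v has a single minimum at the true disparity.
```

The horizontal bar yields a cost of zero for every candidate shift, so a left-right baseline cannot tell how far away it is. The vertical bar yields a clear minimum at the true disparity. A second, perpendicular baseline swaps those roles.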

Deriving stereo disparity from two perpendicular baselines affords the Pixel 4 much more accurate depth maps, with the dual cameras providing disparity information that the split pixels might miss, and vice versa. In the example below, provided by Google, the Pixel 4 result is far more believable than the Pixel 3 result, which has parts of the upper and lower green stem, and the horizontally-oriented green leaf near bottom right, accidentally blurred despite falling within the plane of focus.

Image comparison: Pixel 4 (dual baseline) vs. Pixel 3 (single baseline)

The combination of two baselines, one short (split pixels) and one significantly longer (the two lenses), also has other benefits. The longer stereo baselines of dual camera setups can run into the problem of occlusion: since the two perspectives are considerably different, one lens may see a background object that to the other lens is hidden behind a foreground object. The shorter ~1mm baseline of the dual pixel sensor means it's less prone to errors due to occlusion.

On the other hand, the short baseline of the split pixels means that distant objects that are not quite at infinity appear the same to 'left-looking' and 'right-looking' (or up/down) photodiodes. The longer baseline of the dual cameras means that stereo disparity can be calculated for these more distant objects, which allows the Pixel 4's portrait mode to better deal with distant subjects, or groups of people shot from further back, as you can see below.
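To put rough numbers on the baseline argument, the standard pinhole stereo relation says disparity is proportional to baseline and inversely proportional to subject distance. The sketch below is a first-order approximation only: the ~1mm split-pixel baseline comes from the text above, but the 10mm dual-camera baseline, the 4.4mm focal length and the 1.4 micron pixel pitch are placeholder values we assume for illustration, not published Pixel 4 specifications.

```python
def disparity_px(baseline_m, depth_m, focal_length_m=4.4e-3, pixel_pitch_m=1.4e-6):
    """Pinhole stereo disparity in pixels: d = f * B / (Z * pixel pitch).

    focal_length_m and pixel_pitch_m are assumed placeholder values.
    """
    return focal_length_m * baseline_m / (depth_m * pixel_pitch_m)


SPLIT_PIXEL_BASELINE_M = 0.001   # ~1mm, as stated in the article
DUAL_CAMERA_BASELINE_M = 0.010   # hypothetical lens spacing, for illustration

for depth_m in (0.5, 2.0, 5.0):
    short = disparity_px(SPLIT_PIXEL_BASELINE_M, depth_m)
    long = disparity_px(DUAL_CAMERA_BASELINE_M, depth_m)
    print(f"{depth_m:>4} m: split pixels {short:.2f} px, dual cameras {long:.2f} px")
```

Under these assumptions the split pixels see well under a pixel of disparity for a subject 5 meters away, while the longer baseline still produces several pixels, which is consistent with the dual cameras helping most on distant subjects.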

There's yet another benefit of the two separate methods for calculating stereo disparity: macro photography. If you've shot portrait mode on telephoto units of other smartphones, you've probably run into error messages like 'Move farther away'. That's because these telephoto lenses tend to have a minimum focus distance of ~20cm. Meanwhile, the minimum focus distance of the main camera on the Pixel 4 is only 10cm. That means that for close-up photography, the Pixel 4 can simply use its split pixels and learning-based approach to blur backgrounds (see footnote 3).

We confirmed that the additional burden of taking two images with the dual camera setup does not cause any additional latency. The iPhone 11, for example, has considerable shutter lag in portrait mode.

Google continues to keep a range of planes in perfect focus, which can sometimes lead to odd results where multiple people in a scene remain focused despite being at different depths. However, this approach avoids prematurely blurring parts of people that shouldn't be blurred, a common problem with iPhones.

Oddly, portrait mode is unavailable with the zoomed-in lens; instead, it uses the same 1.5x crop from the main camera that the Pixel 3 used. This means images will have less detail compared to some competitors, especially since the super-res zoom pipeline is still not used in portrait mode. It also means you don't get the versatility of both wide-angle and telephoto portrait shots. And if there's one thing you probably know about me, it's that I love my wide angle portraits!

Pixel 4's portrait mode continues to use a 1.5x crop from the main camera. This means that, like the Pixel 3, it will have considerably less detail than portrait modes from competitors like the iPhone 11 Pro that use the full-resolution image from wide or tele modules.

Further improvements

There are a few more updates to note.

Learning-based AWB

The learning-based white balance that debuted in Night Sight is now the default auto white balance (AWB) algorithm in all camera modes on the Pixel 4. What is learning-based white balance? Google trained its traditional AWB algorithm to discriminate between poorly, and properly, white balanced images. The company did this by hand-correcting images captured using the traditional AWB algorithm, and then using these corrected images to train the algorithm to suggest appropriate color shifts to achieve a more neutral output.

Google tells us that the latest iteration of the algorithm is improved in a number of ways. A larger training data set has been used to yield better results in low light and adversarial lighting conditions. The new AWB algorithm is better at recognizing specific, common illuminants and adjusting for them, and also yields better results under artificial lights of one dominant color. We've been impressed with white balance results in Night Sight on the Pixel 3, and are glad to see it ported over to all camera modes. See below how Google's learning-based AWB (top left) preserves both blue and red/orange tones in the sky compared to its traditional AWB (top right), and how much better it is at separating complex sunset colors (bottom left) compared to the iPhone XS (bottom right).

Top row: learning-based AWB (Pixel 3 Night Sight) vs. traditional AWB (Pixel 3)
Bottom row: learning-based AWB (Pixel 3 Night Sight) vs. iPhone XS HDR result

New face detector

A new face detection algorithm based solely on machine learning is now used to detect, focus, and expose for faces in the scene. The new face detector is more robust at identifying faces in challenging lighting conditions. This should help the Pixel 4 better focus on and expose for, for example, strongly backlit faces. The Pixel 3 would often prioritize exposure for highlights and underexpose faces in backlit conditions.

Though tonemapping would brighten the face properly in post-processing, the shorter exposure would mean more noise in shadows and midtones, which after noise reduction could lead to smeared, blurry results. In the example below the Pixel 3 used an exposure time of 1/300s while the iPhone 11 yielded more detailed results due to its use of an exposure more appropriate for the subject (1/60s).
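The gap between those two shutter speeds can be expressed in stops. This quick calculation compares per-frame exposure time only; it ignores aperture, ISO and the number of frames each phone merges, so treat it as a rough indication rather than a full exposure comparison.

```python
import math

pixel_3_shutter_s = 1 / 300   # per-frame exposure quoted for the Pixel 3
iphone_11_shutter_s = 1 / 60  # per-frame exposure quoted for the iPhone 11

# Ratio of light gathered per frame, and the same ratio in photographic stops.
light_ratio = iphone_11_shutter_s / pixel_3_shutter_s
stops = math.log2(light_ratio)

print(f"{light_ratio:.1f}x more light per frame, or {stops:.2f} stops")
```

That works out to about five times more light per frame, a difference of a little over two stops, which goes some way toward explaining the cleaner midtones and shadows.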

Along with the new face detector, the Pixel 4 will (finally) indicate the face it's focusing on in the 'viewfinder' as you compose. In the past, Pixel phones would simply show a circle in the center of the screen every time they refocused, which was a very confusing experience that left users wondering whether the camera was in fact focusing on a face in the scene, or simply on the center. Indicating the face it's focusing on should allow Pixel 4 users to worry less, and feel less of a need to tap on a face in the scene if the camera's already indicating it's focusing on it.

On previous Pixel phones, a circle focus indicator would pop up in the center when the camera refocused, leading to confusion. Is the camera focusing on the face, or the outstretched hand? On the Huawei P20, the camera indicates when it's tracking a face. The Pixel 4 will have a similar visual indicator.

Semantic segmentation

This isn't new, but in his keynote Marc mentioned 'semantic segmentation' which, as on the iPhone, allows image processing to treat different portions of the scene differently. It's been around for years in fact, allowing Pixel phones to brighten faces ('synthetic fill flash'), or to better separate foregrounds and backgrounds in Portrait mode shots. I'd personally point out that Google takes a more conservative approach in its implementation: faces aren't brightened or treated differently as much as they tend to be with the iPhone 11. The end result is a matter of personal taste.

Conclusion

The questions on the minds of many of our readers will undoubtedly be: (1) what is the best smartphone for photography I can buy, and (2) when should I consider using such a device as opposed to my dedicated camera?

We have much testing to do and many side-by-sides to come. But from our tests thus far and our recent iPhone 11 vs. Pixel 3 Night Sight article, one thing is clear: in most situations the Pixel cameras are capable of a level of image quality unsurpassed by any other smartphone when you compare images at the pixel (no pun intended) level.

But other devices are catching up to, or exceeding, Pixel phone capabilities. Huawei's field-of-view fusion offers compelling image quality across multiple zoom ratios thanks to its fusion of image data from multiple lenses. iPhones offer a wide-angle portrait mode far more suited to the types of photography casual users engage in, with better image quality to boot than Pixel's (cropped) Portrait mode.

The Pixel 4 takes an already great camera and refines it to achieve results closer to, and in some cases surpassing, traditional cameras and optics

Overall though, Google Pixel phones deliver some of the best image quality we've seen from a mobile device. No other phone can compete with their Raw results, since the Raws are the result of a burst of images stacked using Google's robust align-and-merge algorithm. Night Sight is now improved to allow for superior results with static scenes demanding long exposures. Portrait mode is vastly improved thanks to dual baselines and machine learning, with fewer depth map errors, a better ability to 'cut around' complex objects like pet fur or loose hair strands, and pleasing out-of-focus highlights thanks to 'DSLR-like bokeh'. AWB is improved, and a new learning-based face detector should improve focus and exposure of faces under challenging lighting.

It's not going to replace your dedicated camera in all situations, but in many it might. The Pixel 4 takes an already great camera in the Pixel 3, and refines it further to achieve results closer to, and in some cases surpassing, traditional cameras and optics. Stay tuned for more thorough tests once we get a unit in our hands.

Finally, watch Marc Levoy's keynote presentation from yesterday below. And if you haven't already, watch his lectures on digital photography or visit the course website for the digital photography class he taught while at Stanford. There's a wealth of information on digital imaging in those talks, and Marc has a knack for distilling complex topics into elegantly simple terms.


Footnotes:

1 The Pixel 3's dim display combined with the dark shadows of a non-HDR preview often made the experience of shooting high contrast scenes outdoors lackluster, sometimes even making it difficult to compose. Live HDR+ should dramatically improve the experience, though the display remains relatively dim compared to the iPhone 11 Pro.

2 The original paper on HDR+ by Hasinoff and Levoy claims HDR+ can handle displacements of up to 169 pixels within a single raw color channel image. For a 12MP 4:3 Bayer sensor, that's 169 pixels of a 2000 pixel wide (3MP) image, which amounts to ~8.5%. Furthermore, tile-based alignment is performed using as small as 16x16 pixel blocks of that single raw channel image. That amounts to ~12,000 effective tiles that can be individually aligned.

3 The iPhone 11's wide angle portrait mode also allows you to get closer to subjects, since its ultra-wide and wide cameras can focus on nearer subjects than its telephoto lens.
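The arithmetic in footnote 2 can be checked in a few lines. The 4000x3000 sensor dimensions are our assumption for a 12MP 4:3 sensor; the 169-pixel displacement and 16x16 tile size are the figures quoted in the footnote.

```python
# Assumed dimensions for a 12MP 4:3 Bayer sensor.
sensor_width, sensor_height = 4000, 3000

# A single raw color channel has half the width and height (about 3MP).
channel_width, channel_height = sensor_width // 2, sensor_height // 2

max_displacement_px = 169  # quoted from the HDR+ paper via footnote 2
tile_size = 16             # smallest tile size quoted in footnote 2

displacement_fraction = max_displacement_px / channel_width
tile_count = (channel_width / tile_size) * (channel_height / tile_size)

print(f"Max displacement: {displacement_fraction:.2%} of frame width")
print(f"Tiles per frame: {tile_count:,.0f}")
```

This gives a maximum displacement of about 8.5% of the frame width and just under 12,000 tiles, in line with the rounded figures used in the article.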