Researchers with Google Research and the Google Brain deep learning AI team have published a new study titled NeRF in the Wild: Neural Radiance Fields for Unconstrained Photo Collections (NeRF-W). The system works by taking 'in the wild' unconstrained images of a particular location -- tourist images of a popular attraction, for example -- and using an algorithm to turn them into a dynamic, complex, high-quality 3D model.

The researchers detail their project in a new paper, explaining that their work involves adding 'extensions' to neural radiance fields (NeRF) that enable the AI to accurately reconstruct complex structures from unstructured images, meaning ones taken from random angles with different lighting and backgrounds.

This contrasts with NeRF without the extensions, which can only accurately model structures from images that were taken in controlled settings. The obvious benefit is that 3D models can be created using the huge number of Internet photos that already exist of these structures, transforming those collections into useful datasets.

Different views of the same model constructed from unstructured images.

The Google researchers call their more sophisticated AI 'NeRF-W.' It is used to create 'photorealistic, spatially consistent scene representations' of famous landmarks from images that contain various 'confounding factors.' That makes it far more useful than a version that only works with carefully controlled image collections.

Talking about the underlying technology, the study explains how NeRF works, stating:

'The Neural Radiance Fields (NeRF) approach implicitly models the radiance field and density of a scene within the weights of a neural network. Direct volume rendering is then used to synthesize new views, demonstrating a heretofore unprecedented level of fidelity on a range of challenging scenes.'
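To make the 'direct volume rendering' step concrete, here is a minimal sketch of how samples along a single camera ray are commonly composited into one pixel color. In a real NeRF the densities and colors would come from the neural network; here they are hand-written lists, and the function name render_ray is a hypothetical choice for illustration, not code from the study.

```python
import math

def render_ray(densities, colors, deltas):
    """Composite samples along one ray, front to back.

    densities: volume density at each sample (assumed to come from the network)
    colors:    RGB triple at each sample (assumed to come from the network)
    deltas:    distance between consecutive samples
    """
    transmittance = 1.0          # fraction of light that has not been blocked yet
    pixel = [0.0, 0.0, 0.0]
    for sigma, rgb, delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)   # opacity of this segment
        weight = transmittance * alpha           # contribution of this sample
        pixel = [p + weight * c for p, c in zip(pixel, rgb)]
        transmittance *= 1.0 - alpha             # light remaining for later samples
    return pixel

# Empty space (zero density) in front of a very dense red surface.
pixel = render_ray(
    densities=[0.0, 1e9],
    colors=[[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]],
    deltas=[1.0, 1.0],
)
```

Because the first sample has zero density, it contributes nothing, and the dense red sample behind it determines the pixel color.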

There's one big problem, though, which is that NeRF systems only work well if the scene is captured in controlled settings, as mentioned. Without a set of structured images, the AI's ability to generate models 'degrades significantly,' limiting its usefulness compared to other modeling approaches.

The researchers explain how they build upon this AI and advance it with new capabilities, saying in their study:

'The central limitation of NeRF that we address in this work is its assumption that the world is geometrically, materially, and photometrically static -- that the density and radiance of the world is constant. NeRF therefore requires that any two photographs taken at the same position and orientation must have identical pixel intensities. This assumption is severely violated in many real-world datasets, such as large-scale internet photo collections of well-known tourist landmarks...

To handle these complex scenarios, we present NeRF-W, an extension of NeRF that relaxes the latter’s strict consistency assumptions.'

The process involves multiple steps, including first having NeRF-W model the per-image appearance of different elements in the photos, such as the weather, lighting, exposure level and other variables. The AI ultimately learns 'a shared appearance representation for the entire photo collection,' paving the way for the second step.
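The idea of per-image appearance can be illustrated with a toy example. In the sketch below, the scene's density and base color are shared by every photo, while a small per-image code changes only how that color is rendered. Everything here is an assumption made for illustration: the file names, the numbers, and the reduction of the appearance code to a simple (gain, bias) pair are hypothetical, whereas the actual system learns its appearance representation inside a neural network.

```python
# Toy illustration only: all names and numbers are hypothetical.

def scene_point(position):
    """Shared scene content: identical for every photo in the collection."""
    x, y, z = position
    density = 5.0 if z > 1.0 else 0.0   # a solid "wall" behind z = 1
    base_rgb = [0.8, 0.7, 0.6]          # stone-like base color
    return density, base_rgb

def apply_appearance(base_rgb, appearance):
    """Adjust the shared color using one photo's appearance code."""
    gain, bias = appearance
    return [min(1.0, max(0.0, gain * c + bias)) for c in base_rgb]

# One appearance code per photo (lighting, exposure and similar factors).
appearance_codes = {
    "noon.jpg": (1.2, 0.0),
    "dusk.jpg": (0.5, 0.05),
}

def query(position, image_id):
    """Return (density, color) for a 3D point as seen in a given photo."""
    density, base_rgb = scene_point(position)
    return density, apply_appearance(base_rgb, appearance_codes[image_id])

noon_density, noon_rgb = query((0.0, 0.0, 2.0), "noon.jpg")
dusk_density, dusk_rgb = query((0.0, 0.0, 2.0), "dusk.jpg")
```

The same point has the same density in both photos, so the geometry stays consistent, but its color is brighter under the noon code than under the dusk code.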

In the second part, NeRF-W models the overall subject of the images as...

'...the union of shared and image-dependent elements, thereby enabling the unsupervised decomposition of scene content into static and transient components. This decomposition enables the high-fidelity synthesis of novel views of landmarks without the artifacts otherwise induced by dynamic visual content present in the input imagery.

Our approach models transient elements as a secondary volumetric radiance field combined with a data-dependent uncertainty field, with the latter capturing variable observation noise and further reducing the effect of transient objects on the static scene representation.'
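One way to picture how an uncertainty field can reduce 'the effect of transient objects' is a loss that divides each pixel's error by a predicted uncertainty. The sketch below assumes a Gaussian negative log-likelihood form for that loss; the function name and the example values are hypothetical, and the objective used in the study may contain further terms.

```python
import math

def uncertainty_weighted_loss(predicted_rgb, observed_rgb, beta):
    """Per-pixel loss where beta is the predicted uncertainty (assumed > 0).

    A large beta shrinks the squared-error term, so pixels believed to show
    transient content pull less on the static scene. The log term penalizes
    large beta, so the model cannot call every pixel uncertain for free.
    """
    squared_error = sum((p - o) ** 2 for p, o in zip(predicted_rgb, observed_rgb))
    return squared_error / (2.0 * beta ** 2) + math.log(beta ** 2) / 2.0

rendered = [0.2, 0.2, 0.2]      # what the static scene predicts
passerby = [0.9, 0.9, 0.9]      # a photo where a person blocks the view

confident = uncertainty_weighted_loss(rendered, passerby, beta=0.5)
uncertain = uncertainty_weighted_loss(rendered, passerby, beta=2.0)
```

For the mismatched pixel, declaring high uncertainty gives the lower loss, so the transient object is effectively discounted. For a pixel that matches the static scene, the log term makes low uncertainty the cheaper choice.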

Upon testing their creation, the researchers found that NeRF-W was able to produce high-fidelity models of subjects with multiple detailed viewpoints using 'in-the-wild' unstructured images. Despite using more complicated images with many variables, the NeRF-W models surpassed the quality of models generated by the previous top-tier NeRF systems 'by a large margin across all considered metrics,' according to the researchers.

The potential uses for this technology are numerous, including the ability to generate 3D models of popular destinations for VR and AR applications using existing tourist images. This eliminates the need to create carefully controlled settings for capturing the images, which can be difficult at popular destinations where people and vehicles are often present.

A PDF containing the full study can be found here; some models can be found on the project's GitHub, as well.