Digital cameras typically require lenses to focus incoming light on an image sensor. While technology has continually improved, allowing for more compact camera systems, they are nonetheless limited by physics. A lens can only be so small, and the distance between the lens and a sensor so short. This is where 'lensless' cameras come in. Unburdened by the physical limitations of optical design, lensless cameras can be much smaller. Professor Masahiro Yamaguchi of the Tokyo Institute of Technology, a co-author of a research paper about a new approach to lensless camera design, said, 'Without the limitations of a lens, the lensless camera could be ultra-miniature, which could allow new applications that are beyond our imagination.'

The idea for a lensless camera itself isn't new. We've seen it before, including a single-pixel lensless camera in 2013 and, more recently, a much smaller lensless camera in 2017. A lensless camera, which comprises an image sensor and a thin mask in front of the sensor that encodes information from a given scene, requires mathematical reconstruction to produce a detailed image. While a traditional camera with an optical lens uses the glass inside its lens to achieve focus and immediately produce a sharp image, a lensless camera instead encodes light and must then reconstruct a blurry, out-of-focus image into something useful.

As its name suggests, a lensless camera omits a traditional optical lens altogether. Instead, it includes only a sensor and a mask. There's no way for the camera to focus light on the image sensor, so a detailed image must be reconstructed using an encoded pattern and information about how light interacts with the mask and image sensor. Previous approaches have reconstructed an image using an algorithm derived from a physical model. The new method developed by researchers at the Tokyo Institute of Technology instead relies upon a novel deep learning system, which delivers better image quality and doesn't depend on an accurate physical approximation.

Credit: Xiuxi Pan / Tokyo Institute of Technology

A group of researchers at Tokyo Tech, including Professor Yamaguchi, has created a new reconstruction technique that promises improved image quality and significantly faster processing, two issues that have held back some other lensless cameras.

Earlier lensless cameras, like the one developed by Bell Labs in 2013 and Caltech's camera in 2017, controlled the light hitting the image sensor and relied on sophisticated measurements of how light interacts with the specific physical mask and image sensor in order to reconstruct an image. Without a way to focus light, a lensless camera captures a blurry image, which must be reconstructed into a sharper image using an algorithm. By understanding how the light interacts with a thin mask in front of the image sensor, an algorithm can decode the light information and reconstruct a focused scene. However, the decoding process is extremely challenging and resource-intensive. Beyond taking time, producing good image quality requires a highly accurate physical model. If an algorithm is based on an inaccurate approximation of how light interacts with the mask and sensor, reconstruction quality suffers.
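To make the model-based idea concrete, here is a minimal Python sketch. It is not the algorithm used by any of the cameras mentioned. It assumes a toy one-dimensional scene, a hypothetical random binary matrix `A_true` standing in for the mask-and-sensor model, and Tikhonov-regularized least squares as the decoder. All names, sizes, and noise levels are illustrative.

```python
import numpy as np

def simulate_capture(scene, A, noise_std, rng):
    """Toy forward model: every sensor reading mixes many scene points."""
    return A @ scene + noise_std * rng.standard_normal(A.shape[0])

def tikhonov_reconstruct(measurement, A, lam=1e-3):
    """Solve min_x ||A x - y||^2 + lam * ||x||^2 in closed form."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ measurement)

rng = np.random.default_rng(0)
n_scene, n_sensor = 64, 128  # assumed sizes, chosen so the toy problem is well posed

# A simple scene: one bright bar and one brighter point.
scene = np.zeros(n_scene)
scene[20:30] = 1.0
scene[45] = 2.0

# Hypothetical stand-in for the true mask-and-sensor model.
A_true = rng.integers(0, 2, size=(n_sensor, n_scene)).astype(float)
measurement = simulate_capture(scene, A_true, noise_std=0.01, rng=rng)

# A slightly inaccurate model, as if the mask had been measured imperfectly.
A_wrong = A_true + 0.05 * rng.standard_normal(A_true.shape)

err_exact = np.linalg.norm(tikhonov_reconstruct(measurement, A_true) - scene)
err_mismatch = np.linalg.norm(tikhonov_reconstruct(measurement, A_wrong) - scene)
print(f"error with exact model:      {err_exact:.3f}")
print(f"error with mismatched model: {err_mismatch:.3f}")
```

In this toy setup, the decoder that uses the slightly wrong matrix reconstructs the scene noticeably worse, which is the failure mode described above.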

Instead of using a model-based decoding approach, the Tokyo Tech team developed a reconstruction method that relies upon deep learning. Existing deep learning methods using convolutional neural networks (CNNs) aren't efficient enough to solve the problem. As outlined by Phys.org, the issue is that a "CNN processes the image based on the relationships of neighboring 'local' pixels, whereas lensless optics transform local information in the scene into overlapping 'global' information on all the pixels of the image sensor, through a property called 'multiplexing.'"
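The mismatch can be illustrated with a small Python sketch. It assumes a toy forward model in which the capture is a circular convolution of the scene with a hypothetical random binary mask pattern, and it compares how far one scene point spreads with the receptive field of a small stack of 3x3 convolutions. The function names and sizes are illustrative, not taken from the paper.

```python
import numpy as np

def multiplex(scene, mask_pattern):
    """Toy lensless capture: circular convolution of scene and mask pattern."""
    spectrum = np.fft.fft2(scene) * np.fft.fft2(mask_pattern)
    return np.real(np.fft.ifft2(spectrum))

def receptive_field(num_layers, kernel_size=3):
    """Width in pixels seen by one output of stacked stride-1 convolutions."""
    return 1 + num_layers * (kernel_size - 1)

rng = np.random.default_rng(1)
size = 32
mask_pattern = rng.integers(0, 2, size=(size, size)).astype(float)

scene = np.zeros((size, size))
scene[10, 17] = 1.0  # a single bright point in the scene

capture = multiplex(scene, mask_pattern)
# Count sensor pixels that received light from that one point.
lit_pixels = int(np.count_nonzero(capture > 0.5))

print(f"sensor pixels lit by one scene point: {lit_pixels} of {size * size}")
print(f"receptive field of 3 stacked 3x3 convs: {receptive_field(3)} pixels wide")
```

One scene point ends up as a shifted copy of the whole mask pattern spread across the sensor, while a few convolution layers only ever look at a window a handful of pixels wide.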

Here we can see the new lensless camera. It includes an image sensor and a mask that is 2.5 mm from the sensor. The mask is built using chromium deposition in a synthetic-silica plate. It has an aperture size of 40×40 μm.

Credit: Xiuxi Pan / Tokyo Institute of Technology

The new research relies upon a novel machine learning algorithm. It's based upon a technique called Vision Transformer (ViT), and it promises improved global reasoning. As Phys.org writes, "The novelty of the algorithm lies in the structure of the multistage transformer blocks with overlapped 'patchify' modules. This allows it to efficiently learn image features in a hierarchical representation. Consequently, the proposed method can well address the multiplexing property and avoid the limitations of conventional CNN-based deep learning, allowing better image reconstruction."
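As a rough illustration of what an overlapped 'patchify' step could look like, the Python sketch below cuts an image into patches whose stride is smaller than the patch size, so neighboring tokens share pixels. The patch size, the stride, and the use of NumPy's `sliding_window_view` are assumptions made for illustration. The paper's actual module may be implemented differently.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def patchify(image, patch_size, stride):
    """Split a 2D image into flattened patches (tokens).

    stride < patch_size gives overlapping patches;
    stride == patch_size gives the usual non-overlapping grid.
    """
    windows = sliding_window_view(image, (patch_size, patch_size))
    windows = windows[::stride, ::stride]  # keep every stride-th window
    return windows.reshape(-1, patch_size * patch_size)

image = np.arange(64, dtype=float).reshape(8, 8)

plain_tokens = patchify(image, patch_size=4, stride=4)
overlap_tokens = patchify(image, patch_size=4, stride=2)

print(plain_tokens.shape)    # (4, 16): a 2x2 grid of 4x4 patches
print(overlap_tokens.shape)  # (9, 16): a 3x3 grid of overlapping 4x4 patches
```

With overlap, each token shares half of its columns with its horizontal neighbor, so information near patch borders is not cut off from the adjacent token before the transformer blocks see it.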

Vision Transformer (ViT) is a leading-edge machine learning technique, which is better at global feature reasoning due to its novel structure of multistage transformer blocks with overlapped 'patchify' modules. This allows it to efficiently learn image features in a hierarchical representation, making it able to address the multiplexing property and avoid the limitations of conventional CNN-based deep learning, thereby allowing better image reconstruction.

Caption credit: Phys. Image credit: Xiuxi Pan / Tokyo Institute of Technology

The proposed method, which pairs a fully connected neural network with a transformer, promises improved results. Reconstruction errors are reduced, and computing times are shorter. The team believes that the method can be used for real-time capture of high-quality images, something that has eluded previous lensless cameras.

The first row shows the ground truth scenes used to test the proposed lensless camera. In this row, the two leftmost columns are targets displayed on an LCD, while the two rightmost columns are real objects in three-dimensional space. The second row shows the pattern captured by the lensless camera. The third row is the most informative here, as it depicts results using the proposed reconstruction technique. The fourth row shows results using a model-based approach, which has traditionally been used with lensless cameras. The fifth and final row shows results from convolutional neural networks, which, as mentioned, have limitations with global image reconstruction.

Image credit: Xiuxi Pan / Tokyo Institute of Technology.

The full research paper, 'Image reconstruction with transformer for mask-based lensless imaging,' is available to paid users at Optica. The paper's authors are Xiuxi Pan, Xiao Chen, Saori Takeyama and Masahiro Yamaguchi. You can read the abstract below. The referenced transformer is the ViT:

A mask-based lensless camera optically encodes the scene with a thin mask and reconstructs the image afterward. The improvement of image reconstruction is one of the most important subjects in lensless imaging. Conventional model-based reconstruction approaches, which leverage knowledge of the physical system, are susceptible to imperfect system modeling. Reconstruction with a pure data-driven deep neural network (DNN) avoids this limitation, thereby having potential to provide a better reconstruction quality. However, existing pure DNN reconstruction approaches for lensless imaging do not provide a better result than model-based approaches. We reveal that the multiplexing property in lensless optics makes global features essential in understanding the optically encoded pattern. Additionally, all existing DNN reconstruction approaches apply fully convolutional networks (FCNs) which are not efficient in global feature reasoning. With this analysis, for the first time to the best of our knowledge, a fully connected neural network with a transformer for image reconstruction is proposed. The proposed architecture is better in global feature reasoning, and hence enhances the reconstruction. The superiority of the proposed architecture is verified by comparing with the model-based and FCN-based approaches in an optical experiment.