Autofocus System Design

Let's take a close look at a photo I posted previously:




In the upper-right pane, note the slight eclipsing of the lower separator-mask opening by the lens diaphragm.

When setting up for this photo, it was extremely difficult to achieve precise alignment of the AF module to the center of the main lens exit pupil. Here we can just begin to see the effect of the residual misalignment, presenting as mild vignetting of the upper images at the AF sensor. For reasons I will discuss in more detail later, one does not want any discrepancy in brightness between the two images in each pair, so this kind of off-center vignetting must be avoided as much as possible.

The angles that the separator-mask images make with the optical axis range from 2.0 deg. to 3.6 deg. (that range covers the radial width of the openings). In order for the images to remain well-centered in the lens aperture and avoid vignetting when the main lens is at - or even a little under - the minimum design aperture, the angular alignment of the AF module must be kept within a very small fraction of one degree.

To accomplish this (and also allow for fine-adjust of the AF module position along the optical axis), the module is suspended from its top frame by three fine-thread alignment screws which are spring-loaded:




AF module alignment provisions

The fine thread of the alignment screws provides movement of less than one micron per degree of rotation. These adjustments are performed at the factory, and they interact with the adjustment for the AF sub-mirror in the mirror box.
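As a sanity check on that figure (the actual screw pitch is not specified, so the 0.25mm pitch below is purely a hypothetical example), the axial travel per degree is simply the thread pitch divided by 360:

```python
# Axial travel of a fine-thread adjustment screw per degree of rotation.
# The 0.25 mm pitch is a hypothetical example; the actual pitch of the
# AF-module alignment screws is not documented in this thread.
pitch_mm = 0.25                                   # one full turn advances 0.25 mm
travel_per_degree_um = pitch_mm * 1000 / 360      # convert mm to um, divide by 360 deg
print(f"{travel_per_degree_um:.2f} um per degree")  # prints "0.69 um per degree"
```

Any pitch finer than 0.36mm keeps the travel under one micron per degree, consistent with the text.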

Unfortunately, many authors on the web have suggested using the AF sub-mirror rest-stop adjustment as a means of global AF-error compensation. Changing the position of this stop throws off the alignment of the viewfinder AF points, and can cause a loss of AF performance when the main lens is close to the AF system's minimum aperture (f/8 for the D300):




This is not a global AF adjustment, and should never be used as such.

That small adjuster at the back of the mirror box, just above the base, can only be set up correctly by running firmware on the camera that allows the AF-sensor images to be checked. If it is disturbed, an owner has no means of ensuring that it is accurately returned to its original position.



--
Qualities possessed by God in infinite proportion: Love, Grace, Power, Righteousness, Wisdom, . . .
Qualities possessed by humans in infinite proportion: Ignorance.
- Marianne
 
For those who have read through most of the posts here, describing the AF optics, I would like to pose a question for your consideration.

What would happen to the AF system behavior and characteristics, if we were to remove the field lenses?

In particular:

How would the brightness of the AF-sensor images change with main-lens aperture, compared to the system with field lenses?

The D300 AF module establishes a triangulation baseline just inside the f/8 circle of the main lens. Without the field lenses, what would this baseline be?

How would the depth of focus for the AF-sensor images be affected?

How much shift would there be in the AF-sensor images, with lens focus - or would this even be measurable?

Later this week, I will post photos showing the effects of removing the field lenses. In the meantime, I invite you to see if you can answer the above questions for yourself.
 
Marianne:

Outstanding - thank you.

I was wondering if you, or anyone else, could answer a question I have that results from comments made elsewhere that a lens designed for a Phase Detect AF system that is not on the sensor (traditional DSLR) would not work properly on a camera that had a Phase Detect AF system on the sensor (e.g. mirrorless).

I'm confused because I thought the 'brains' of the AF system were in the camera and the camera simply told the lens' motor to move the focus in or out - are there any AF brains in the lens or just a motor?

Would a motorized AF lens work on any camera (as long as the mount was compatible), regardless of where the AF system was located or what type of AF system it was (PD or CD)?

The comment was related to the feasibility of Nikon simply taking out the mirror box and having an AF system similar to the Sony a6000 - where it had on sensor phase detect combined with contrast detect. Some commented that existing lenses would not be compatible with the OSPDAF nor CD - but I don't understand why b/c I thought the lens just moved in and out as directed by the camera and the camera was determining whether the image was in focus or not.

Thank you, eric
 
Marianne:

Outstanding - thank you.

I was wondering if you, or anyone else, could answer a question I have that results from comments made elsewhere that a lens designed for a Phase Detect AF system that is not on the sensor (traditional DSLR) would not work properly on a camera that had a Phase Detect AF system on the sensor (e.g. mirrorless).
When people make comments like that, you need to demand a detailed explanation. They may be merely "parroting" what they have heard/seen elsewhere, or drawing inappropriate conclusions from information that is actually unrelated.
I'm confused because I thought the 'brains' of the AF system were in the camera and the camera simply told the lens' motor to move the focus in or out - are there any AF brains in the lens or just a motor?
You are correct that the bulk of the AF processing must be done within the camera body. The serial data link between body and lens cannot pass the large amount of data that is read from the AF sensor, so the camera body must at least derive the focus-error parameter.

After that point, the body could simply pass the focus-error value to the lens, updating it as the lens moves. The lens processor could handle all of the dynamics unique to its own AF drive and focus group, maintain its own inner servo loop, and even apply focus-error compensation. But this is only one possible "split" between camera and lens processors; the camera processor could be further involved with the lens control.
Would a motorized AF lens work on any camera (as long as the mount was compatible), regardless of where the AF system was located or what type of AF system it was (PD or CD)?

The comment was related to the feasibility of Nikon simply taking out the mirror box and having an AF system similar to the Sony a6000 - where it had on sensor phase detect combined with contrast detect. Some commented that existing lenses would not be compatible with the OSPDAF nor CD - but I don't understand why b/c I thought the lens just moved in and out as directed by the camera and the camera was determining whether the image was in focus or not.
In principle, you can make existing Nikon lenses work with on-sensor PDAF. The question is whether a lens designed to work with conventional PDAF would be optimal for on-sensor PDAF, or whether some design changes would be needed to achieve best performance. Autofocus is a highly competitive area, and manufacturers would not want to field a system that under-performs.

Additionally, on-sensor PDAF systems may impose certain requirements on lens optical design, such as degree of telecentricity (apparent distance of lens exit pupil) and freedom from eclipsing of the exit pupil, so some existing Nikon lenses may be optically unsuitable.
 
As discussed in prior posts, the AF sensor has 11 pairs of vertical detection lines which serve all 51 AF points, and 5 pairs of horizontal detection lines for the central group of 15 cross-type sensors.

Each vertical detection line is 2.08mm long, but the image projected onto it is only 1.88mm high, leaving an alignment margin of 0.2mm total. Similarly, each horizontal detection line is 1.36mm long, but the image projected onto it is 1.16mm wide, again leaving 0.2mm of alignment margin. Both types of lines are 0.12mm wide.
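The quoted margins follow directly from the dimensions given above:

```python
# Alignment margins from the detection-line and projected-image dimensions.
v_line, v_image = 2.08, 1.88   # mm: vertical detection line and its image height
h_line, h_image = 1.36, 1.16   # mm: horizontal detection line and its image width
print(round(v_line - v_image, 2))  # prints 0.2 (total vertical margin, mm)
print(round(h_line - h_image, 2))  # prints 0.2 (total horizontal margin, mm)
```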

The vertical lines are divided into 5 regions, one for each of the 5 AF-point rows that use them. The spacing between these regions, i.e. their height, is precisely defined by the spacing between the horizontal detection lines (at least for the center group of 15 AF points), which is 0.36mm. The horizontal lines are divided into 3 regions, since they serve the three columns of AF points in the center group. The spacing, or width, of these regions (defined by the spacing between vertical lines) is 0.38mm.

It is also worth mentioning that the spacing between the images projected onto the AF sensor by the separator lenses is slightly wider than the spacing between opposite groups of detection lines. This gives the images an outward shift of about 0.05mm on each side, rather than being precisely centered on the detection lines. I believe this is likely by design, rather than merely being a manufacturing tolerance; more about this later.

When the camera lens is in focus (and when using AF-S single-point AF), the horizontal and vertical spans where image detail is recognized for each AF point (i.e., where it is simultaneously visible on both left and right horizontal lines, or on both top and bottom vertical lines) are each about 0.24mm wide or high. We now have a frame and dimensions for the individual AF-point regions, which we can use to discuss processing of the data taken from the detection lines. Here are the regions for the horizontal detection lines:


Each of the 10 horizontal detection lines is divided into 3 AF-point regions. Also note image offset.

[Note: The relative positions of the lines shown in this diagram are only for reasons of compactness, and do not reflect their physical layout on the sensor, where they are in fact co-linear and well separated.]

Establishing a Model

Not all details of the sensel layout on the detection lines are known at this point. It appears that they have a 6um pitch, but the number of sensels across the 0.12mm width of the lines is unknown. It is also not known how the columns of sensels are staggered, or what spatial resolution results.

In order to continue the discussion, I have decided to use a simplified model of the detection-line sensel layout. The actual AF module will probably have better performance (precision and accuracy) than our model, so keep this in mind for the following discussions.

The model has a 6um sensel pitch, but each sensel is assumed to cover the full width of the line, so its dimensions will be 6um by 120um. Rectangular sensels such as this are likely used in a number of AF sensor designs. The data read from the detection lines is thus strictly one-dimensional; any image detail variations across the width of the line will be averaged out.

Each AF-point region on a horizontal detection line will include about 63 sensels, and on the vertical detection lines will include 60 sensels. The 0.24mm span within each AF point, containing image detail recognizable when the camera lens is at an in-focus position, will include 40 sensels; this is an important number and establishes the size of the data set used in calculating image correlations for focus-error determination.
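These sensel counts follow directly from the model's 6um pitch and the region sizes given earlier:

```python
# Sensel counts implied by the model's 6 um pitch.
# Region sizes (0.38 mm, 0.36 mm) and the 0.24 mm detail span are from the text.
pitch_um = 6
print(round(0.38 * 1000 / pitch_um))  # prints 63: sensels per horizontal AF-point region
print(round(0.36 * 1000 / pitch_um))  # prints 60: sensels per vertical AF-point region
print(round(0.24 * 1000 / pitch_um))  # prints 40: sensels in the in-focus detail span
```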

The final assumption for our model is that the range of data used when determining image shift from defocus is limited to the sensels within the selected AF point, plus only a few outside that region. The actual camera will likely go beyond this range in certain cases, although of course it will always be limited by the boundaries of the images projected onto the AF sensor.

Evaluating Image Shift

As has been shown in previous posts, the images projected onto the detection lines will move away from each other if the lens focus is moved toward infinity - or toward each other if the lens focus is moved closer. When the camera lens is in focus (barring any calibration modification such as AF fine-tune), the AF-point region on the left detection line will see exactly the same image details as the corresponding AF-point region on the right detection line does.


Alignment of image details is shifted when lens is out of focus (note image boundaries do not move).

It is a very simple matter for us, with our visual cortex optimized for image recognition, to immediately determine the amount of image shift - which gives the direction and amount of the focus error.

The AF processor, however, must execute many steps to determine this, scanning the full range available within the AF point and checking for a match between the left and right image samples.

In our model, each step requires 40 value comparisons (one for each sensel in the 0.24mm span). To investigate the full width of the AF-point region (plus a bit), we shift the test span in the left line from 13 sensels left of center to 13 sensels right of center (the test span in the right line moves in the opposite direction). For best resolution, we can shift the left and right test spans one at a time, giving a total of 53 steps to evaluate. For each step, we record a value indicating how well the image samples within our test spans match.
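The step count works out as follows: shifting both spans together gives 27 full positions, and shifting one span at a time inserts an intermediate step between each adjacent pair:

```python
# Step count for the model: full shift positions from -13 to +13 sensels,
# plus one intermediate half-step between each adjacent pair of full steps.
max_shift = 13
full_steps = 2 * max_shift + 1        # 27 full positions
half_steps = full_steps - 1           # 26 intermediate positions
print(full_steps + half_steps)        # prints 53
```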

Continued in next post . . .

 
With the new exhibition hung, at last I have time for more comment on the Nikon 51 point AF.

The D300 has a first permanent mask behind the mirror that can be seen when the mirror is carefully raised. The unmasked area allows some light to pass through the mirror via narrow slits. The D300 brochure indicates the mirror slits cover about 18mm by 8.5mm, compared to the 24x16mm sensor area.

Marianne’s measurement of the condenser in the base of the camera body (excluding the 2 dividers) is a very similar size at 17.9 x 8.4mm.

I have not been able to magnify my D7100 mirror surface by more than 8x to see the slits. I assume the slits run vertically, as the majority of detection lines in the AF module run vertically.

No light passes through the solid spaces between the slits. This is likely to reduce the light transmitted to the AF detectors by at least 6 stops.

Ignoring the effect of the second mask that Marianne has identified behind the condenser, the AF sensors work with a lot less light than the camera image sensor. This lower light level is partly offset by the absence of R, G and B filters on the AF sensors; at the image sensor, those filters remove some of the light.

The image at the secondary mirror behind the main mirror slits should be moderately out of focus, the centre of the secondary mirror lying at about 15% less than the lens-flange-to-sensor distance. I have not been able to find Nikon drawings showing a precise position for recent cameras, though the principles are unchanged from the F6, for which Nikon provided illustrations.

The sub mirror seems to be located on average about 8mm before the taking lens focus distance at the image sensor and in consequence outside the depth of focus at the image sensor when the mirrors are up.

The light reaching the AF detection lines travels about 30mm further (based partly on Nikon's F6 drawing and partly on Marianne's measurement), passing through the taking lens's depth-of-focus zone to about 22mm beyond it. The image detail should be well out of focus in the vicinity of the AF masks behind the sensor.

Marianne has taken part of the AF module apart and concluded that the condenser changes the angle of light passing through it differentially, but only for outer AF points, so that light is more accurately projected onto the outer AF detection lines. Nikon refer to this in the patent detail.

The condenser mask is the same size as the mirror mask. The effect of the condenser on depth of focus should be no different to adding a clear filter behind the taking lens. Nikon incorporate a rear filter in some long telephoto lenses.

The 8 lenses behind the 8 secondary masks seem to need a specific focal length at a specific focus distance to bring the out-of-focus detail from the taking lens within the depth of focus at each detector line.

It seems reasonable to assume the AF detectors do not recognise the black areas where the solid spaces between mirror slits obscure the image, and respond only to light coming through the slits.

As already mentioned, the light transmitted to the AF sensors is several stops less than the lens's effective aperture would suggest at infinity focus. An f/2 lens is unlikely to transmit more than f/11-equivalent light, and an f/5.6 lens no more than f/32-equivalent light, compared to the scene brightness in front of the lens. This should partly negate the limiting effect of a second separator mask, if it is f/5.6, in many medium- and low-EV lighting situations.

The depth of focus, which is what is detected at the AF sensors, is extremely narrow with fast lenses and long lenses. Because of this, with some types of subject detail, fast and long lenses in particular record little contrast within their depth of focus.

The AF sensors are primarily contrast-comparison devices. Although a second internal mask may restrict depth of focus to no wider than f/5.6 at the detection lines, it cannot add contrast not recorded by a taking lens with an effective infinity aperture faster than f/5.6.

This would help explain why, with some types of subject detail, an f/5.6 lens with its wider depth of focus may focus where a faster lens of the same focal length, with narrower depth of focus, can fail.

In principle the effect should be no different from viewfinder brightness: despite being limited to no wider than f/2.8 on a D300, the viewfinder shows a brighter image with an f/2.8 lens than with an f/5.6 lens of the same focal length.

In times gone by, Pop Photo recorded the dramatic loss of AF speed in low-light conditions with SLR AF technology toward the end of the last century. Although the speed differences are much smaller with recent AF systems, AF has always performed faster in brighter light.

This perhaps helps explain why most wide-aperture lenses focus faster than f/5.6 lenses most of the time, even if there is a limiting internal f/5.6 aperture mask similar to the f/2.8 one in a DX viewfinder.

Slightly off topic, this would also help explain why f/1.4 wide-angles, with very narrow depth of focus combined with small reproduction ratios, are often reported as more prone to AF failure than other types of lenses. My "different aperture AF failures" occur mainly around 200mm and 300mm focal lengths, where depth of focus is narrow because of the long focal length.

Marianne speculates that AF might work better with f/8 D300 DX lens combinations than with the D3 FX. As a former owner of both bodies, I found no difference between the formats, other than that AF needed a higher-quality target with an f/8 combination than with an f/5.6 combination.

One of the reasons I chose Nikon over Canon in 1999 was wildlife photographers like Moose Peterson were saying the F100 (full frame) AF worked reasonably well with f8 combinations. When I bought F100 bodies and the 500 f4 plus 2x, I found this to be broadly correct. Nikon AF worked reasonably with f8 combinations on FX bodies several years before Nikon digital FX went on sale.

Summing up, Marianne and I may both be right. The AF system may have a second f/5.6 mask but, for the reasons outlined, this does not necessarily prevent AF from often working faster in low light with faster-aperture lenses, nor fast-aperture lenses sometimes failing to autofocus on a subject where slower-aperture lenses succeed.

Thanks are due from everyone reading this thread to Marianne, for dissecting a D300 to help us better understand how 51-point phase-detect AF works.

--
Leonard Shepherd
Producing good quality photographs, or being good at sport or art, involves a little more than buying appropriate equipment. Practice, some learning and perhaps natural talent often play a bigger role than the equipment in your hands.
 
The processing of data from the AF sensor starts with reading out the values from the detection lines. Here, I am limiting the discussion to a single AF point, which will be one of the central cross-type points equipped with both horizontal and vertical detection lines. We will work with the horizontal detection lines first.

Using our model as discussed previously, the detection-line sensels act to average out the detail across the 0.12mm width of the detection line. That is, the 2D image data is reduced to just one dimension.

As an example, I used some fairly small text, tall enough to span only about half of the horizontal detection-line width; about 8 characters of the text fit into the AF-point box in the viewfinder. Compared to the "quick brown fox" text in the previous post (which is really too fine for good AF), it is about 2-3 times larger.

To simulate the function of the detection line, I photograph the text, then extract the averaged row data from the RAW file using my image-analysis utility. The window for this extraction is 20 sensels high (corresponding to the 0.12mm detection-line width) and 66 sensels long. This length includes enough sensels for the 40-sensel test span, plus another 13 sensels at each end to allow for that much shift. The 66 sensels take up about 0.40mm along the detection line (slightly more than the 0.38mm allotted to each AF point).
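The averaging step can be sketched with NumPy: collapsing a 20x66 pixel strip along its 20-pixel (width) axis yields the 66 one-dimensional values, just as the model's full-width rectangular sensels would. The strip below is synthetic random data standing in for the RAW extract:

```python
import numpy as np

# Reduce a 20 x 66 pixel strip (detection-line width x length) to
# one-dimensional detection-line data by averaging across the width.
# Synthetic 12-bit-like values stand in for the real RAW extraction.
rng = np.random.default_rng(0)
strip = rng.integers(0, 4096, size=(20, 66)).astype(float)
line_data = strip.mean(axis=0)   # average the 20 rows -> 66 values
print(line_data.shape)           # prints (66,)
```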

Due to the small size of the text, plus the fact that it only covers about half of the detection-line width, the contrast in the data from the detection lines is not very high. Here are plots of the 66 values from each line (left line in blue, right line in red):


The 66 values read from each horizontal detection line, for a fine-text subject.

At first glance, this tends to look like random noise, especially since the data come from two separate images which do not have the same sensel alignment to the image (which causes some discrepancies in the fine shapes). If one takes a little time and looks closely, some matching features can be identified. (Hint: Shift the blue line to the right, and the red line to the left, 8 positions.) These data will definitely pose a challenge for the AF processing when identifying the shift.

Let's say that the values for the left line have been loaded into a 66-element array A[] and the values for the right line have been loaded into another 66-element array B[] residing in the processor's memory. We refer to the individual values as A[0] to A[65] and B[0] to B[65].

Performing the Correlation

Thanks to details provided earlier by Bernard Delley from a Nikon patent, we can apply the same correlation approach specified by Nikon. The test span used by our model is 40 sensels wide, so we will take 40 contiguous data values at a time from the left line, and compare them to 40 contiguous data values from the right line.

The criterion used for comparison is simply the absolute value of the difference between sensel values. For each step in the process, we calculate the 40 absolute differences, then add them together; this sum is the correlation value for each step. When all steps are complete, we can plot the correlation values as a function of the test-span shifts that we used.

The first step looks at the first 40 values in the A line and compares them to the last 40 values in the B line; that is, we take A-line values starting with a 13-sensel left shift from center, and B-line values starting with a 13-sensel right shift from center. The first correlation value is thus

C(-13) = Abs(A[0] - B[26]) + Abs(A[1] - B[27]) + . . . + Abs(A[39] - B[65])

The next one will be

C(-12) = Abs(A[1] - B[25]) + Abs(A[2] - B[26]) + . . . + Abs(A[40] - B[64])

Note that as the A[] indices go up, the B[] indices go down; our test spans are moving in opposite directions (toward each other, to start). When we have completed half the steps, the test spans will both be in the center; after that they will move apart again. The last step will be:

C(13) = Abs(A[26] - B[0]) + Abs(A[27] - B[1]) + . . . + Abs(A[65] - B[39])

We can also "squeeze in" an intermediate step between each of the above 27 steps, if we only change one of the indices (instead of both) at a time. This improves spatial resolution, and gives us a total of 53 correlation values to use. I call these intermediate values C(-12.5), C(-11.5), etc.

The C() values that we compute will be large if the image samples in the test spans do not match, and will be small if the image samples in the test spans have a good match. When we plot the C() values, we are looking for the place on the curve that is lowest.
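Under the model's assumptions, the whole procedure can be sketched in a few lines of Python: 40-value test spans, shifts of up to 13 sensels, intermediate half-steps, and the sum of absolute differences as the correlation criterion. The array contents below are synthetic (a pattern planted at the shift the text identifies as C(-8)); the parameters follow the model, not any known Nikon implementation:

```python
SPAN, MAX_SHIFT = 40, 13   # model parameters from the text

def correlation_curve(A, B):
    """Return (shift, C) pairs for shifts -13..+13 in half-sensel steps.

    C(s) is the sum of absolute differences between a 40-value window of
    the left line A and a 40-value window of the right line B. Half-steps
    move only one window at a time, giving 53 values in total.
    """
    curve = []
    for i in range(4 * MAX_SHIFT + 1):         # 53 steps
        a0 = (i + 1) // 2                      # left-line window start
        b0 = 2 * MAX_SHIFT - i // 2            # right-line window start
        c = sum(abs(A[a0 + k] - B[b0 + k]) for k in range(SPAN))
        curve.append((-MAX_SHIFT + i / 2, c))
    return curve

# Synthetic example: the same 40-value pattern placed 8 sensels left of
# center in the left line and 8 sensels right of center in the right line,
# i.e. the defocus condition that should give a minimum at C(-8).
pattern = [float((k * 37) % 17) for k in range(SPAN)]
A = [0.0] * 66
B = [0.0] * 66
A[5:45] = pattern                              # 13 - 8 = 5
B[21:61] = pattern                             # 13 + 8 = 21
best_shift, best_c = min(correlation_curve(A, B), key=lambda sc: sc[1])
print(best_shift, best_c)                      # prints -8.0 0.0 (perfect match)
```

Real detection-line data would of course give a nonzero minimum, as in the plots above; only its location matters.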

I created a spreadsheet which does all of the above correlation calculations, from the line data extracted by the image-analysis utility. Without further ado, here is the correlation curve for the line data shown in the plot above:


The minimum value on the curve is not dramatically lower, but is still readily identified.

We see that the best match is at C(-8). This means that the 40-value window of Left line data, taken 8 sensels left of centered, matches the 40-value window of Right line data, taken 8 sensels to the right of centered. We conclude that the camera lens is out of focus, such that each image is 8 sensels = 48um away from its in-focus reference position. The autofocus system will respond by moving the lens focus closer until the images match with no shift. If we repeated the correlation-curve plot afterward, we would see the minimum value in the curve lands at 0 shift.

This is actually a difficult example, and the correlation curve indication is rather weak. We can also have a look at the vertical-line data and correlation, which will be much clearer.

For the vertical-line example, we still have a horizontal line of text running through the AF point as before, but there is also a horizontal line a little distance below it. This gives the vertical detection lines two strong features to detect. For this case, I have reversed the shifts (simulating front-focusing of the camera lens). Here are the plots of the values read from the vertical detection lines:


Two strong horizontal features within the AF point produce very clear responses from vertical detection lines.

The wide troughs correspond to where the line of text is, and the narrow ones are from the horizontal line in the image.

Not surprisingly, the correlation plot gives us a much more definite indication:


Match occurs at C(3).

This is what we like to see for a subject that allows accurate AF. The only feature that threatens to make the conclusion less clear is the falloff at C(-13) and C(-12). This is due to a weak match between the text and the horizontal line in the image.

In the following posts, we will take a look at some cases that are potentially problematic, and also look at how well the AF system handles blur (such as diffraction blur) and soft subjects.

 
Summing up Marianne and I may both be right. The AF system may have a second f5.6 mask
There is no f/5.6 mask.

There are

1) The field lens mask, which selects the portions of the image that each group of AF points uses, and

2) The separator-lens mask (visible through the front of the 200 f/2 lens in the photos below), which selects the portions of the main lens exit pupil that are used by the AF system. No areas of the exit pupil other than the four seen here can pass light to the AF sensor.

The small size of the openings in the separator-lens mask produces an effective f/28 aperture for the AF system, with regard to light-transmission efficiency and depth of focus at the AF sensor. The set of mask openings is arranged to fit within the f/7.5 circle, as seen in the upper-right pane here:





This demonstrates that AF-sensor image brightness is invariant across lens aperture settings from about f/2 to f/7.5. Photos of the AF-sensor images were taken using manual exposure with no change to exposure settings or ISO between photos.
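For a rough sense of the light-gathering penalty implied by that effective f/28 aperture (a back-of-envelope figure, not from the posts above): the difference in stops between two f-numbers is 2·log2(N2/N1), so f/28 sits about 7.6 stops below f/2:

```python
from math import log2

# Stop difference between two f-numbers: 2 * log2(N2 / N1).
# Illustration only: the effective f/28 AF aperture is from the text;
# the comparison against f/2 is just an example.
def stops(n1, n2):
    return 2 * log2(n2 / n1)

print(round(stops(2, 28), 1))    # prints 7.6 (stops between f/2 and f/28)
```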



 
I am no good at optics, but sure love your signature!!! :-D
Qualities possessed by God in infinite proportion: Love, Grace, Power, Righteousness, Wisdom, . . .
Qualities possessed by humans in infinite proportion: Ignorance.
- Marianne
God bless!!!
 
Following some experiments I have been performing this week with my D3s AF system, it is clear that some capabilities could not be achieved with the simple one-dimensional AF detection line that is the basis of my original model.

To improve representation of the real system, I have decided to upgrade my model to simulate a two-dimensional array of sensels in the detection line, with staggered columns as suggested by the slanted end-masking seen in the AF sensor photos.

The mask stagger is 15um in total across the 120um width of the detection lines. The number of sensel columns within that 120um width remains an open question; I have chosen 5 for the model because it is a good compromise between complexity and convenience of collecting data. This gives a shift of 3um for each column of sensels (possibly a direct match to the real detection lines), and if we use the earlier approach of shifting just one test span at a time when performing the calculations, the spatial resolution achieved is 1.5um.
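The resolution figures follow directly from the stagger and the assumed column count:

```python
# Spatial resolution from the staggered-column model described above.
stagger_total_um = 15      # total mask stagger across the line width (from the text)
columns = 5                # assumed number of sensel columns (a model choice)
per_column_um = stagger_total_um / columns
print(per_column_um)       # prints 3.0 (um shift per column)
print(per_column_um / 2)   # prints 1.5 (um, with one-span-at-a-time shifting)
```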

The Details

The model corresponds to detection lines with sensels that are 24um wide and 15um high (referring to vertical detection lines). The region on the lines that corresponds to each AF point will use an array of sensels that is 5 wide by about 25 high. For performing correlation calculations, the size of the test span will be 16 sensels high, and we will shift it 6 sensels in each direction; the calculations thus will cover a total span of 28 sensels which extends just a bit outside the region for each AF point.

The calculations are performed independently for each of the 5 columns of sensels. That is, each is treated as a separate detection line for the correlations, because each column covers different detail. The correlation results are then combined by a moving window that takes 5 values at a time (one from each column), yielding a total of 121 points on the final correlation plot.
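One plausible reading of that combining step (my interpretation, not a documented algorithm): if each of the 5 columns yields 25 correlation values (13 full steps at shifts of up to 6 sensels, plus 12 intermediates), interleaving the 5x25 = 125 values in order of effective shift and sliding a 5-wide averaging window across them leaves 125 - 5 + 1 = 121 points:

```python
# Hypothetical sketch of combining per-column correlation results.
# Assumes each of 5 columns produced 25 correlation values; the
# interleave-then-average scheme is an interpretation of the text,
# not a documented Nikon algorithm.
values_per_column = 25
columns = [[float(c + 5 * v) for v in range(values_per_column)]   # dummy data
           for c in range(5)]

# Interleave one value from each column in turn (ordered by effective shift).
interleaved = [columns[c][v] for v in range(values_per_column) for c in range(5)]

# Moving window of 5: average one value from each column at a time.
combined = [sum(interleaved[i:i + 5]) / 5 for i in range(len(interleaved) - 4)]
print(len(combined))   # prints 121
```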

Pros and Cons of the 2D detection line

The approach of using wide staggered sensels, instead of very narrow single sensels, works well for detail at most alignments (angles), but loses the advantage of the stagger in the case where image lines are angled to follow the stagger. In fact, this kind of rotational alignment is one parameter that I will be investigating later. The behavior, though, is often better than the one-dimensional line with narrow sensels, which quickly loses contrast for image lines at most angles.

The large size of the sensels is an advantage for light gathering and signal/noise ratio. However, it can also make very fine details produce rather low-amplitude contrast, i.e., weak signals for the correlation calculations to use.

Data Collection

The original model was intended to be used by taking image samples from the AF module's "screen" that I have installed in place of the AF sensor, so I was only using a 20x66 pixel strip from the camera image. This of course is extremely tiny and results in low resolution, as well as susceptibility to the texture of the AF module screen.

For the 2D model, I am taking image data from a direct camera image instead, selecting an area which corresponds to the AF detection line region for the AF point in use. Using the D800E, this is an image strip of 110x420 pixels, which is then divided into 140 individual-sensel areas. The extraction procedure is more complex, but has been automated to make it practical. It has the advantage that defocus and diffraction effects can be directly set for study when required, not to mention the convenience of being able to use any RAW image file to provide samples.

Following posts will generally make use of the 2D model, but there is one example using images directly from the AF module screen that I would like to present; it will use the 1D model to illustrate a particular AF system susceptibility.
 
This thread will present the stepwise development of a phase-detect autofocus system, using basic optical concepts and ray diagrams. The intent is to lay a solid foundation for the reader to understand concepts critical to autofocus optics and operation, at a level that is visual, intuitive, and readily understandable. There will be some mention of the mathematical concepts that apply, but a working knowledge of them is not required in order to follow the discussion and diagrams.

See the following posts for presentations of each step in the development. More posts will be added later, as I have time, and/or in response to questions. The initial posts cover the fundamental optics for the AF system, starting with a single lens, then adding more optics to complete the AF system optical model.

There are many misconceptions associated with AF system behavior, as it is not always intuitive. Some readers may have difficulty accepting the system characteristics described, and ask for supporting references. The best reference I can give is an optical system that I have sitting on my table right now, configured as detailed in the first few posts: it functions exactly as specified in this thread. I will post some details of that system, and photos of its operation, at a later time (taking photos can be easier than constructing theoretical diagrams, anyhow).

Suffice it to say that this thread will present more than purely theoretical concepts. It is my hope that this will be both fun and educational.
 
Pros and Cons of the 2D detection line

The approach of using wide staggered sensels, instead of very narrow single sensels, works well for detail at most alignments (angles), but loses the advantage of the stagger in the case where image lines are angled to follow the stagger. In fact, this kind of rotational alignment is one parameter that I will be investigating later. The behavior, though, is often better than the one-dimensional line with narrow sensels, which quickly loses contrast for image lines at most angles.
Shouldn't the stagger be mirrored for the left and right sensors so that when the image lines align with the stagger of one sensor they are not aligned with the other sensor?
 
Shouldn't the stagger be mirrored for the left and right sensors so that when the image lines align with the stagger of one sensor they are not aligned with the other sensor?
I'm not sure what you are referring to by "left and right sensors", but mirrored staggers are not an option in any case. Each sensel must align with exactly the same image point that its corresponding sensel in the opposite detection line aligns with, when the camera lens is accurately focused (this example actually shows the camera lens back-focused):




Negative photo of detection lines superimposed on images projected by separator lenses.

The images are not mirrored left/right or top/bottom, so the staggers must have the same direction.

--
Qualities possessed by God in infinite proportion: Love, Grace, Power, Righteousness, Wisdom, . . .
Qualities possessed by humans in infinite proportion: Ignorance.
- Marianne
 
Shouldn't the stagger be mirrored for the left and right sensors so that when the image lines align with the stagger of one sensor they are not aligned with the other sensor?
I'm not sure what you are referring to by "left and right sensors", but mirrored staggers are not an option in any case. Each sensel must align with exactly the same image point that its corresponding sensel in the opposite detection line aligns with, when the camera lens is accurately focused (this example actually shows the camera lens back-focused):
I was thinking that only the sensors in the middle of the strip would be exactly aligned when in focus, and the ones to either side would be progressively offset in and out, but it looks like you are right and the staggers match, so it doesn't work the way I was thinking.

Negative photo of detection lines superimposed on images projected by separator lenses.

The images are not mirrored left/right or top/bottom, so the staggers must have the same direction.
Okay, I can see the staggers and it doesn't look like they are mirrored.

Of course, what we are calling staggers might just be an artifact of how the support circuitry is routed; there might still be just a single row of very thin rectangular sensels. If the sensel spacing meets the Nyquist criterion for the diffraction cut-off of the AF system's effective aperture, then there would be no real need to stagger the sensels. We could instead just up-sample the correlation resolution to get the desired accuracy. In fact, I would normally use FFT-type fast correlation, in which case we can simply zero-pad the frequency-domain data before the IFFT to whatever correlation resolution we want (equivalent to Matlab's interpft function). Once we upsample by around 4:1, we can make the final estimate of the location of the minimum (for difference correlation) or maximum (for normal, un-inverted correlation) by performing a 3-point quadratic fit to estimate the zero-slope position.
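In sketch form (Python/NumPy here rather than Matlab; the function names and test sizes are mine):

```python
import numpy as np

def fft_upsample(x, factor):
    """Upsample by zero-padding the spectrum before the inverse FFT
    (the same idea as Matlab's interpft)."""
    n = len(x)
    X = np.fft.fft(x)
    m = n * factor
    Y = np.zeros(m, dtype=complex)
    h = (n + 1) // 2                  # non-negative-frequency bins kept
    Y[:h] = X[:h]
    Y[m - (n - h):] = X[h:]
    if n % 2 == 0:                    # split the Nyquist bin evenly
        Y[n // 2] = X[n // 2] / 2
        Y[m - n // 2] = X[n // 2] / 2
    return np.real(np.fft.ifft(Y)) * factor

def quadratic_vertex(y):
    """3-point parabolic fit around the trough of a correlation plot;
    returns the estimated zero-slope position in samples."""
    i = int(np.argmin(y))
    if not 0 < i < len(y) - 1:
        return float(i)
    denom = y[i - 1] - 2 * y[i] + y[i + 1]
    return i + 0.5 * (y[i - 1] - y[i + 1]) / denom
```

For a band-limited signal the zero-padded IFFT reproduces the original samples exactly at the original positions, and the 3-point fit recovers a sub-sample vertex.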
 
Of course, what we are calling staggers might just be an artifact of how the support circuitry is routed; there might still be just a single row of very thin rectangular sensels.
Experiments I've been performing with my D3s show conclusively that there is not simply a single row of thin rectangular sensels: such an arrangement would be easily defeated by carefully aligned test patterns, yet none of the patterns I've tried has kept the camera from finding AF lock.
If the sensel spacing meets the Nyquist criterion for the diffraction cut-off of the AF system's effective aperture, then there would be no real need to stagger the sensels. We could instead just up-sample the correlation resolution to get the desired accuracy.
Upsampling/interpolation methods can work well when signal quality is high and the data set includes a long spatial span. However, we need to keep in mind that the system is designed to work down to -2EV or even lower, so signals can be very noisy, and we are dealing with a small number of samples.

Since we are looking for precise signal phase relationships between two small image samples, physical oversampling has significant advantages in that it produces a data set that can be used directly without elaborate (i.e., slow) processing.

[Edit: It turns out that the oversampling provided by the suggested 2D model is modest. The estimated Airy disc diameter at the AF sensor is about 9 µm. With a 3 µm stagger between the sensels in adjacent rows, the achieved spatial sampling rate is only about 50% beyond Nyquist.]

There are additional studies I have planned, which may shed more light on how the design provides robust operation in difficult conditions.
In fact, I would normally use FFT-type fast correlation, in which case we can simply zero-pad the frequency-domain data before the IFFT to whatever correlation resolution we want (equivalent to Matlab's interpft function). Once we upsample by around 4:1, we can make the final estimate of the location of the minimum (for difference correlation) or maximum (for normal, un-inverted correlation) by performing a 3-point quadratic fit to estimate the zero-slope position.
I've adopted the simple correlation algorithm described in a Nikon patent, as found by Bernard Delley. The patent does also mention interpolation, which would probably be a quadratic fit around the curve trough, as you described.

 
As photographers, we like to find contrasty, well-defined edges for our cameras to focus on. It's natural to think that the AF system works best with such subjects, and that it would have difficulty focusing accurately with very soft edges.

But is this really true? If we think about how the correlation works - matching two image samples point-by-point - it should be able to match image samples that have gradual contrast transitions, as well as sharp ones. After all, it digitizes the tones to a resolution that should be sensitive to very small differences. A mismatch at gradual transitions will not produce large absolute-difference terms, but then there will be a relatively large number of terms that contribute.
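Here is a minimal sketch of that point, comparing a sharp step edge with a heavily smoothed version of it. The signal sizes, the box blur, and the SAD-style difference correlation are illustrative assumptions, not the module's actual processing:

```python
# A sharp edge and a blurred edge yield the same correlation shift.

def box_blur(x, radius):
    """Simple moving-average blur with edge truncation."""
    return [sum(x[max(0, i - radius):i + radius + 1]) /
            (min(len(x), i + radius + 1) - max(0, i - radius))
            for i in range(len(x))]

def sad_plot(a, b, max_shift):
    """Normalized sum-of-absolute-differences for each trial shift."""
    plot = []
    for s in range(-max_shift, max_shift + 1):
        terms = [abs(a[i] - b[i + s]) for i in range(len(a))
                 if 0 <= i + s < len(b)]
        plot.append(sum(terms) / len(terms))
    return plot

ref   = [0.0] * 20 + [1.0] * 20          # reference step edge
sharp = [0.0] * 23 + [1.0] * 17          # same edge, shifted right by 3
soft_ref, soft = box_blur(ref, 4), box_blur(sharp, 4)

best = lambda p: p.index(min(p)) - 6     # plot index -> trial shift
assert best(sad_plot(ref, sharp, 6)) == 3        # sharp pair: +3
assert best(sad_plot(soft_ref, soft, 6)) == 3    # blurred pair: +3
```

Both troughs land at the same shift; the blurred pair simply produces a shallower, wider trough.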

In this example, I've run the 2D model on an image of decorative printing on a pillow, with the image focused well at f/16, and then defocused significantly. On the left is a plot of the detection line values, and on the right, the result plot for the correlation runs. Transitions are sharp for the in-focus case, but much smoother and more gradual in the defocused case. However, the correlation plots are almost indistinguishable, with just a small drop in amplitude for the defocused case:





Clearly, the correlation plot for the blurred image is indicating the shift for best focus just as well as the plot for the in-focus image does.

This is actually an important capability, and demonstrates how the AF system is able to accurately compute focus error when the camera lens is far out of focus and the detail available to the AF sensor is quite blurred. This also covers the relatively small amount of blur caused by diffraction - even with the AF system's very "slow" aperture of about f/28 with respect to diffraction effects.

One may wonder how much blur the AF system can work with. Being the curious sort, I ran some experiments with my D3s, and then discovered a surprising result. As long as the subject contrast included within the selected AF point is high enough, blurred edges allow the camera to lock focus fairly easily; this is in line with expectations, knowing how the correlation works. However, if I used a soft edge with limited contrast range, AF was not possible.

High-pass Filtering

There must be some processing of the detection-line data to remove constant bias, and even to compensate for gradual falloff in the image across the AF point. This would be high-pass filtering, so that there is effectively an upper limit to the size of detail that can be used for focusing.

This would make sense, to prevent problems when the AF-sensor images start to vignette as can happen when lenses close to the minimum f/8 speed are used. Such vignetting would produce a tonal falloff in the images projected onto the AF sensor, which runs in opposite directions for the two images in each pair.

For example, on the vertical detection lines, the upper image would darken more at the top and the lower image would darken more at the bottom. This creates a mismatch between the images, which could swamp the image details we want the correlation calculations to find - especially if the subject does not have high contrast.
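To illustrate the idea numerically: in the sketch below, opposite-direction linear ramps are added to an otherwise matched image pair, and a crude high-pass (subtracting a moving average) removes them. The ramp strength, the test pattern, and the filter choice are all assumptions for the sketch; the real module's filtering is unknown.

```python
# Opposite-direction falloff corrupts the difference correlation;
# a simple high-pass restores it.

def box_blur(x, r):
    return [sum(x[max(0, i - r):i + r + 1]) /
            (min(len(x), i + r + 1) - max(0, i - r))
            for i in range(len(x))]

def high_pass(x, r):
    """Crude high-pass: subtract a moving average, trim filter edges."""
    b = box_blur(x, r)
    return [xi - bi for xi, bi in zip(x, b)][r:-r]

def sad_plot(a, b, max_shift):
    plot = []
    for s in range(-max_shift, max_shift + 1):
        terms = [abs(a[i] - b[i + s]) for i in range(len(a))
                 if 0 <= i + s < len(b)]
        plot.append(sum(terms) / len(terms))
    return plot

full = [float((7 * k) % 13 % 5) for k in range(70)]  # synthetic detail
a = full[2:62]                            # one image, true shift = +2
b = full[0:60]                            # its opposite-line partner

vig_a = [x + 0.15 * i for i, x in enumerate(a)]          # falloff one way
vig_b = [x + 0.15 * (59 - i) for i, x in enumerate(b)]   # opposite way

clean     = sad_plot(a, b, 6)
corrupted = sad_plot(vig_a, vig_b, 6)
filtered  = sad_plot(high_pass(vig_a, 2), high_pass(vig_b, 2), 6)

best = lambda p: p.index(min(p)) - 6      # plot index -> trial shift
```

The clean and high-passed plots both reach (nearly) zero at the +2 shift, while the vignetted pair no longer reaches zero there; the ramp mismatch swamps the detail the correlation should be matching.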

I ran some simulations of this, using the 1D model. Even given a subject with good contrast, I found that adding some tonal falloff (running opposite directions on the opposite detection lines) and image-brightness discrepancy can significantly degrade the correlation plot. Here is an example which uses the same test image that I used in some earlier posts, taken directly from the AF module screen. The upper plot is the original one, without any image vignetting, and the lower plot is the result after some vignetting and bias has been artificially introduced into the data:




Without high-pass filtering of the detection-line data, vignetting of AF-sensor images can corrupt the correlation plots.

The upper curve clearly indicates the focus shift, but the lower curve has been distorted and has a second trough that could result in complete mis-focus if the image contrast were a little lower. Even the original trough at +5 has been shifted slightly to the right, which would reduce focus accuracy. High-pass filtering of the detection-line data will prevent these problems.

In an upcoming thread which investigates AF system capabilities and limits, I will revisit the topic of focusing on soft subjects.



 
Marianne, your right-hand plots make it clear that the correlation function has a negative cusp with equal, opposite slopes on both sides (that is, it is fundamentally not a quadratic function near the minimum). This finally explains why the patents refer to that equal-slope construct. (Naively, one might have expected that the correlation function c(x) would be differentiated and the zero of c'(x) located, which would have been a good procedure for a quadratic minimum.)

Stray light and many other effects (like the vignetting you show) prevent the correlation function from taking its zero minimum value. If we simplify the disturbances as a constant base term, we can do a three-point cusp analysis inspired by the patent details.

Suppose the pixel with index 0 has the minimal value y0, and the pixel to its right has a value y+ less than or equal to the value y- of the pixel to its left. Then the cusp must lie slightly to the right of pixel 0, at a position x (in units of pixel pitch):

x = 1/2 * (y- - y+) / (y- - y0)
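A quick numerical check of this formula on an ideal V-shaped trough (the slope and cusp position are arbitrary choices):

```python
# Verify the three-point cusp formula on a noise-free V-shaped trough.

def cusp_position(y_minus, y0, y_plus):
    """Sub-pixel cusp offset to the right of the minimal sample,
    in units of pixel pitch (assumes y_plus <= y_minus)."""
    return 0.5 * (y_minus - y_plus) / (y_minus - y0)

true_pos, slope = 5.3, 2.0
samples = [slope * abs(k - true_pos) for k in range(11)]
i = samples.index(min(samples))                    # minimal sample at 5
est = i + cusp_position(samples[i - 1], samples[i], samples[i + 1])
# est recovers the true cusp position 5.3 for noise-free data
```

For an exact equal-slope cusp the formula is not an approximation; it recovers the position exactly, so in practice the error budget is set by the noise in the three y values.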

It is also interesting to think about how shaky this focus prediction x gets when the y values are affected by Poisson noise. It is easy to see that, in good light, the shakiness can be much less than one-hundredth of the pixel pitch.

The full well of the AF pixels (proportional, of course, to the width, which remains unknown) might be very high, especially as these pixels are not so small. The highest light level for which AF is specified might give a hint at the exposure time / refresh frequency (kHz?).
 
Marianne, your right-hand plots make it clear that the correlation function has a negative cusp with equal, opposite slopes on both sides (that is, it is fundamentally not a quadratic function near the minimum).
This is an odd feature. Correlating two signals (one inverted, for the difference correlation) does not introduce new frequencies: correlation in the space domain corresponds to multiplication of one spectrum by the conjugate of the other in the spatial-frequency domain, so the highest frequency present doesn't increase. If the original signals are sampled slightly beyond Nyquist, then the correlation plot should not contain aliasing, and without aliasing, further upsampling usually generates a smooth plot. Maybe the upsampling technique used to make the high-resolution plot caused the sharp null.

Marianne, do you think the upsampling algorithm is responsible for the cusp?

P.S. I didn't look at the patent's upsampling technique details: I already know many excellent upsampling algorithms and don't expect other manufacturers' AF designs to use Nikon's patented technique (I own Canon gear myself). In general, the algorithms I use don't generate high-frequency details lacking in the source.
 
This is an odd feature. Correlating two signals (one inverted for the difference correlation) does not introduce new frequencies: correlation in the space domain corresponds to multiplication of one spectrum by the conjugate of the other spectrum in the spatial frequency domain, so the highest frequency present doesn't increase. If the original signals are sampled slightly beyond Nyquist, then the correlation plot should not contain aliasing. Without aliasing further upsampling usually generates a smooth plot. Maybe the upsampling technique used to make the high resolution plot caused the sharp null.
The corner is expected, due to the nonlinear abs() function used in the "correlation" calculation.

This also changes the approach needed for interpolation: It is done by finding the intersection of two separate curve fits, to the left and right of the corner.
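As a sketch of that intersection approach: the three-points-per-side fits and the synthetic V data below are illustrative assumptions (the fit lengths actually used by the system are unknown, and this version assumes the cusp lies to the right of the minimal sample, as in the three-point construction discussed earlier):

```python
# Locate the corner of a V-shaped "correlation" trough by fitting a
# straight line to each leg and intersecting the two fits.

def line_fit(xs, ys):
    """Least-squares slope and intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys)) /
             sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def corner_intersect(y):
    """Intersect lines fitted to the left leg (including the minimal
    sample) and the right leg of the trough."""
    i = y.index(min(y))
    ls, lb = line_fit(list(range(i - 2, i + 1)), y[i - 2:i + 1])
    rs, rb = line_fit(list(range(i + 1, i + 4)), y[i + 1:i + 4])
    return (rb - lb) / (ls - rs)    # where the two lines cross

samples = [2.0 * abs(k - 5.3) for k in range(11)]   # ideal cusp at 5.3
# corner_intersect(samples) recovers the 5.3 corner position
```

Because each leg is genuinely straight near the cusp, fitting the legs separately avoids the bias a single parabola fit would introduce at a non-quadratic minimum.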

 
This is an odd feature. Correlating two signals (one inverted for the difference correlation) does not introduce new frequencies: correlation in the space domain corresponds to multiplication of one spectrum by the conjugate of the other spectrum in the spatial frequency domain, so the highest frequency present doesn't increase. If the original signals are sampled slightly beyond Nyquist, then the correlation plot should not contain aliasing. Without aliasing further upsampling usually generates a smooth plot. Maybe the upsampling technique used to make the high resolution plot caused the sharp null.
The corner is expected, due to the nonlinear abs() function used in the "correlation" calculation.
Thanks for the reply. Yes, I see now that you gave the details earlier in this thread; I was misled by the "correlation" label. This is not just the correlation of one signal with the negative of the other.
This also changes the approach needed for interpolation: It is done by finding the intersection of two separate curve fits, to the left and right of the corner.
Okay, of course there are other ways that the relative alignment of the two images could be calculated, including true correlation processing.

Anyway, thank you for your efforts on getting to the details of Nikon's AF system. I expect that nearly all of the lessons learned will apply to SLR PD AF sensors in general, across manufacturers.
 
