ON MTF, Shift Invariance and Aliasing

Jack Hogan

We rely on Spatial Frequency Response (SFR) measurements to evaluate the hardware capabilities of digital imaging systems. In photography we use SFR interchangeably with Modulation Transfer Function (MTF), because the transfer function framework allows us to work with components individually and then combine them easily to obtain the system's response.

However, strictly speaking, MTF assumes a target of sinusoids and Linear Shift Invariant (LSI) components.

In measurement we fulfil the first requirement by using a good sinusoidal target like a knife edge; and we deal with linearity by using data from a raw file. That leaves shift invariance to deal with.

There are typically three macro components that affect spatial resolution in a digital imaging system: lens, pixel aperture and image sampling.
  1. The lens is not shift invariant but can be considered to be so locally and directionally, so it can be said to have an MTF there
  2. Pixel aperture, the effective physical photosensitive area, is another component with an analog response and similar characteristics, so it generally can be said to have an MTF for a given direction (a toy sketch of how components 1 and 2 combine follows this list)
  3. Sampling in photography is mostly performed by a 2D lattice of delta functions, which is not shift invariant because it loses information on the phase of captured detail. For instance, sub-pixel random detail scattered within the effective area of a single aperture all gets recorded in the very same position, the center of the corresponding pixel. This loss of phase information is due to undersampling and results in what we call aliasing (hence antialiasing filters). So sampling does not have an MTF.
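
To make the transfer-function bookkeeping concrete, here is a minimal numpy sketch of how components 1 and 2 combine multiplicatively into a pre-sampling system MTF. Every number in it is an assumption chosen for illustration (aberration-free f/5.6 lens at 550 nm, 4 µm pitch, 100% fill factor), not a model of any particular camera:

```python
import numpy as np

f = np.linspace(0.0, 250.0, 501)          # spatial frequency, cycles/mm

# Component 1 -- lens: diffraction-limited MTF of a circular pupil
# (assumed f/5.6 at 550 nm; a real lens would add aberrations on top)
cutoff = 1.0 / (550e-6 * 5.6)             # diffraction cutoff, ~325 cycles/mm
s = np.clip(f / cutoff, 0.0, 1.0)
mtf_lens = (2.0 / np.pi) * (np.arccos(s) - s * np.sqrt(1.0 - s**2))

# Component 2 -- pixel aperture: a uniform square of width w gives |sinc(w*f)|
# along each axis (assumed 4 um pitch, 100% linear fill factor)
w = 0.004                                 # aperture width, mm
mtf_pixel = np.abs(np.sinc(w * f))        # np.sinc(x) = sin(pi*x)/(pi*x)

# Because both components are (locally) LSI, the pre-sampling system MTF
# is simply their product:
mtf_system = mtf_lens * mtf_pixel
```
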
Raw capture courtesy of Erik Kaffehr

The slanted edge method gets around this loss of phase by super-sampling the edge at dozens or hundreds of different phases, allowing us to estimate the combined MTF of the lens and pixel aperture before sampling, which is typically what we are after in our context. Of course the resulting MTF discards phase information and therefore tells us nothing about aliasing by itself, which must be inferred from other sources of information.
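
For intuition, here is a bare-bones sketch of that super-sampling idea, run on a synthetic edge seen through an assumed uniform square pixel aperture. It is illustrative only and omits most of the care a real ISO 12233 / Burns implementation takes:

```python
import numpy as np

# Synthetic near-vertical edge, slanted 5 degrees, as seen through an assumed
# uniform 1-pixel square aperture (1D ramp approximation of the 2D integral):
theta = np.deg2rad(5.0)
yy, xx = np.mgrid[0:100, 0:100]
d = (xx - 49.5) * np.cos(theta) - (yy - 49.5) * np.sin(theta)  # distance to edge, px
img = np.clip(d + 0.5, 0.0, 1.0)

# Each image row meets the edge at a different sub-pixel phase, so binning all
# pixels by their distance to the edge rebuilds the edge profile (ESF) at a
# 4x oversampled pitch -- this is the super-sampling step:
band = np.abs(d) < 16                     # analyze only a band around the edge
b = np.round(d[band] * 4.0).astype(int)
b -= b.min()
counts = np.maximum(np.bincount(b), 1)
esf = np.bincount(b, weights=img[band]) / counts

lsf = np.diff(esf) * np.hanning(esf.size - 1)   # ESF -> LSF, lightly windowed
mtf = np.abs(np.fft.rfft(lsf))
mtf /= mtf[0]
freq = np.fft.rfftfreq(lsf.size, d=0.25)  # cycles/px; extends to 2.0, well past Nyquist
```
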

Do you pedants agree with this reasoning and terminology? :-)

Jack

PS In the past I described it like this:
https://www.strollswithmydog.com/resolution-model-digital-cameras-i/#LSI
 
Hi Jack,

I like the write-up overall, and I'm not usually one to be that pedantic on this topic: in my day job, focused largely on DSP, we are happy to play fast and loose with terminology under the assumption that practitioners are well aware of the caveats. Sometimes, though, that burns someone, so I appreciate the need for clarity at times.

So I'll try to put on my pedant hat for you, but I think largely I'll be more commenting on improving clarity than objecting to anything.
In measurement we fulfil the first requirement by using a good sinusoidal target like a knife edge; and we deal with linearity by using data from a raw file.
That's going to be confusing to some. Should probably call out that a true knife edge is a superposition of sinusoidal targets which we can decompose with post-processing. A real world knife edge will have unpredictable high spatial frequency components and thus will potentially give unexpected results if we push things too far into the high spatial frequencies. This is the reason a printed slanted edge target has limitations.
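
One way to see the superposition point: a perfect edge's derivative is an impulse, and an impulse has a perfectly flat spectrum, so a single edge probes every spatial frequency at once (until real-world edge imperfections intrude). A toy numpy check of just that, illustrative only:

```python
import numpy as np

edge = np.zeros(1024)
edge[512:] = 1.0                 # ideal knife edge
lsf = np.diff(edge)              # its derivative: a single impulse
mtf = np.abs(np.fft.rfft(lsf))   # the spectrum of an impulse is flat
print(mtf.min(), mtf.max())      # 1.0 1.0 -- every frequency present, equally
```
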
  1. The lens is not shift invariant but can be considered to be so locally and directionally, so it can be said to have an MTF there
Might need to be more specific about what "directionally" means in this context. I believe you are trying to highlight the difference in tangential and sagittal performance and the fact that it is somewhat analogous to how the pixel aperture is also sensitive to "direction".
  2. Pixel aperture, the effective physical photosensitive area, is another component with an analog response and similar characteristics, so it generally can be said to have an MTF for a given direction
"Effective physical photosensitive area" might be ambiguous and misleading. Technically we care both about the aperture shape as well as the distribution of sensitivity across that shape. And that's one way I could interpret that phrase as meaning the joint effect of the microlens and the actual photodiode. On the other hand, of course we usually model the pixel aperture to just be a square with a uniform sensitivity and that model is typically adequate for our purposes and maybe that's the "effective" aperture in this case. Given terms of art like "effective aperture" in antenna design it isn't immediately clear how much abstraction you are intending here and some of the abstractions would actually change the MTF.
  3. Sampling in photography is mostly performed by a 2D lattice of delta functions, which is not shift invariant because it loses information on the phase of captured detail.
I like separating sampling from the pixel aperture and think that is exactly the correct approach. That might confuse some people (probably not the regulars here though) and could potentially use a bit more explanation.
Do you pedants agree with this reasoning and terminology? :-)
You appear to be missing the AA filter in all of this. That's a critical component in many systems and also one likely to be a bit confusing to some people. It is largely translation invariant but has strong directional sensitivity like the pixel aperture. We could lump it into "effective pixel aperture" I suppose, but as I said I find that term already a bit worrisome from the perspective of ambiguity.

Anyway, hopefully that was slightly more useful than annoying! I like the description overall and don't have any real objections to it. Though I'm not likely to be the one to object in the first place.
 
Thank you Ken, all very good points. I left out the AA so as not to complicate things; the main objective is to resolve the terminology around the 'pixel' not having an MTF.

Jack
 
Thank you for the nice definitions, Jack. Personally, I'm still not clear why a pixel array does not have an MTF.

Just googling for "pixel MTF" brings thousands of references including reputable SPIE and Optica journals, Optipedia, etc. Here are a few examples:

https://spie.org/publications/spie-...cs-information/tt52_21_detector_footprint_mtf

https://www.spiedigitallibrary.org/...on-CMOS-image-sensor/10.1117/12.2604851.short

https://www.imagesensors.org/Past Workshops/2003 Workshop/2003 Papers/35 Dutton et al.pdf

Could you explain one more time why you say that pixels don't have an MTF? You've already explained it in a previous link but, sorry, I could not understand it.
 
Could you explain one more time why you say that pixels don't have an MTF?
I'm not Jack, but I'll try what I hope is the very simplest explanation.

An MTF should be translation invariant. What that practically means in the real world is that if I slightly shift the test pattern I'm using to measure the MTF, I should still get the same measurement. Nudge the camera up or down or nudge the test chart up or down and I should get the same measurement.

That is not true of a pixel array. If I have a very fine test pattern then when I take a picture of it with a pixel array the pattern I'll see in the final image can change fairly dramatically if I nudge the camera or the test chart. You can sort of see this kind of behavior in the DPR studio comparison scene:

https://www.dpreview.com/reviews/im...1&x=0.29399322482029255&y=0.07473961543202255

The Z7 and Z7II have the same sensor, but you can see the color pattern in the resolution wedge is very different between them. This is just because the two cameras weren't in exactly the same position between measurements. Same story for the D810 and D810A.

So because of this behavior we caution a pixel array doesn't technically meet the requirements of an MTF.

The reality though is that we can work around this with the correct measurement methods. The slanted edge resolution test that is commonly used is one way of working around this: it uses the fact that the slanted edge will intersect different pixels at different offsets as a way to sort of test all possible offsets in one test shot.
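
Here is the nudge experiment in a few lines of toy numpy, with the chart detail placed right at the Nyquist frequency (0.5 cycles/pixel; all values assumed for illustration):

```python
import numpy as np

x = np.arange(8)                    # pixel positions on the integer lattice
print(np.cos(np.pi * x))            # chart at one position:  [ 1. -1.  1. -1. ...]
print(np.cos(np.pi * (x + 0.5)))    # chart nudged half a px: all ~0 -- detail gone
```

A half-pixel nudge turns a full-contrast pattern into a flat one, which is exactly the failure of shift invariance being described.
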

Don't know if that helped or not!

--
Ken W
See profile for equipment list
 
Thank you, kenw. Very helpful explanation, indeed. I need some time to digest it.

At the risk of being accused of sophistry again, the lens could be seen as a sampling device too, albeit a very large sampling device. That is, if we measure a very low spatial frequency part of the MTF, with a period comparable to the lens FOV, it starts to depend on the exact phase of this low spatial frequency test picture. Does this make the lens translation-sensitive?
 
At the risk of being accused of sophistry again, the lens could be seen as a sampling device too, albeit a very large sampling device.
For a one-pixel camera, the lens would influence the windowing.
That is, if we measure a very low spatial frequency part of the MTF, with a period comparable to the lens FOV, it starts to depend on the exact phase of this low spatial frequency test picture.
Please explain what you're talking about in some detail.
Does this make the lens translation-sensitive?
In sampled data theory terms, the lens is not a sampler.
 
At the risk of being accused of sophistry again, the lens could be seen as a sampling device too, albeit a very large sampling device. Does this make the lens translation-sensitive?
Hmmm... probably need to be careful here. But I think I see what you are alluding to. If we imagined a test chart with a huge sine wave on it, one cycle of which spanned much more than the field of view, then indeed how we framed the test chart could affect what we measured in the captured image.

That said, there is already another translational variance that occurs with almost all lenses even on smaller scales than the entire FoV. The MTF is not the same at the center of the image as it is on the edges. Jack mentions this in his description where he notes a lens MTF is only translationally invariant over a small region of the field of view. This is why we see different MTF50 results for the centers and the edges of the lens.

And that kind of points at the crux of why we need to be careful about a pixel array and MTF. With a lens, while we know the MTF varies over the entire field of view, we also know that within a small region the MTF won't change if we slightly shift the test pattern or camera. With a pixel array even the slightest nudge will change how a fine test pattern appears in the final image, potentially dramatically (a very sharp lens and no AA filter will cause this).

Put another way, the lens MTF varies gradually over the field of view while the pixel array MTF varies dramatically over very short distance scales. They both vary in the real world, but the lens variance is well behaved while the pixel variance is pathological.

If we put a strong enough AA filter in front of the pixel array we reduce this variance, or similarly if we have a soft lens (or one stopped down such that diffraction is blurring things) the lens will reduce the variance we see at the pixel array.

And as mentioned, the slanted edge test commonly used is a bit of a clever trick: it works around the pixel-level variance and actually exploits it to be able to measure MTF at spatial frequencies above the Nyquist frequency of the pixel array. This is entirely dependent on the fact we know in advance what this slanted edge target looks like, and the processing software uses that advance knowledge to properly process it and measure an MTF.
 
Agreed, I should have added an explanatory picture. Suppose I'm measuring a low spatial frequency part of the lens MTF with this test pattern:



[image: low spatial frequency test pattern]

The lens response depends on the phase of this pattern with respect to the lens FOV, right?

So, the low frequency measurement result depends on the translation. Does this make sense?
 
Thank you for the explanation, kenw. I see what you mean.

What do you say about all these SPIE and Optica authors talking about pixel array MTF? Are there two different definitions of MTF? Any other explanation?
 
Agreed, I should have added an explanatory picture. Suppose I'm measuring a low spatial frequency part of the lens MTF with this test pattern:

[image: low spatial frequency test pattern]

The lens response depends on the phase of this pattern with respect to the lens FOV, right?
For a one-pixel sampler?
So, the low frequency measurement result depends on the translation. Does this make sense?
That's why we measure MTF with slanted edges.

 
What do you say about all these SPIE and Optica authors talking about pixel array MTF? Are there two different definitions of MTF? Any other explanation?
You can simulate the performance of a perfect lens, even one with no diffraction, on a sensor with a known pixel aperture, and do a simulation of a slanted edge test. That will give you the MTF of the pixel aperture.
 
What do you say about all these SPIE and Optica authors talking about pixel array MTF? Are there two different definitions of MTF? Any other explanation?
In engineering we frequently use terms that aren't 100% mathematically correct. As Jack is describing in this thread, with some care we can treat the pixel aperture as having an MTF, and it tells us useful things to do so. In addition, using methods like the slanted edge test chart we can even work around the translational variance of a pixel array to produce something very, very much like an MTF and combine it with other MTFs in the imaging system to make useful predictions of the overall system performance. By that point the practical difference between this pixel "pseudo-MTF" and a truly translationally invariant MTF is minuscule. Also, everyone in that field understands the underlying assumptions in treating it as an MTF and the caveats associated with it. So there is no reason for them to routinely state those caveats.

Basically, if it walks like a duck, quacks like a duck, flies like a duck, and even tastes like a duck, it is usually safe to treat it as a duck even if it really isn’t. If the duck gets sick and you need to give it medicine maybe then you better remember it isn’t actually a duck.

(Nominating myself for Strained Metaphor of the Year award)
 
What do you say about all these SPIE and Optica authors talking about pixel array MTF?
You can simulate the performance of a perfect lens, even one with no diffraction, on a sensor with a known pixel aperture, and do a simulation of a slanted edge test. That will give you the MTF of the pixel aperture.
Excellent point. And to pull on that thread just a little more, Jack's presentation of the topic specifically separates the pixel aperture from the pixel array. Which is to say, it separates the response function of the aperture from the sampling effects imposed by the physical constraints of making an array. There's a good reason for doing that.

But that point might be very subtle to those not working with mixed continuous and discrete measurement systems on a regular basis. So I’ve been taking a bit of a short cut in trying to make things hopefully understandable for circa2000 by slightly conflating those two topics in calling the result a “pseudo-MTF”. Engineers do get very fast and loose with terminology and I wouldn’t be surprised if a significant fraction of practitioners don’t even bother to think of the differences.
 
Let us get a bit "kinky" by considering PSFs of fish-shaped (and sundry) photosite apertures.
 
For a one-pixel sampler?
Not one pixel. Just for the sake of discussion, let's assume a giga-pixel sampler.
So, the low frequency measurement result depends on the translation. Does this make sense?
That's why we measure MTF with slanted edges.
Yes, indeed, the high frequency portion of MTF is entirely invariant, both in slanted edge and repeating bar approaches.

However, the very low frequency part of the slanted-edge Fourier transform should depend on the edge location with respect to the lens FOV, as in the examples below:



[image: slanted edge examples at different positions in the FOV]
 
Are you assuming Burns processing of the data? Looks like you’ve made the edge half as long. Assuming that the lens resolution is the same across the field, I don’t think that will change the slanted edge MTF.

 
Could you explain one more time why you say that pixels don't have an MTF?
Hi circa2000,

Ken has done a good job reading between the lines and explaining my post.

On one hand it is true that the 'pixel' is not shift invariant and hence technically we cannot apply the useful transfer function framework to it (i.e. it does not have an MTF, so there is no System MTF where a 'pixel' is involved). JACS explains this well here.

On the other hand we can measure and model rather accurately via the framework the low pass function that the 'pixel' provides (a reduction in resolution), which is reflected in the relative blurriness of the captured image. For instance we can estimate the effective pixel aperture in a given direction: what is it in the image in the OP, assuming a perfectly square effective active area? What does it say about the intent of the sensor designers? See here for some suppositions.
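
As a sketch of how that estimate could go (a hypothetical helper of mine, not a method from the thread): under the square-aperture assumption the directional aperture MTF is |sinc(w·f)|, whose first null sits at f = 1/w, so the frequency of the first measured null pins down the effective width w directly:

```python
import numpy as np

def effective_aperture_width(freq, mtf, floor=0.02):
    """freq in cycles/pixel; returns w in pixel pitches from the first MTF null."""
    null = freq[np.argmax(mtf < floor)]   # first frequency where the MTF ~ 0
    return 1.0 / null

# Self-test on a synthetic 'measurement' generated with w = 0.8 (assumed):
freq = np.linspace(0.01, 2.0, 400)
mtf = np.abs(np.sinc(0.8 * freq))
print(effective_aperture_width(freq, mtf))  # ~0.8: first null near 1.25 cycles/px
```
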

Hence the desire to be formally correct while reflecting the substance of what we see. I propose that we can achieve this objective by splitting the 'pixel' into two separate functions: the 'effective pixel aperture', which falls in the same analog/continuous LSI category as a lens, so both have MTFs; and the sampling delta trains, which do not.

With this semantic distinction and the group's validation, I would then feel comfortable calling measurements or simulations such as in the OP image a System MTF. It is of course the System MTF of the lens and pixel aperture just before sampling, which almost inevitably introduces aliasing, as we know.

Jack
 
However, the very low frequency part of the slanted-edge Fourier transform should depend on the edge location with respect to the lens FOV.
If I understand correctly your point, the slanted edge method measures the Spatial Frequency Response of the imaging system averaged over the length of the edge, in the direction perpendicular to it (alas, it says nothing about aliasing).

Since the performance of the lens changes slowly but substantially (e.g. the center vs the corner) results are valid only locally. That's why it is generally a good idea to limit the length of the edge to be evaluated (a couple of hundred pixels is a good compromise with noise).

The FT is computed on the profile of the edge only (the LSF), usually perpendicularly within 4-32 pixels from it. Should you be interested there is an introductory article on the method here:

https://www.strollswithmydog.com/the-slanted-edge-method/

Jack
 
