ejmartin

Veteran Member
Posts: 6,274

A little sampling statistics tutorial


Much of image noise is "photon shot noise", the fluctuations in the number of photons reaching a given patch of sensor in a given amount of time. It is not too difficult to understand the nature of these fluctuations; they are rather ubiquitous in sampling problems.

Consider the following problem (which we'll relate to photography a bit later). A coin is tossed many times and the sequence of results is recorded -- 1 for heads, 0 for tails. Now ask, in any sub-sequence of N consecutive tosses, how many were heads? In the limit of many, many tosses, the answer of course is half of them, N/2, are heads. As the number of tosses gets very, very large, the probability that the answer is something other than half the number of tosses being heads becomes vanishingly small.

In the opposite limit, a single toss, it's either heads or it isn't, zero or one. Either way, the result is rather far from half the number of tosses, which would be 1/2.

In between, we can look at the collection of all subsamples of N consecutive tosses, and ask what is the fluctuation in the "head count". The standard deviation of the head count of N tosses is roughly sqrt[N]/2; that is, most of the samples give a head count within sqrt[N]/2 of the average N/2. For 100 tosses, most of the time the result for the head count will be between 45 and 55.
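For anyone who wants to check this numerically, here is a quick Python sketch (the sample sizes and variable names are mine, purely illustrative): it draws many samples of N = 100 tosses and measures the spread of the head count.

```python
import random
import statistics

random.seed(0)

N = 100              # tosses per sample
num_samples = 10000  # how many samples of N tosses to draw

# Head count for each sample of N coin tosses (1 = heads, 0 = tails)
head_counts = [sum(random.randint(0, 1) for _ in range(N))
               for _ in range(num_samples)]

mean = statistics.mean(head_counts)   # should come out close to N/2 = 50
sd = statistics.pstdev(head_counts)   # should come out close to sqrt(N)/2 = 5
print(mean, sd)
```

The measured standard deviation lands near sqrt[100]/2 = 5, matching the 45-to-55 range quoted above.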

So in a stream of coin tosses, half are heads, but as we sample the stream more finely, we see local fluctuations in the fraction of heads that go like 1/sqrt[sample size] -- the smaller the sample size, the larger the relative fluctuations in the head count are likely to be.

Photon counting statistics works the same way, with minor modifications, because both the coin toss problem and the photon counting problem tend to the same probability distribution (the normal distribution) in the limit of a large number of photons counted, or coins tossed.

As one makes pixels smaller, for a given fixed illumination of the sensor, the photons striking the sensor are grouped into smaller collections or samples. The relative fluctuation in the sample count is approximately 1/sqrt[average count], just like in the coin toss problem. So as pixel sizes decrease, the number of photons each pixel samples goes down, and the relative fluctuation (which we perceive as noise) goes up.
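To put some (made-up but representative) numbers on that: photon counting follows Poisson statistics, where the standard deviation is sqrt[mean count], so the relative noise is 1/sqrt[mean count]. A small sketch:

```python
import math

def shot_noise(mean_count):
    """Return (sigma, relative fluctuation) for a Poisson photon count."""
    sigma = math.sqrt(mean_count)      # shot-noise standard deviation
    return sigma, sigma / mean_count   # relative fluctuation = 1/sqrt(mean)

# Hypothetical mean photon counts per pixel at fixed illumination;
# quartering the pixel area quarters the average count.
for mean_count in (10000, 2500, 625):
    sigma, rel = shot_noise(mean_count)
    print(f"{mean_count:6d} photons: sigma = {sigma:5.1f}, "
          f"relative noise = {rel:.1%}")
```

Each 4x reduction in pixel area doubles the relative noise, even though the light hitting the sensor is unchanged.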

But nothing about the photons has changed; what has changed is our sampling of those photons. Necessarily the smaller samples have more fluctuations, and so necessarily increased resolution brings with it increased noise at the level of individual samples -- the pixel level -- even though nothing about the image being recorded is different. And one can recover the effect of larger pixels simply by grouping the smaller samples together into larger samples; this is what binning or downsampling does. Inexorably, the fluctuations will decrease with the larger sample size. But note that one doesn't need to do the binning to have the noise-lowering effect of larger sample size -- it's already there in the recorded data, in the same way that in the coin toss problem, the sequence of tosses is fixed and only our grouping of them into larger samples affects the fluctuations in the head count.
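The binning point can also be demonstrated numerically. In this sketch (sample sizes are arbitrary choices of mine), one fixed stream of tosses is grouped first into small samples and then into samples four times larger; only the grouping changes, and the relative fluctuation drops by about half, i.e. by sqrt[4]:

```python
import random
import statistics

random.seed(1)

# One fixed stream of "photon arrivals", modeled as coin tosses.
stream = [random.randint(0, 1) for _ in range(100000)]

def relative_fluctuation(sample_size):
    # Group the SAME fixed stream into samples of the given size,
    # then measure the spread of the per-sample counts.
    counts = [sum(stream[i:i + sample_size])
              for i in range(0, len(stream), sample_size)]
    return statistics.pstdev(counts) / statistics.mean(counts)

small = relative_fluctuation(25)    # "small pixels"
binned = relative_fluctuation(100)  # same data, grouped 4x larger
print(small, binned)
```

The underlying data never changes; the noise reduction comes entirely from the coarser grouping.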

Oh, and by the way, the next time you hear the results of a poll, and it is said to be accurate to +/- 3 percent, you can figure that they asked about a thousand people (sqrt[1000]/1000 ≈ .03).
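That back-of-the-envelope figure is just the same 1/sqrt[sample size] rule again; a one-liner to check it:

```python
import math

def poll_margin(n):
    # Relative fluctuation of a sample of n yes/no responses, as in the
    # coin-toss problem: roughly 1/sqrt(n).
    return 1 / math.sqrt(n)

print(f"{poll_margin(1000):.1%}")  # roughly 3 percent
```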