Photon Shot Noise is a property of the light itself (not of the detector), and thus exists even at the level of a single, isolated photo-detector (photo-site).
... What I missed was a straightforward explanation of how these things interact with ISO gain multipliers. Basically the lack of understanding was on my side (thus I used that expression before), but everyone here assumed things, partly contradictory ones, without explaining in detail why one leads to the other.
You mentioned that you have some experience in the subject of audio engineering.
ISO Gain is just the amplification-factor ("gain") of a signal (usually a voltage signal in these cases).
Audio amplifiers pass, and sometimes amplify, audio signals of interest. The transistors, diodes, and resistors that the amplifier circuits are made of have their own internal "noise-sources". "Shot noise" (here in the semiconductors, not in the light itself) exists in the transistors and diodes.
Summing all of the various noise-sources present in an audio amplifier (which include other types of noise in addition to semiconductor shot-noise) vectorially, i.e. in quadrature (root-sum-of-squares, since the sources are uncorrelated), and expressing the result as an input-referred quantity of (total) noise, allows us to determine the audio amplifier's Dynamic Range (the maximum peak input signal that the amplifier can pass without non-linear distortion divided by the root-mean-square value of the total input-referred noise), or the audio amplifier's Signal/Noise Ratio (the maximum root-mean-square input signal that the amplifier can pass without non-linear distortion divided by the root-mean-square value of the total input-referred noise).
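As a rough sketch of that bookkeeping: the three noise values, the 1.3 V clipping level, and the sine-wave test signal below are all hypothetical, not measurements of any real amplifier, and the noise sources are assumed to be already input-referred and uncorrelated. Both ratios are expressed in dB as 20·log10 of an amplitude ratio:

```python
import math

def total_noise_rms(*sources_rms):
    """Combine uncorrelated, input-referred noise sources (RMS volts) in quadrature."""
    return math.sqrt(sum(n * n for n in sources_rms))

def dynamic_range_db(max_peak_v, noise_rms_v):
    """Max undistorted peak input divided by total RMS noise, in dB."""
    return 20 * math.log10(max_peak_v / noise_rms_v)

def snr_db(max_rms_v, noise_rms_v):
    """Max undistorted RMS input divided by total RMS noise, in dB."""
    return 20 * math.log10(max_rms_v / noise_rms_v)

# Hypothetical noise sources: 3 uV, 4 uV and 12 uV RMS -> 13 uV total
noise = total_noise_rms(3e-6, 4e-6, 12e-6)
max_peak = 1.3                       # hypothetical clipping level (volts peak)
max_rms = max_peak / math.sqrt(2)    # RMS of a sine wave with that peak (assumed test signal)
print(f"total noise: {noise * 1e6:.1f} uV RMS")
print(f"DR:  {dynamic_range_db(max_peak, noise):.2f} dB")
print(f"SNR: {snr_db(max_rms, noise):.2f} dB")
```

With a sine-wave test signal the SNR defined this way sits about 3 dB below the Dynamic Range, because a sine's RMS value is its peak divided by √2.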
The photo-sites are just "transducers" (not unlike microphones or turntable or guitar pickups). They "transduce" (translate) photonic energy into electrical charge (electrons) that is stored in the photo-cell of a photo-site, much as a capacitor stores charge (electrons).
There is noise in the light itself that was proved (only in the 1980s) to be a characteristic of the light itself (and not of the transducer). (Sort of) similarly, Brownian motion of the molecules in the air generates sound-pressure-waves (noise) that are picked up by even the finest microphones, because it is truly there.
As you can see from DxOMark's data, the SNR of photo-cells is only around 40 dB.
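One way to read that figure, assuming (for illustration only) that the ~40 dB is a purely shot-noise-limited SNR near saturation and ignoring read noise: for a Poisson count of N electrons the noise is √N, so SNR = N/√N = √N, and in dB that is 10·log10(N).

```python
import math

def shot_noise_snr_db(electrons):
    """SNR (dB) of a purely shot-noise-limited count: signal N, noise sqrt(N)."""
    return 20 * math.log10(electrons / math.sqrt(electrons))

def electrons_for_snr_db(snr_db):
    """Inverse: sqrt(N) = 10**(dB/20), so N = 10**(dB/10)."""
    return 10 ** (snr_db / 10)

print(electrons_for_snr_db(40))      # about 10,000 electrons
print(shot_noise_snr_db(10_000))     # about 40 dB
```

Under those assumptions a ~40 dB ceiling corresponds to only about ten thousand collected electrons; a real sensor's full-well capacity and read noise may shift this.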
The MOSFET amplifiers that interface to the photo-cells in each MOS image-sensor photo-site have their own noise (roughly 80 dB below the voltage-signal level that they can process approximately linearly). They also add their own capacitance, which sums with the capacitance of the photo-cells they interface to.
bobn2 tells us that it is this (input) capacitance of the MOSFET amplifiers that typically "dominates" over the capacitance of the photo-cells themselves.
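To see why that input capacitance matters, here is a minimal sketch of the charge-to-voltage step, V = N·q / C_total, treating the photo-cell and MOSFET input capacitances as simply summed (in parallel). The 1 fF and 2 fF values are hypothetical, chosen only so that the amplifier's share dominates, as described above.

```python
ELEMENTARY_CHARGE_C = 1.602176634e-19  # charge of one electron, coulombs

def conversion_gain_uv_per_electron(c_photocell_f, c_amp_input_f):
    """Microvolts produced per stored electron on the summed capacitance."""
    c_total = c_photocell_f + c_amp_input_f
    return ELEMENTARY_CHARGE_C / c_total * 1e6

def pixel_voltage_v(electrons, c_photocell_f, c_amp_input_f):
    """Voltage on the summed capacitance after collecting `electrons`."""
    return electrons * ELEMENTARY_CHARGE_C / (c_photocell_f + c_amp_input_f)

# Hypothetical capacitances: 1 fF photo-cell, 2 fF amplifier input
cg_cell_only = conversion_gain_uv_per_electron(1e-15, 0.0)
cg_with_amp = conversion_gain_uv_per_electron(1e-15, 2e-15)
print(f"{cg_cell_only:.1f} uV/e- without, {cg_with_amp:.1f} uV/e- with amplifier")
print(f"10,000 e- -> {pixel_voltage_v(10_000, 1e-15, 2e-15):.3f} V")
```

With these made-up numbers the amplifier's capacitance cuts the voltage per electron to one third, which is what "dominates" means in practice.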
Basically the temporal distribution of photon arrivals can lead to a situation where ... half the photons may (or may not) arrive within half the time. Once the sum of these photons is linearly multiplied (linear analog gain increase), you may get higher raw levels due to the gain/multiplication process.
Or even more down to earth and very simplified math: When you get 10 photons at ISO 200 then you don't necessarily get 5 photons at ISO 400 within half the exposure time. 10 (true signal) photons at ISO 200 may multiply to raw level 2000, but at ISO 400 and half the time you may get something like 6 photons that would multiply to raw level 2400.
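The arithmetic in that example can be made explicit with the Poisson distribution, the usual model for photon arrivals. This sketch assumes every photon becomes one electron (quantum efficiency of 1) and uses the example's implied scaling of 200 raw levels per photon at ISO 200 and 400 at ISO 400; those scale factors are illustrative, not any camera's actual values.

```python
import math

def poisson_pmf(k, mean):
    """Probability of exactly k photons when `mean` are expected."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

def prob_at_least(k_min, mean):
    """Probability of k_min or more photons."""
    return 1.0 - sum(poisson_pmf(k, mean) for k in range(k_min))

def raw_level_stats(mean_photons, raw_per_photon):
    """Mean and standard deviation of the raw value after linear gain."""
    return mean_photons * raw_per_photon, math.sqrt(mean_photons) * raw_per_photon

iso200 = raw_level_stats(10, 200)   # full exposure, 10 photons expected
iso400 = raw_level_stats(5, 400)    # half exposure, 5 photons expected
print(f"ISO 200: mean {iso200[0]:.0f}, std {iso200[1]:.0f}")
print(f"ISO 400: mean {iso400[0]:.0f}, std {iso400[1]:.0f}")
print(f"P(>= 6 photons | 5 expected) = {prob_at_least(6, 5):.3f}")
print(f"P(>= 12 photons | 10 expected) = {prob_at_least(12, 10):.3f}")
```

Both settings average the same raw level (2000); only the spread differs. Getting 6 or more photons when 5 are expected (raw 2400 at ISO 400) is an ordinary event, with roughly a 38% chance.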
"Noise" is an uncertainty of measurement that (relative to the "Signal") decreases with the square-root of the number of measurements when what we are measuring is random in nature.
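In numbers: for a random (Poisson) count of N events the standard deviation is √N, so the noise relative to the signal is √N/N = 1/√N. A tiny sketch (the counts are arbitrary examples):

```python
import math

def relative_uncertainty(count):
    """Shot noise divided by signal for a Poisson count: sqrt(N) / N."""
    return math.sqrt(count) / count

for n in (100, 400, 10_000):
    print(f"N = {n:>6}: noise is {100 * relative_uncertainty(n):.1f}% of the signal")
```

Quadrupling the count only halves the relative noise.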
The (complete) "signal" is actually the (desired) "signal" with the (undesired) "noise" added to it. That extra "noise" that "rides" on top of the signal is only really noticeable when the ratio of the signal divided by the noise is smaller, i.e. when the SNR is lower, as in the case of a lower magnitude of "light signal".
Signal/Noise Ratio is (actually, most rigorously) defined as (Signal + Noise) / (Noise).
The numerator of the above formula is the part that we are (casually) describing as "signal" only. As the ratio of the Signal divided by the Noise decreases (at lower light-levels), the "Signal + Noise" does not decrease quite as much as we would think, because there is Noise adding to that Signal. When amplifiers amplify all of this (such as at higher ISO Gains), we see the effect more prominently.
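A minimal sketch of how large that effect is, assuming signal and noise are uncorrelated so that their RMS values add in quadrature (a meter reading "signal + noise" would then see √(S² + N²) rather than S + N). The S/N values are arbitrary:

```python
import math

def s_plus_n_over_n(signal_rms, noise_rms):
    """(Signal + Noise) / Noise with uncorrelated RMS values added in quadrature."""
    return math.sqrt(signal_rms ** 2 + noise_rms ** 2) / noise_rms

for s_over_n in (100.0, 10.0, 2.0, 1.0):
    measured = s_plus_n_over_n(s_over_n, 1.0)
    print(f"S/N = {s_over_n:6.1f} -> (S+N)/N = {measured:7.3f}")
```

At high S/N the two ratios are practically identical; near S/N = 1 the noise visibly inflates the apparent signal level.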
The amplification that takes place with higher "ISO Gains" multiplies (amplifies) both the "signal" and the noise by the same factor, so their ratio is unchanged.
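Put as a sketch (hypothetical values: 100 e- of signal with its √100 = 10 e- of shot noise, and arbitrary gain factors), an ideal gain stage leaves the ratio untouched. This deliberately ignores any noise added after the gain stage:

```python
def amplify(signal, noise, gain):
    """An ideal gain stage scales signal and noise by the same factor."""
    return signal * gain, noise * gain

signal_e, noise_e = 100.0, 10.0      # 100 electrons, shot noise sqrt(100) = 10
for gain in (1, 2, 4):               # e.g. three ISO settings, one stop apart
    s, n = amplify(signal_e, noise_e, gain)
    print(f"gain {gain}: signal {s:.0f}, noise {n:.0f}, SNR {s / n:.1f}")
```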
Why does a single photo-site/pixel get a higher maximum value when it's exposed for a shorter time/fewer photons than when it is exposed for a longer duration/more photons?
Because on a temporal (over time) basis, what matters is the ratio of the collected (summed, integrated) electron charges resulting from arriving photons (signal) divided by their uncertainty (noise): the Signal/Noise Ratio.
As more photons are summed over time, the uncertainty (photon shot noise) increases only as the square-root of the number summed, so the ratio is higher when the exposure-time is longer and lower when the exposure-time is shorter.
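This can be checked with a small Monte Carlo sketch: draw Poisson photon counts for a "short" exposure and a "long" one (twice as long) and compare mean divided by standard deviation. The mean counts (100 and 200), the trial count, and the seed are arbitrary; the Poisson sampler is Knuth's multiplication method, which is adequate for modest means like these.

```python
import math
import random
import statistics

def poisson_sample(mean, rng):
    """Knuth's method: multiply uniforms until the product drops below e^-mean."""
    limit = math.exp(-mean)
    k, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= limit:
            return k
        k += 1

def simulated_snr(mean_photons, trials=5000, seed=1):
    """Estimate SNR = mean / standard deviation from simulated exposures."""
    rng = random.Random(seed)
    counts = [poisson_sample(mean_photons, rng) for _ in range(trials)]
    return statistics.mean(counts) / statistics.pstdev(counts)

short_snr = simulated_snr(100)   # expected near sqrt(100) = 10
long_snr = simulated_snr(200)    # expected near sqrt(200), about 14.1
print(f"short {short_snr:.2f}, long {long_snr:.2f}, ratio {long_snr / short_snr:.3f}")
```

The simulated ratio lands close to √2 ≈ 1.41, the square-root relationship described above.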
Yes, that square-root (1.41xyz) connection has been mentioned in several posts, ...
It's just the square-root function. Photon Shot Noise varies as the square root of the amount of light (or the intensity of light per unit area, or the total amount of light transduced by an image-sensor). The square-root of 2 is 1.414, the square-root of 4 is 2, the square-root of 16 is 4, etc. ...
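The same square-root rule in photographic terms: each doubling of the light (one stop) improves the shot-noise-limited SNR by √2, about 3 dB. A small sketch, using the light ratios from the examples above:

```python
import math

def snr_improvement(light_ratio):
    """Factor and dB by which shot-noise-limited SNR rises when light is multiplied."""
    factor = math.sqrt(light_ratio)
    return factor, 20 * math.log10(factor)

for ratio in (2, 4, 16):
    factor, db = snr_improvement(ratio)
    stops = math.log2(ratio)
    print(f"{ratio:>2}x light ({stops:.0f} stops): SNR x{factor:.3f}, +{db:.2f} dB")
```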