Your hypothesis of less DR per unit of sensor area is based on
smaller photosites saturating faster, and I think that is your error.
For a given sensor technology there is no reason for smaller
photosites to saturate faster than larger ones at a given exposure,
since saturation occurs at a particular electron density (electrons
per unit area) that is the same for any size of photosite. I.e.:
does a small rain gauge fill faster than a large one?
Let's consider two different pixel sizes, A and B, such that A has
twice the dimensions of B, and thus four times the area.
Let's say that pixel B becomes saturated after 50,000 photons. Then,
given the same sensor efficiency, pixel A will become saturated at
200,000 photons. Now let's assume that four B pixels are arranged in
a square, so that they look as if an A pixel were divided into four.
Let 200,000 photons fall onto pixel A. It will record all those
photons. Now let 200,000 photons fall onto the "4-pack" of B pixels
(B1, B2, B3, and B4). Now, if the photons are uniformly distributed,
then 50,000 photons will fall into each of the B Pixels and all is
the same. However, let's say the distribution of photons is not
uniform: 80,000 photons fall onto B1, only 40,000 photons fall
onto B2, another 60,000 photons fall onto B3, and only 20,000 photons
fall onto pixel B4.
We see that the A pixel records all 200,000 photons, but the 4-pack
of B pixels records only 160,000 photons, since two of the pixels
(B1 and B3) are oversaturated: each clips at 50,000, losing
30,000 + 10,000 = 40,000 photons between them. Thus, the single A
pixel records the highlights, and hence the DR, more accurately than
the 4-pack of B pixels.
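To make the clipping arithmetic concrete, here is a small Python sketch of the example above (the numbers are just the illustrative ones from this post, not real sensor full-well values):

```python
# Saturation ("full well") levels from the example: B clips at 50,000
# photons; A has 4x the area, so 4x the capacity.
FULL_WELL_B = 50_000
FULL_WELL_A = 4 * FULL_WELL_B  # 200,000

def record(photons, full_well):
    """A pixel records at most its full-well capacity; the rest is clipped."""
    return min(photons, full_well)

# Non-uniform light: 200,000 photons spread unevenly over the 4-pack.
b_photons = [80_000, 40_000, 60_000, 20_000]  # B1..B4

a_recorded = record(sum(b_photons), FULL_WELL_A)
b_recorded = sum(record(p, FULL_WELL_B) for p in b_photons)

print(a_recorded)  # 200000 -- the big pixel keeps everything
print(b_recorded)  # 160000 -- B1 and B3 clip at 50,000 each
```

Running this shows the 40,000-photon shortfall of the 4-pack under the non-uniform distribution, while the uniform case (50,000 each) would give the same total for both.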