Photosite (or even pixel) size has very little connection to sensor efficiency, and therefore to low-light performance. The pixels might be bigger, but there are fewer of them, so in the end the same amount of light is collected.
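A quick back-of-envelope check of the "same amount of light" claim, as a minimal Python sketch. The 36 x 24 mm sensor size, the flux of 50 photons per square micron, the 4 um and 6 um pitches and the `photon_budget` helper are all hypothetical, and the sketch assumes 100% fill factor and identical quantum efficiency at both pitches:

```python
def photon_budget(width_um, height_um, pitch_um, flux_per_um2):
    """Return (pixel count, photons per pixel, total photons) for a sensor.

    Hypothetical simplifications: square pixels that tile the sensor exactly,
    100% fill factor and the same quantum efficiency at every pitch.
    """
    n_pixels = (width_um // pitch_um) * (height_um // pitch_um)
    per_pixel = flux_per_um2 * pitch_um ** 2
    return n_pixels, per_pixel, n_pixels * per_pixel


FLUX = 50  # hypothetical photons per square micron for one exposure

for pitch in (4, 6):
    n, per_pixel, total = photon_budget(36_000, 24_000, pitch, FLUX)
    print(f"{pitch} um: {n:,} pixels x {per_pixel:,} photons = {total:,} photons")
```

Both pitches end up with 43,200,000,000 photons in total; each 6 um pixel simply holds 2.25 times as many as a 4 um one, and there are 2.25 times fewer of them.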

There are two different things here: the photon count per pixel and the number of photons collected per unit area.

Those are indeed two different things.

The latter does not depend on the pixel pitch, but the former does. SNR per pixel grows with the number of photons collected by that pixel (relative photon shot noise falls as 1/sqrt(N)). For the same exposure, larger pixels capture more photons and thus have higher SNR. This is why pixel peeping reveals more noise per pixel for smaller photosites.

Indeed, but that is of little relevance to what we are actually trying to do in photography, which is to make a picture that we can look at.
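Both halves of this exchange can be checked with shot noise alone. In the minimal sketch below (hypothetical flux and helper name; only Poisson photon noise is modelled, no read noise), a 12 x 12 um patch is covered either by nine 4 um pixels or by four 6 um pixels:

```python
import math


def shot_noise_snr(photons):
    """SNR of a Poisson (photon) count: mean N, standard deviation sqrt(N)."""
    return photons / math.sqrt(photons)


FLUX = 50     # hypothetical photons per square micron
AREA_UM = 12  # a 12 x 12 um patch holds 3 x 3 pixels at 4 um or 2 x 2 at 6 um

for pitch in (4, 6):
    per_pixel = FLUX * pitch ** 2
    pixels_in_patch = (AREA_UM // pitch) ** 2
    # Summing independent Poisson counts gives another Poisson count,
    # so the patch SNR depends only on the total photons in the patch.
    patch_total = per_pixel * pixels_in_patch
    print(pitch, round(shot_noise_snr(per_pixel), 2), round(shot_noise_snr(patch_total), 2))
```

Per pixel, the 6 um SNR is 1.5 times the 4 um SNR (42.43 versus 28.28, the ratio of the pitches), while the 12 um patch comes out at 84.85 for both.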

The price to pay is reduced resolution.

The resolution is identical if you look at individual pixels, because a pixel just describes the value of the light where it is. "Resolution" only exists when you look at an area, and if you want to compare resolution it makes sense to compare the same area (or equivalent areas when magnified to the size of the final image). So the bottom line is that "resolution" makes no sense at the pixel level, and nor, in terms of image quality, does SNR.

With proper downsampling (bicubic, etc.) to the same level of detail, one can hope to recover the SNR by effectively combining the outputs of several smaller pixels into an aggregate one, but doing so does not entirely compensate for the increase in read noise.

The "downsampling" argument is a red herring. All that is required is to view the images at the same size.
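The read-noise part of the argument can be put into numbers with a simple noise model. This is a sketch under stated assumptions, not a sensor measurement: shot noise is Poisson, every pixel read adds an independent 3 e- RMS of read noise regardless of pitch, and `binned_snr` is an illustrative helper. The same 12 x 12 um area is summed from four 6 um pixels or nine 4 um pixels:

```python
import math


def binned_snr(signal_e, n_pixels, read_noise_e):
    """SNR of a fixed area whose signal is split over n_pixels that are summed.

    Assumes Poisson shot noise on the total signal plus independent Gaussian
    read noise of read_noise_e (electrons RMS) added once per pixel read out.
    """
    noise = math.sqrt(signal_e + n_pixels * read_noise_e ** 2)
    return signal_e / noise


READ_NOISE = 3.0  # hypothetical, taken as equal for both pitches

# Same 12 x 12 um area: 4 pixels at 6 um versus 9 pixels at 4 um.
for signal in (20.0, 200.0, 2000.0):  # deep shadow to highlight, electrons per area
    big = binned_snr(signal, 4, READ_NOISE)
    small = binned_snr(signal, 9, READ_NOISE)
    print(f"{signal:>6.0f} e-: 6 um {big:5.2f}, 4 um {small:5.2f}, ratio {small / big:.3f}")
```

With these assumed numbers the 4 um version loses about a quarter of the SNR at 20 e- per area but only about 1% at 2000 e-, so the read-noise penalty is mainly a shadow issue. If the smaller pixels had lower read noise, the gap would shrink accordingly.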

Correct me if I am wrong, but I think you will have to jump through several hoops to match the SNR of a 4x4 um pixel to that of a 6x6 um one.

You cannot measure SNR in a single pixel in a single photograph. Now think about the implications of that.
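A small Monte Carlo simulation makes the measurement point concrete: a single pixel value has no spread, so SNR has to be estimated over many pixels of a uniform patch (or over many frames). The sketch below, assuming NumPy and hypothetical values for flux, read noise and output size, simulates a flat grey patch on 4 um and 6 um sensors, box-sums both to the same 12 um output pixels (a simple stand-in for viewing at the same size, not bicubic resampling) and measures SNR as mean over standard deviation:

```python
import numpy as np

rng = np.random.default_rng(0)

FLUX = 5.0        # hypothetical electrons per square micron (a dim, flat grey patch)
READ_NOISE = 3.0  # hypothetical read noise, electrons RMS per pixel, same for both pitches
OUT_PITCH = 12    # common output pixel: 3 x 3 pixels at 4 um, 2 x 2 pixels at 6 um
OUT_SIZE = 200    # output image is OUT_SIZE x OUT_SIZE pixels


def capture(pitch_um, n):
    """Simulate an n x n capture of a flat field: Poisson shot noise plus read noise."""
    mean_e = FLUX * pitch_um ** 2
    return rng.poisson(mean_e, size=(n, n)) + rng.normal(0.0, READ_NOISE, size=(n, n))


def measured_snr(pitch_um):
    """Capture a flat patch, box-sum it to 12 um output pixels, and measure SNR
    as mean / standard deviation over all output pixels."""
    b = OUT_PITCH // pitch_um  # native pixels per output pixel, per axis
    raw = capture(pitch_um, OUT_SIZE * b)
    out = raw.reshape(OUT_SIZE, b, OUT_SIZE, b).sum(axis=(1, 3))
    return out.mean() / out.std()


# One pixel from one frame is a single number: its spread is zero, so no SNR.
single = capture(4, 1)
print("std of one pixel:", single.std())

for pitch in (4, 6):
    print(f"{pitch} um pixels, viewed at 12 um output: SNR ~ {measured_snr(pitch):.1f}")
```

Under these assumptions the two results should come out close to the analytic values 720/sqrt(801) ≈ 25.4 for 4 um and 720/sqrt(756) ≈ 26.2 for 6 um; the small remaining difference comes entirely from the extra read noise of the nine 4 um pixels, since both collect the same 720 e- per output pixel.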

