Five years ago the state of the art was about 3 e- read noise, independent of pixel size. Today it is about 2 e-, again independent of pixel size. Which sensors with 1.4 micron pixels have better read noise than a D3s? The D3s has about 2.8 e- read noise, so which are better than 2.8*(1.4/8.4), i.e. less than 1/2 e- of read noise?
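For anyone who wants to check the 2.8*(1.4/8.4) figure, here is a minimal sketch of the arithmetic. It assumes the read noise of the small pixels is uncorrelated, so that summing them over the area of one large pixel adds their noise in quadrature; the function name is made up for illustration.

```python
import math

def matching_read_noise(rn_large_e, pitch_large_um, pitch_small_um):
    """Per-pixel read noise the small pixels need so that, summed over the
    area of one large pixel, they match the large pixel's read noise."""
    # Number of small pixels that fit in the area of one large pixel.
    n_small = (pitch_large_um / pitch_small_um) ** 2
    # Uncorrelated noise adds in quadrature: total = rn_small * sqrt(n_small).
    return rn_large_e / math.sqrt(n_small)

# Figures quoted in the post: 2.8 e- at 8.4 micron versus 1.4 micron pixels.
rn_small = matching_read_noise(2.8, 8.4, 1.4)
print(f"{rn_small:.3f} e-")
```

That comes out at roughly 0.47 e-, which is where the "less than 1/2 e-" threshold comes from.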
At these levels, does it really matter?
Depending on the method used to Bayer-interpolate (mainly the support area size chosen) you get very different results from the two options, large pixels or small pixels.
One - large pixels - will have better SNR per pixel, as Qe * area increases without any noticeable increase in total RN.
The other - small pixels - will have a much better interpolation support, since you can increase the support area (counted in pixel widths, not µm of course!) without really affecting the available resolution in the finished image.
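The per-pixel SNR point for the large pixels can be put into numbers with a rough sketch. It assumes only photon shot noise plus read noise, the same read noise for both pixel sizes (the roughly 2 e- mentioned earlier), and a made-up photon flux and Qe that are there purely for illustration.

```python
import math

def pixel_snr(photons_per_um2, pitch_um, qe, read_noise_e):
    """SNR of a single pixel with shot noise and read noise only."""
    # Collected signal in electrons scales with Qe * pixel area.
    signal_e = photons_per_um2 * qe * pitch_um ** 2
    # Shot noise variance equals the signal; read noise adds in quadrature.
    noise_e = math.sqrt(signal_e + read_noise_e ** 2)
    return signal_e / noise_e

FLUX = 10.0  # photons per square micron (hypothetical exposure)
QE = 0.5     # quantum efficiency (hypothetical)
RN = 2.0     # read noise in e-, assumed identical for both pixel sizes

snr_large = pixel_snr(FLUX, 8.4, QE, RN)
snr_small = pixel_snr(FLUX, 1.4, QE, RN)
print(f"8.4 um pixel: SNR {snr_large:.1f}")
print(f"1.4 um pixel: SNR {snr_small:.1f}")
```

This is per pixel only. It says nothing yet about what happens once the small pixels are interpolated and downsampled to the same output size.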
The difference this makes to the raw converter and the rest of the PP chain is not trivial. The larger pixels will GENERALLY have a better Bayer interpolation accuracy per pixel (the two missing colors will be more accurately estimated), but also a MUCH higher susceptibility to impulse noise, or just the unfortunate pixels where the photon statistics go totally off the chart in predictability (it has to happen in quite a few tens of thousands of pixels in a 20MP+ raw file...)
They each give very different "qualities" to the finished image, and I must say that I prefer the downsampled high-res image by quite a large margin. The noise grain, especially the chroma noise, is a lot "tighter", which has two very tangible effects in post: you can apply more NR without affecting image detail, and you don't NEED to apply as much NR, as the tighter noise grain is a lot less "digital" and hence also causes less objectionable responses in the human visual recognition system.
Finer pitched noise grain can be stronger in total P-P oscillation than coarse grain noise (which means that it will have a much higher total energy content), and you will still accept it as a part of the image rather than a digital artefact.
So it's not just about the numbers: there's an entire path of interpolations and convolutions going on before the image ends up on your screen or in your print, at the intended viewing size. The lowered SNR per pixel may actually end up as an increased perceived SNR / detail in the finished image.
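As a toy illustration of the downsampling part of that chain, the sketch below box-averages a noisy flat field by a factor of 6 (the 8.4/1.4 pitch ratio). It assumes white Gaussian noise and a plain box filter, which is not what a real raw converter does, and it only measures the noise standard deviation, not how the grain looks. All names and numbers are made up for illustration.

```python
import random
import statistics

def box_downsample(img, k):
    """Average k x k blocks of a 2D list (simple box filter)."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(0, h - h % k, k):
        row = []
        for x in range(0, w - w % k, k):
            block = [img[y + dy][x + dx] for dy in range(k) for dx in range(k)]
            row.append(sum(block) / (k * k))
        out.append(row)
    return out

def noise_std(img):
    """Standard deviation over all pixels of a flat-field image."""
    return statistics.pstdev([v for row in img for v in row])

rng = random.Random(0)  # fixed seed so the run is repeatable
# Flat grey patch at level 100 with noise sigma 10 per small pixel.
fine = [[100 + rng.gauss(0, 10) for _ in range(240)] for _ in range(240)]
coarse = box_downsample(fine, 6)

std_fine = noise_std(fine)
std_coarse = noise_std(coarse)
print(f"per small pixel: {std_fine:.2f}, after 6x6 averaging: {std_coarse:.2f}")
```

With uncorrelated noise the averaged result lands near one sixth of the per-pixel figure, which is the purely numerical side of why a noisier high-res file can clean up so well at the final viewing size.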