If you compare a FF sensor and an APS-C sensor with the same amount of pixels, the reason the FF performs better is because each pixel is bigger and captures more photons. That's where the advantage of the FF sensor is.

This is incorrect. The details are explained here:

The link you gave states the same thing:

"While smaller pixels, individually, will be more noisy (for a given exposure and sensor efficiency) because they record less light..."

However, the link begins with:

Most believe that because a larger pixel gathers more light for a given exposure, that larger pixels result in less apparent noise. However, for a given sensor size, the smaller the pixel, the more pixels you have. So, while a larger pixel will have less noise than a smaller pixel since it gathers more light for the same exposure, the image as a whole will be made from the same total amount of light regardless of the number of pixels.

In other words, it is not the light gathered by a single pixel that matters, unless your photo is made from a single pixel. Instead, it is the light gathered by all the pixels that matters.
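To put rough numbers on that, here is a minimal sketch that assumes photon shot noise is the only noise source (read noise and differences in sensor efficiency are ignored) and that shot noise follows Poisson statistics, so a mean count of N photons carries noise of sqrt(N). The function names and the photon and pixel counts are made up for illustration.

```python
import math

def per_pixel_snr(total_photons, n_pixels):
    """SNR of one pixel when the sensor's light is split evenly over n_pixels."""
    photons_per_pixel = total_photons / n_pixels
    return math.sqrt(photons_per_pixel)  # signal / noise = N / sqrt(N)

def binned_snr(total_photons, n_pixels, n_output):
    """SNR per output pixel after summing groups of pixels down to n_output pixels."""
    group = n_pixels // n_output              # sensor pixels per output pixel
    photons_per_pixel = total_photons / n_pixels
    signal = group * photons_per_pixel        # signals add linearly
    noise = math.sqrt(group) * math.sqrt(photons_per_pixel)  # independent noise adds in quadrature
    return signal / noise

# Hypothetical numbers: same sensor area and exposure, so the same total light.
total = 24_000_000_000
for n in (12_000_000, 24_000_000):
    print(n, round(per_pixel_snr(total, n), 2), round(binned_snr(total, n, 6_000_000), 2))
```

Under these assumptions the per-pixel SNR drops as the pixel count goes up, but once both sensors are brought to the same output size the SNR is the same, because it depends only on the total light behind each output pixel.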

I specifically said with the same amount of pixels.

It makes no difference either way. For equally efficient sensors, the number of pixels has nothing to do with the total amount of light collected, and the number of pixels has no correlation with sensor efficiency.

Once FF sensors reach the same pixel size as APS-C sensors, that advantage is gone...

In fact, just the opposite. The more pixels, the higher the IQ for any given format. This is clearly demonstrated here:

and is evident in that the 5D2, with roughly 65% more pixels (about 21 MP versus 12.8 MP), has higher IQ than the 5D (although the 5D2 also has a more efficient sensor).

  • Nevertheless, as is stated in that link too, if you magnify both images to the same degree, the FF one is cleaner, because you can average over more pixels (either in your brain or in postprocessing, if you downsample to the same resolution as the APS-C sensor).
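The averaging mentioned in that point can be checked with a small simulation. This is only a sketch: it approximates shot noise with a Gaussian whose standard deviation is sqrt(mean), ignores read noise, and uses an arbitrary mean of 1000 photons per pixel. The function name is hypothetical.

```python
import random
import statistics

def simulate_relative_noise(mean_photons, block, n_outputs, seed=0):
    """Relative noise (std / mean) after averaging `block` pixels per output pixel."""
    rng = random.Random(seed)
    sigma = mean_photons ** 0.5  # Gaussian approximation of shot noise
    outputs = []
    for _ in range(n_outputs):
        pixels = [rng.gauss(mean_photons, sigma) for _ in range(block)]
        outputs.append(sum(pixels) / block)
    return statistics.pstdev(outputs) / statistics.fmean(outputs)

native = simulate_relative_noise(1000, block=1, n_outputs=20000)
averaged = simulate_relative_noise(1000, block=4, n_outputs=20000)
print(f"native: {native:.4f}  4-pixel average: {averaged:.4f}  ratio: {native / averaged:.2f}")
```

Averaging four pixels of the same size should cut the relative noise roughly in half, so the ratio printed should land close to 2. That is the sense in which more pixels of a given size, viewed at the same display size, give a cleaner image.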

A postulate of Equivalence is "same display size" (bold emphasis added):

Equivalent photos are photos of a given scene that share the following five parameters:

  • Perspective

  • Framing

  • DOF

  • Shutter Speed

  • Display Dimensions

It is important to note that the parameters above refer to the visual properties of the photo, but do not include elements of IQ, most notably detail and noise (Noise Equivalence is a related, but separate consideration, and is discussed here, whereas detail depends on the sharpness of the lens, the size of the sensor, the number of pixels, and the AA filter).

