Of course it's not detail in any useful sense. It's detail in a pain in the a*se sense, but it is there in the image projected on the sensor; it does not originate in the sensor. It is the consequence of trying to make an image out of whichever number of photons you decide to use. I think you observed elsewhere that the best way to approach picture taking was to determine the exposure for photographic capture and then determine the capture conditions to optimise for that choice. The problem with a large pixel sensor is that it predetermines your choice of capture conditions in a way that a small pixel sensor doesn't.
The shot noise is a result of limited samples of photons. It is a counting fluctuation, relative to the long-time average rate of photon arrival integrated over the exposure time (assuming a static scene). In a tonally uniform area of the image, and with reasonable photon counts (say more than a hundred), it is to a good approximation white noise. It is not "detail" in any useful sense, since as you note below these fluctuations are mitigated by capturing more photons, while the actual detail remains the same.
I prefer the point of view that the smaller pixel image captures more detail, and the photon shot noise is part of that detail. Capturing it is part and parcel of capturing the detail, and it isn't something 'added' by the sensor.
But to say that a smaller pixel image doesn't start off with more noise is simply false. It starts off with more detail, and more noise. One is free to eliminate both, by resampling; or to keep much of the added detail and remove much of the added noise with more sophisticated forms of filtering.
If that's a bit obtuse for some people, what it means in practice is that if there is plenty of light available and you have a pixel-dense camera, you can choose to preserve all the detail. If there's less light available, you can choose to jettison the detail and the noise. With a pixel-starved camera the former choice is denied you.
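For anyone who wants to see the "jettison the detail and the noise" option in numbers, here is a rough sketch in Python. It is only an illustration under stated assumptions: a tonally uniform patch, shot noise only (read noise and everything else in a real sensor is ignored), an assumed mean of 100 photons per small pixel, and groups of four consecutive samples standing in for 2x2 blocks of small pixels. The function names are made up for this example.

```python
import math
import random
import statistics


def poisson(mean, rng):
    """Draw one Poisson-distributed photon count (Knuth's method, fine for modest means)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def snr(values):
    """Signal-to-noise ratio of a uniform patch: mean count over standard deviation."""
    return statistics.mean(values) / statistics.pstdev(values)


def simulate(mean_per_small_pixel=100, n_large_pixels=4000, seed=1):
    """Return (SNR per small pixel, SNR after summing groups of four small pixels)."""
    rng = random.Random(seed)
    # Shot noise only: each small pixel just counts the photons that reach it.
    small = [poisson(mean_per_small_pixel, rng) for _ in range(4 * n_large_pixels)]
    # Summing four small pixels gives the count one pixel of four times the
    # area would have recorded from the same light (read noise ignored).
    binned = [sum(small[i:i + 4]) for i in range(0, len(small), 4)]
    return snr(small), snr(binned)


small_snr, binned_snr = simulate()
print(f"SNR per small pixel:       {small_snr:.1f}")
print(f"SNR after 4-pixel binning: {binned_snr:.1f}")
```

With these assumptions the per-pixel SNR should come out near sqrt(100) = 10 and the binned SNR near sqrt(400) = 20, which is what a pixel of four times the area would give for the same exposure. The binned result throws away the extra resolution along with the extra noise; the unbinned data keeps both, and that choice is only available if you captured the small pixels in the first place.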