A) Why 3 detectors? Why not 2 or 19? What difference does it make?
In layman's terms, you need at least three channels to simulate the eye's color response. There is a phenomenon called metamerism (if you see the same color for two different spectral compositions, then the sensor must record the same color for those spectra too, and vice versa), and based on that it is generally assumed that for proper color reproduction the eye's spectral response must be representable as a linear combination of the sensor's spectral responses. There can be more than three channels, but that makes both the sensor technology and the processing more expensive. A few sensors with four different color channels have been produced (Sony's RGBE, for example), but they were abandoned because of processing difficulties.
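To make that "linear combination" requirement a bit more concrete, here is a small numerical sketch. Nothing in it is real measured data - the spectra are random placeholders you would replace with actual CIE color matching functions and measured sensor sensitivities:

```python
import numpy as np

wavelengths = np.arange(400, 701, 10)          # nm, placeholder sampling grid
cmf = np.random.rand(len(wavelengths), 3)      # eye: xbar, ybar, zbar (placeholder)
sensor = np.random.rand(len(wavelengths), 3)   # sensor: three channel sensitivities (placeholder)

# Find the 3x3 matrix M that best maps the sensor responses onto the
# eye's color matching functions, in the least-squares sense.
M, *_ = np.linalg.lstsq(sensor, cmf, rcond=None)

fit = sensor @ M
relative_error = np.linalg.norm(fit - cmf) / np.linalg.norm(cmf)
print(f"relative fitting error: {relative_error:.1%}")
# A small error means the eye's response is (nearly) a linear combination
# of the sensor channels, so the camera matches metamers the same way we do.
# A large error means some spectra that match for the eye won't match for
# the sensor, or vice versa.
```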
What difference can more channels make? Maybe slightly better color reproduction, maybe a wider range of operating conditions (from candlelight to very high color temperature scenes, or under lamps with weird spectra) - not much for ordinary photography. For scientific or forensic use such sensors would be more interesting, but external filters are usually a much better and more flexible solution.
B) If 3 detectors are the minimum (and this is presumably why Foveon has 3 layers), how does the Quattro work? It has asymmetric layers, with each top-layer detector sharing the lower layers with its neighbours. If you take a group of 4 top-layer pixels, the top layer could yield different values for each detector, but the two lower layers would be the same for each pixel. A pixel that allows only the top layer to vary within each group of 4 doesn't sound as if it would actually work very well... yet it does.
It does sound to me as if it would work quite well. You need to consider that (a) the human eye doesn't resolve the color of fine detail very well, and (b) in real images the color channels are strongly correlated. (Both of these facts are the basis of CFA sensor processing too.)
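You can check claim (b) on any of your own photos - load an image and look at the correlation between its channels. This little sketch assumes numpy and Pillow; the file name is just a placeholder:

```python
import numpy as np
from PIL import Image

img = np.asarray(Image.open("example_photo.jpg").convert("RGB"), dtype=np.float64)
r, g, b = (img[..., i].ravel() for i in range(3))

print("corr(R, G) =", round(np.corrcoef(r, g)[0, 1], 3))
print("corr(R, B) =", round(np.corrcoef(r, b)[0, 1], 3))
print("corr(G, B) =", round(np.corrcoef(g, b)[0, 1], 3))
# On typical photographs these correlations come out quite high, which is
# why full-resolution detail in a single layer (the Quattro top layer, or
# green in a Bayer CFA) can stand in for detail in all of them.
```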
I would process Q data as follows (noise removal and many other corrections omitted):
- group/bin the top-layer pixels 2x2
- calculate pixel values from the resulting 5 MPix three-layer data as usual (in HSV, Lab, or RGB + Y - whatever color space is most correct for the next steps)
- resize the result to 20 MPix
- redistribute intensity (luminance, lightness - let everyone choose the correct term here, I don't know) within each 2x2 area according to the real top-layer pixel values
I think such an approach would create relatively few visible artefacts for most images, even without residual color fringe correction. Is anyone willing to try that approach on real data?
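For anyone who wants to try, here is a minimal numpy sketch of the steps above. It assumes you already have three linear raw planes (top at full resolution, mid and bot at quarter resolution); layers_to_color is a hypothetical stand-in for the real layer-to-color transform, the 2x2 binning and nearest-neighbour resize are deliberately crude, and the step-4 "redistribution" is done simply by scaling each output pixel by the ratio of its real top-layer value to the binned top-layer value - one possible reading of that step, not the only one. Noise removal and the other corrections are omitted, as in the list above:

```python
import numpy as np

def bin2x2(plane):
    """Average 2x2 blocks: (H, W) -> (H/2, W/2). H and W assumed even."""
    h, w = plane.shape
    return plane.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample2x(a):
    """Nearest-neighbour 2x upsampling along the first two axes."""
    return a.repeat(2, axis=0).repeat(2, axis=1)

def layers_to_color(top, mid, bot):
    """Placeholder for the real layer-separation / color matrix step."""
    stack = np.stack([top, mid, bot], axis=-1)   # (h, w, 3)
    matrix = np.eye(3)                           # identity, sketch only
    return stack @ matrix.T

def process_quattro(top, mid, bot, eps=1e-6):
    """top: (H, W) full-resolution layer; mid, bot: (H/2, W/2) layers."""
    # 1) bin the top layer 2x2 so all three layers have the same resolution
    top_binned = bin2x2(top)

    # 2) compute color at quarter resolution "as usual"
    rgb_low = layers_to_color(top_binned, mid, bot)

    # 3) resize the color result back to full resolution
    rgb_full = upsample2x(rgb_low)

    # 4) redistribute intensity inside each 2x2 block according to the
    #    real top-layer samples: scale each pixel by top / binned(top),
    #    which modulates brightness while keeping the channel ratios
    #    (the low-resolution color) intact
    gain = top / (upsample2x(top_binned) + eps)
    return rgb_full * gain[..., None]
```

On real data you would of course replace the identity matrix with a proper Quattro color transform and do the redistribution in a real luminance channel rather than by scaling all channels, but the order of operations would stay the same.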