Yes, 1 is the correct issue, but it represents an 'ideal' case, as you allude to in 2. Here is my take: Resolution does not matter for colour accuracy, as they are comparing to large uniform patches that are much larger than a pixel. Noise reduction does not help you if your average RGB values don't match the 'correct' ones. And a metamerism error is not recoverable in post-processing, because the full spectral data has been discarded in the mapping.
You have (essentially) 3 channels of e.g. 14 bits each. The essential amount of information is somewhat less; say (for simplicity more than accuracy) that it is 8 bits per channel. That is, each full color sample (post-demosaic) contains 3x8 = 24 bits of information about color. Now, this representation might not be ideal or conform to any standard representation of color. But through a full 24-bit table look-up we can re-map it. So what is the problem with this method (besides memory requirements, processing time and the difficulty of estimating the table)?
1. Two different colors might register as the same bit-pattern (I believe this is the metamerism problem). There is no apparent fix for this; one would have to go for a compromise (e.g. render them as some intermediate color, or set up a rule that makes assumptions about which is most likely to appear in a scene, etc.).
2. Even if different colors generally result in different bit-patterns, the 3-D mapping could be highly "non-smooth". Two colors that are "close" in the camera representation might be "less close" in the corrected output representation, or the other way around. This is not so much a (simplified) theoretical problem as a potentially large practical one: with real-world effects such as noise and quantization, stretching the signal representation could make readily visible issues that would otherwise be invisible (see the sketch after this list).
My gut feeling is that 2) is more of a problem than 1), given the number of bits, the level of noise and the accuracy of CFAs. Is that right?
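For concreteness, here is a rough sketch of both the table look-up idea and the stretching worry in 2), done in Python/NumPy with a toy 1-D table and made-up numbers (a real correction would be a full 3-D table over 24-bit codes, and none of the values below come from an actual camera):

```python
import numpy as np

# Toy 1-D "look-up table" correction: gentle slope below code 128, steep above.
# The steep part stands in for a region of the full 3-D table where the
# camera-to-output mapping stretches color differences apart.
codes = np.arange(256)
lut = np.where(codes < 128, 0.5 * codes, 64.0 + 1.5 * (codes - 128))

rng = np.random.default_rng(0)
true_values = np.array([60.0, 200.0])                        # one sample per region
noisy = true_values + rng.normal(0.0, 2.0, size=(10000, 2))  # identical sensor noise

# Apply the "table" (interpolating between entries for non-integer inputs).
remapped = np.interp(noisy, codes, lut)

print("output noise, gentle region:", remapped[:, 0].std())  # roughly 0.5 * 2 = 1
print("output noise, steep region :", remapped[:, 1].std())  # roughly 1.5 * 2 = 3
```

Same input noise, but the stretched part of the mapping makes it three times larger in the output, which is exactly the practical issue 2) points at.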
-h
There are two forms to watch out for:
The type in 1, where two visually different colours look the same to the sensor, and the reverse, where two spots that look the same to the eye look different to the sensor. Neither is correctable automatically at all. Quantization errors etc. are not the limitation; it is that the RAW file does not contain ANY information that would tell the software how to shift two colours relative to each other, either closer together or further apart.
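A small sketch of why that information is genuinely gone, in Python/NumPy (the Gaussian-shaped CFA sensitivities and the test spectrum are invented for illustration, not taken from any real camera): any spectral difference that lies in the null space of the 3xN sensitivity matrix produces exactly the same three raw values.

```python
import numpy as np

wl = np.linspace(400, 700, 61)                     # wavelength samples in nm

def gauss(mu, sigma):
    return np.exp(-0.5 * ((wl - mu) / sigma) ** 2)

# Invented CFA spectral sensitivities (rows: R, G, B), purely illustrative.
S = np.vstack([gauss(600, 40), gauss(540, 40), gauss(460, 40)])  # shape (3, 61)

spectrum_a = gauss(560, 60)                        # some smooth test spectrum

# Any vector in the null space of S adds zero to all three channel responses.
null_vec = np.linalg.svd(S)[2][-1]

# Scale the null-space component so the second spectrum stays non-negative.
scale = 0.8 * spectrum_a.min() / np.abs(null_vec).max()
spectrum_b = spectrum_a + scale * null_vec

print("max difference between the two spectra:", np.abs(spectrum_a - spectrum_b).max())
print("raw RGB for spectrum A:", S @ spectrum_a)
print("raw RGB for spectrum B:", S @ spectrum_b)   # same to within rounding error
```

Two physically different spectra, identical raw data; no amount of clever software can tell them apart afterwards.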
This is why cameras don't have 'perfect' colour on the colour checker patches under standard lighting. It would be easy enough to design a mapping under controlled conditions to make those patches come out essentially perfect, but it would probably massively distort the colour quality under other circumstances.
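To make "design a mapping under controlled conditions" concrete, here is a hedged sketch of its simplest common form, a 3x3 colour matrix fitted by least squares over checker patches; the patch data below is synthetic (random targets and an invented camera response), not a real ColorChecker measurement:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-ins for 24 checker patches: the "correct" target RGBs and
# the camera's raw RGBs for the same patches (invented numbers).
target_rgb = rng.uniform(0.05, 0.95, size=(24, 3))
camera_mix = np.array([[0.9, 0.2, 0.1],
                       [0.1, 0.8, 0.1],
                       [0.0, 0.2, 0.9]])
camera_rgb = target_rgb @ camera_mix.T + rng.normal(0.0, 0.01, size=(24, 3))

# Least-squares fit of a 3x3 matrix M so that camera_rgb @ M ~ target_rgb.
M, *_ = np.linalg.lstsq(camera_rgb, target_rgb, rcond=None)
corrected = camera_rgb @ M

print("mean abs error on the training patches:", np.abs(corrected - target_rgb).mean())
```

A fancier fit (or a full 3-D table) can drive that error on the 24 patches towards zero, but nothing in the fit constrains colours the patches don't sample, and that is where the distortion described above shows up.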
To get more accurate colour you need either CFA filters with more eye-like spectral responses, or more CFA colours (4 or more different pigments, at the cost of other issues).
The ultimate would be a sensor with pixels small enough to record the actual wavelength of each individual photon as it arrives, and then increment R, G and B counters for that image pixel by amounts that are a function of the wavelength. If the pixels were small enough and the sensor processing fast enough, it could do this as each photon arrived and clear the detector before (on average) the next one arrived.
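As a toy simulation of that idea (Python; the photon wavelengths and the Gaussian "wavelength-to-RGB weight" curves are invented for illustration, they are not real colour-matching functions):

```python
import numpy as np

rng = np.random.default_rng(2)

def rgb_weights(wavelength_nm):
    """Invented smooth weights: how much one photon of this wavelength adds to
    the R, G and B counters (a stand-in for real colour-matching curves)."""
    mus = (610.0, 550.0, 465.0)
    sigmas = (45.0, 50.0, 35.0)
    return np.array([np.exp(-0.5 * ((wavelength_nm - m) / s) ** 2)
                     for m, s in zip(mus, sigmas)])

# One image pixel receiving a stream of photons; the hypothetical sensor
# measures each photon's wavelength and immediately adds the weights.
counters = np.zeros(3)                                  # R, G, B accumulators
photon_wavelengths = rng.uniform(420.0, 680.0, 5000)    # toy exposure

for w in photon_wavelengths:                            # "as photons arrive"
    counters += rgb_weights(w)

print("accumulated R, G, B for this pixel:", counters)
```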