Reverse thinking:
The position of a point in the plane can be unequivocally identified by triangulation.
In RGB, if the channels are stacked (Foveon), we need 3 pixel sites x 3 layers; if not (Bayer), we need 9 pixels, 3 each of R, G and B, irregularly spaced.
Color depth of this point is unambiguous in Foveon as well as in Bayer, so we are using the same number of samples.
If we think of the larger JPEG not as an expansion of the Raw size, but as a larger frame in which we place the positions taken from the smaller Raw, our precision in doing so is obviously higher with Foveon, as long as the information needed doesn't exceed what is available in the Raw.
So for simpler pictures (portraits etc.), even the double mode could be at an advantage against Bayer for the same count (3.4x3 vs 10.2), even for prints over A4 (7.7 MP at 314 ppi).
When the information needed rises, say for a very detailed urban landscape, then Bayer may lead (10.2 vs 3.4x3) because the spatial info needed is higher.
IMHO there is no simple way to compare the resolution of the two systems.
10.2 MP Bayer (2.55 R, 5.1 G, 2.55 B) is "perfect" if resized to 2.55 MP.
10.2 MP Foveon (3.4 each RGB) is "perfect" at 3.4 MP.
14 MP Foveon (4.65 each RGB) is "perfect" at 4.65 MP.
Above those sizes, interpolation is needed. Bayer can be "perfect" at 10.2 MP, but only in B+W!
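
For anyone who wants to check the per-channel figures, here is a rough Python sketch of the bookkeeping. It assumes a standard RGGB Bayer mosaic (1/4 red, 1/2 green, 1/4 blue sites) and a Foveon count quoted as three equal stacked layers; the function names are made up for illustration, and "perfect" is taken to mean the largest size at which every output pixel has a measured value in every channel.

```python
# Rough per-channel bookkeeping for the megapixel figures quoted above.
# Assumptions (illustrative, not measured): RGGB Bayer mosaic with
# 1/4 R, 1/2 G, 1/4 B sites; Foveon total = 3 stacked layers of equal size.

def bayer_channels(total_mp):
    """Per-channel sample counts (MP) for an RGGB Bayer sensor."""
    return {"R": total_mp / 4, "G": total_mp / 2, "B": total_mp / 4}

def foveon_channels(total_mp):
    """Per-channel sample counts (MP) for a 3-layer stacked sensor."""
    return {"R": total_mp / 3, "G": total_mp / 3, "B": total_mp / 3}

def perfect_size_mp(channels):
    """Largest output size (MP) with a measured R, G and B at every pixel."""
    return min(channels.values())

for name, channels in [("Bayer 10.2", bayer_channels(10.2)),
                       ("Foveon 10.2", foveon_channels(10.2))]:
    rounded = {c: round(v, 2) for c, v in channels.items()}
    print(name, rounded, "-> full color at",
          round(perfect_size_mp(channels), 2), "MP")
```

The sparsest channel sets the limit: red and blue for Bayer, while all three layers are equal for Foveon. This only counts samples; it says nothing about how well demosaicing or upsizing works above that limit.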