My understanding of the term binning is combining detector data from
the wells at the hardware level.
Yes, the two adjacent sensor locations are combined at the HW
level. "The size and configuration of a pixel group are
variable—2x2, 4x4, 1x2, etc.—and are controlled through
sophisticated circuitry integrated into Foveon X3 direct image
sensors." Note they explicitly mention 1x2.
While interpolation is done at the software (or possibly firmware) level.
In this case that means SPP, or the Foveon libraries used in other
code. (dcraw does not handle medium resolution.)
They are two very different things.
Yes, they are, but that doesn't mean you can't use both in the same
mode.
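
To make the distinction concrete, here is a rough Python sketch of the two steps applied one after the other. It is only an illustration, not Foveon's circuitry or SPP's actual code: the function names bin_1x2 and resample_row, the sample values, and the choice of linear interpolation are all my own assumptions.

```python
def bin_1x2(rows):
    """Stand-in for hardware binning: sum each pair of horizontally
    adjacent samples so two sensor locations are read as one value."""
    return [
        [row[i] + row[i + 1] for i in range(0, len(row) - 1, 2)]
        for row in rows
    ]


def resample_row(row, new_len):
    """Stand-in for software interpolation: linearly resample one row
    of already-binned values to new_len output values."""
    if len(row) == 1 or new_len == 1:
        return [float(row[0])] * new_len
    scale = (len(row) - 1) / (new_len - 1)
    out = []
    for j in range(new_len):
        x = j * scale                    # position in the source row
        i = min(int(x), len(row) - 2)    # left neighbour index
        t = x - i                        # fractional distance to the right neighbour
        out.append(row[i] * (1 - t) + row[i + 1] * t)
    return out


# Hypothetical single row of six sensor samples.
sensor = [[10, 12, 20, 22, 30, 32]]
binned = bin_1x2(sensor)                              # three combined values per row
stretched = [resample_row(r, 5) for r in binned]      # five interpolated values per row
```

The first step reduces the number of values that are read out at all; the second only computes new values from ones that already exist, so nothing stops a mode from doing one and then the other.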
The only objection I have to your explanation is the use of the
term pixel. In the context you are using it I would rather see the
term data point used.
I'm using the terms as Foveon uses them, e.g. "The VPS capability
allows signals from adjacent pixels to be combined into groups and
read as one larger pixel."