I experience HUGE exposure differences with apertures either side
of f/4. Your testing only goes to f/4 at the widest. Try the same
trick with a lens at f/1.4 and tell me there's no problem.

The odd thing for me is that my exposures are pretty close when
stopped down past f/4, but I get underexposure at any aperture wider
than f/4. This seems to differ from other people's experience: they
claim their exposures are good wide open and then become
progressively overexposed as they stop down.

The basic behaviour is the same.
It is not so much that you get overexposure at f-numbers above f/4 or underexposure below f/4.

The main point is that the METERING IS NOT LINEAR with the Green Button: the exposure error does not stay constant as you stop down!

Very roughly, the metering goes like this
(the last number is the amount of overexposure in EV):

1.4 ... 0
2.0 ... 0
2.8 ... 0
4.0 ... 2
5.6 ... 2
8.0 ... 2
11 ... 2
16 ... 2
22 ... 1

You can add any constant EV offset to the whole curve; subtracting 2 EV, for example, gives:

1.4 ... -2
2.0 ... -2
2.8 ... -2
4.0 ... 0
5.6 ... 0
8.0 ... 0
11 ... 0
16 ... 0
22 ... -1
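Adding a constant only shifts the whole curve up or down; the change from one aperture to the next is untouched, which is why only the shape matters here. A small Python check of that, using the rough values from the first row above (the helper names `shift` and `steps` are just illustrative):

```python
# Rough overexposure (EV) from the first row above, ordered
# f/1.4, 2, 2.8, 4, 5.6, 8, 11, 16, 22.
first_row = [0, 0, 0, 2, 2, 2, 2, 2, 1]


def shift(errors, offset_ev):
    """Add the same EV offset to every aperture (subject/light dependent)."""
    return [e + offset_ev for e in errors]


def steps(errors):
    """Change in exposure error between neighbouring apertures."""
    return [b - a for a, b in zip(errors, errors[1:])]


second_row = shift(first_row, -2)
print(second_row)                             # [-2, -2, -2, 0, 0, 0, 0, 0, -1]
print(steps(first_row))                       # [0, 0, 2, 0, 0, 0, 0, -1]
print(steps(second_row) == steps(first_row))  # True
```

Whatever offset you pick, the +2 EV step between f/2.8 and f/4 and the -1 EV step between f/16 and f/22 stay put.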

What you actually see depends on the subject and the light, but the shape of the curve (darker exposures at large apertures, lighter exposures at small apertures, i.e. high f-numbers) is always the same.
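To put that curve against the light actually lost, each marked f-number can be converted into stops from f/1.4. Illuminance falls with the square of the f-number, so the stop count is 2·log2(N/1.4), rounded because marked values like 5.6 and 11 are themselves rounded. A minimal Python sketch using the rough values from the first row above; `READINGS` and `stops_from` are illustrative names. An ideal stop-down meter would show the same error on every line of the output.

```python
import math

# Rough overexposure (EV) per marked aperture, from the first row above.
READINGS = [
    (1.4, 0), (2.0, 0), (2.8, 0), (4.0, 2), (5.6, 2),
    (8.0, 2), (11, 2), (16, 2), (22, 1),
]


def stops_from(f_number, reference=1.4):
    """Nominal stops of light lost relative to `reference`.

    Illuminance scales with 1/N**2, so one stop is a factor of sqrt(2) in N.
    Marked f-numbers are rounded (e.g. 5.6 for 5.66), so round the result.
    """
    return round(2 * math.log2(f_number / reference))


for f_number, error_ev in READINGS:
    # A constant error here would mean the meter tracks the aperture correctly.
    print(f"f/{f_number:<4} {stops_from(f_number)} stops down  error {error_ev:+d} EV")
```

The stop count rises evenly (0 to 8), while the error column does not, which is the nonlinearity in numbers.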

The question is: why is there a break at around f/4?
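Whatever the cause, the position of the break can be read straight off the rough numbers: it is where the error changes most between neighbouring apertures. A short Python sketch over the first row of values above (the helper `largest_jump` is hypothetical, just for illustration):

```python
# Rough overexposure (EV) per marked aperture, from the first row above.
readings = {1.4: 0, 2.0: 0, 2.8: 0, 4.0: 2, 5.6: 2, 8.0: 2, 11: 2, 16: 2, 22: 1}


def largest_jump(readings):
    """Return (from_f, to_f, change_ev) for the biggest step between neighbours."""
    apertures = sorted(readings)
    pairs = zip(apertures, apertures[1:])
    return max(
        ((a, b, readings[b] - readings[a]) for a, b in pairs),
        key=lambda step: abs(step[2]),
    )


print(largest_jump(readings))  # (2.8, 4.0, 2)
```

This only locates the break between f/2.8 and f/4; it says nothing about why it happens.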

The green channel is the most characteristic:

