I spent a considerable amount of time last night reading several long posts here and one huge web page on this topic. I am not up to explaining it all in one short post, but I can give you some ways that it differs from what you are thinking, which is exactly the way I was thinking.

1. Equivalence means what they say it means. What is equivalent is perspective (you got that right), framing, depth of field, shutter speed and displayed size. Nothing more.
2. It is not about equivalent exposure (light per unit area), it is about total light reaching the sensor. The goal is to measure efficiency of different sensors.

In your pictures you made the exposure the same, and (therefore) the depth of field was different. That is meaningful, maybe even more intuitive, but it is not what the people you are aiming at mean by equivalence. Once I understood that, I was happier to let them mean whatever they want. I am not yet convinced that thinking that way will be very useful, but it's not totally bogus. They just defined equivalence the way they wanted, to serve that particular purpose.

While I'm writing:
3. Exposure only means total light hitting the sensor.

No -- exposure is the density of the light hitting the sensor.  The total light hitting the sensor is the product of the exposure and sensor area:

Total Light = Exposure x Sensor Area
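
To put a number on that formula, here is a rough sketch in Python. The sensor dimensions are nominal values I am assuming for full frame (36 x 24 mm) and Four Thirds (17.3 x 13 mm); they are not taken from this thread, and the function names are just for illustration.

```python
# Nominal sensor dimensions in mm (assumed typical values, not from the thread).
FULL_FRAME_MM = (36.0, 24.0)
FOUR_THIRDS_MM = (17.3, 13.0)


def sensor_area(dims_mm):
    """Sensor area in square millimetres."""
    width, height = dims_mm
    return width * height


def total_light(exposure, dims_mm):
    """Total Light = Exposure x Sensor Area, in arbitrary units."""
    return exposure * sensor_area(dims_mm)


# Same exposure (light per unit area) on both sensors.
exposure = 1.0
ratio = total_light(exposure, FULL_FRAME_MM) / total_light(exposure, FOUR_THIRDS_MM)
print(f"Full frame collects about {ratio:.2f}x the total light")
```

With these nominal dimensions the ratio comes out a little under 4, which is where the "4x" figure used in this thread comes from.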

The noise in the two photos is essentially the same, even though the 5D collects 4x as much total light for a given exposure as the GX1, because the GX1 has a significantly more efficient sensor at those ISOs.

Everything after that is "brightening". I really like that; the only problem is general usage. Even the top slider in ACR is labelled "exposure" when it's clearly an ISO adjustment, or brightening.
4. 4/3 needs to be opened up two stops more than FF to put the same total light on the sensor.
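
The two-stop figure can be checked with a small sketch, assuming a nominal crop factor of 2 for 4/3 (my assumption, as are the function names). Light per unit area goes as 1/N^2 for f-number N, and the area ratio is the crop factor squared, so the gap in stops is log2(crop^2).

```python
import math


def stops_difference(crop_factor):
    """Stops the smaller sensor must open up to match total light.

    Area ratio is crop_factor**2, and one stop is a factor of two in light.
    """
    return 2 * math.log2(crop_factor)


def equivalent_f_number(f_number, crop_factor):
    """F-number on the larger format giving the same total light and DOF."""
    return f_number * crop_factor


CROP_FOUR_THIRDS = 2.0  # nominal value, assumed for illustration

stops = stops_difference(CROP_FOUR_THIRDS)
ff_equiv = equivalent_f_number(2.8, CROP_FOUR_THIRDS)
print(f"{stops:.0f} stops; f/2.8 on 4/3 pairs with f/{ff_equiv:.1f} on FF")
```

So f/2.8 on 4/3 pairs with f/5.6 on FF at the same shutter speed, under that assumed crop factor.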

I'd better stop here. There is tons more to read, some of it quite mathematical, to see where the definition they use for "equivalence" came from and how it can be useful. My intent here is just to ask you to allow them their definition, and if it's not what you care about that is OK.

Briefly: the GX1 has about double the QE of the 5D, which means that even though 4x as much light falls on the 5D sensor for a given exposure, the 5D records only 2x as much light.  Furthermore, the read noise on the GX1 is about half that of the 5D at the ISO the photos were taken at, so that, in combination with the higher QE, effectively nullifies the noise differential in the shadows.
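
A quick sketch of that arithmetic, using only the relative figures quoted above (4x the light on the 5D, about double the QE on the GX1). These are ratios rather than measurements, and the sketch assumes photon shot noise only, where SNR goes as the square root of the light recorded. Read noise is deliberately left out, so it does not reproduce the shadow behaviour.

```python
import math

# Relative figures from the post above; ratios, not measured values.
LIGHT_ON_SENSOR = {"5D": 4.0, "GX1": 1.0}  # same exposure, about 4x the area
RELATIVE_QE = {"5D": 1.0, "GX1": 2.0}      # GX1 QE about double the 5D's


def recorded_light(camera):
    """Light actually recorded = light falling on the sensor x relative QE."""
    return LIGHT_ON_SENSOR[camera] * RELATIVE_QE[camera]


recorded_ratio = recorded_light("5D") / recorded_light("GX1")

# Shot-noise-only assumption: SNR scales with sqrt(recorded light).
shot_noise_snr_ratio = math.sqrt(recorded_ratio)

print(f"5D records {recorded_ratio:.0f}x the light")
print(f"Shot-noise SNR advantage: {shot_noise_snr_ratio:.2f}x")
```

On these assumptions the 4x advantage in light shrinks to 2x recorded, or roughly a 1.4x edge in shot-noise SNR, before the GX1's lower read noise is taken into account.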

Right. Damn, I knew that. Careless wording on my part. My point was that what's called exposure occurs only on the sensor.

Indeed.  Same with total light.  However, in terms of the IQ of the photo, we must figure in the sensor efficiency: what matters is the total amount of light collected, not merely the total amount of light that falls on the sensor, along with the additional noise added by the sensor and supporting hardware.

In short, the same total light will fall on the sensor for equivalent photos, which means the larger sensor will necessarily have a lower exposure, since the exposure is the density of the light falling on the sensor, and the same total light distributed over a larger area will result in a lower density.

If the sensors are equally efficient, then the same total light falling on the sensor will result in the same noise and DR (noise and DR are flip sides of the same coin).

