Think of a full-frame sensor behind a lens of a certain focal length and aperture. Then crop the resulting image to the format of the "equivalent" system's sensor.

To stay with your original setup, think of a 35-100/2.8 lens on an FF sensor, crop the image to the size of an mFT sensor, and watch equivalence reveal itself.
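A minimal Python sketch of that thought experiment. It assumes a crop factor of exactly 2 between FF and mFT (the real figure is close to 2), and the helper name `ff_equivalent` is hypothetical. Cropping the FF frame and displaying it at the same size amounts to multiplying both the focal length and the f-number by the crop factor.

```python
# Assumed FF-to-mFT crop factor (nominally 2; the exact value is slightly different).
MFT_CROP = 2.0


def ff_equivalent(focal_mm: float, f_number: float, crop: float) -> tuple[float, float]:
    """Hypothetical helper: FF focal length and f-number giving the same framing
    and DOF as `focal_mm` at `f_number` on a sensor with the given crop factor."""
    return focal_mm * crop, f_number * crop


# Both ends of a 35-100/2.8 zoom, cropped to mFT size.
for focal in (35.0, 100.0):
    eq_focal, eq_n = ff_equivalent(focal, 2.8, MFT_CROP)
    print(f"{focal:.0f}mm f/2.8 cropped to mFT ~ {eq_focal:.0f}mm f/{eq_n:.1f} on FF")
```

So the cropped 35-100/2.8 behaves like a 70-200/5.6 on the uncropped FF frame.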

Indeed. If we took a photo at 100mm f/2.8 on FF and cropped it to the same framing as a photo taken at 100mm f/2.8 on mFT, the two photos would be equivalent (although the mFT photo would be more detailed, since its sensor has a higher pixel density and its lens is sharper).
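The pixel-density part of that parenthetical is simple arithmetic: cropping by a linear factor of 2 keeps only a quarter of the pixels. The sketch below uses hypothetical sensor resolutions (24 MP FF, 20 MP mFT) chosen purely for illustration, and it says nothing about lens sharpness.

```python
# Hypothetical resolutions, for illustration only (not taken from the discussion).
FF_MEGAPIXELS = 24.0
MFT_MEGAPIXELS = 20.0
CROP = 2.0  # assumed FF-to-mFT crop factor


def megapixels_after_crop(megapixels: float, crop: float) -> float:
    """Cropping by a linear factor `crop` keeps 1/crop**2 of the sensor area, and so of the pixels."""
    return megapixels / crop**2


ff_cropped = megapixels_after_crop(FF_MEGAPIXELS, CROP)
print(f"FF cropped to mFT framing: {ff_cropped:.1f} MP vs native mFT: {MFT_MEGAPIXELS:.1f} MP")
```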

Covers everything but diffraction nicely. Add some more pixels and it also covers diffraction.

Actually, the effects of diffraction softening are also included, and the pixel size has nothing to do with it: when the FF photo is cropped by a factor of two, the proportion of the photo that the Airy disk takes up doubles (as does the proportion taken up by defocus blur, which is why the DOF shrinks when the cropped photo is displayed at the original size).
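A rough sketch of the Airy disk argument, assuming green light at 550 nm, the usual 2.44·λ·N approximation for the Airy disk diameter (out to the first dark ring), and a 36 mm wide FF frame. The function name is hypothetical. Note that pixel size never enters the calculation; only the f-number and the fraction of the frame that is kept matter.

```python
AIRY_FACTOR = 2.44      # Airy disk diameter (to first minimum) ~ 2.44 * wavelength * N
WAVELENGTH_MM = 550e-6  # assumed green light, 550 nm, expressed in mm
FF_WIDTH_MM = 36.0      # FF frame width


def airy_fraction_of_width(f_number: float, crop: float = 1.0) -> float:
    """Hypothetical helper: Airy disk diameter as a fraction of the width of the
    (possibly cropped) FF frame. No pixel size appears anywhere."""
    airy_mm = AIRY_FACTOR * WAVELENGTH_MM * f_number
    kept_width_mm = FF_WIDTH_MM / crop  # cropping keeps 1/crop of the width
    return airy_mm / kept_width_mm


full = airy_fraction_of_width(2.8)
cropped = airy_fraction_of_width(2.8, crop=2.0)
print(f"uncropped: {full:.2e}, cropped x2: {cropped:.2e}, ratio: {cropped / full:.2f}")
```

Halving the kept width doubles the fraction, so the printed ratio is 2.00 whatever sensor resolution you imagine.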

That is, if we took two photos on FF from the same position, one at 200mm f/5.6 and the other at 100mm f/2.8, then cropped the 100mm f/2.8 photo to the same framing as the 200mm f/5.6 photo and displayed both at the same size, the amount of diffraction softening would be the same, as would the DOF.
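A sketch comparing the two shots. It assumes a 0.03 mm circle of confusion for the full FF frame (divided by the crop factor for the cropped shot, since that one is enlarged twice as much), a 10 m subject distance for both shots, 550 nm light, and the standard thin-lens hyperfocal DOF formulas. The helper names are hypothetical.

```python
FF_COC_MM = 0.03        # assumed circle of confusion for the full FF frame
FF_WIDTH_MM = 36.0      # FF frame width
WAVELENGTH_MM = 550e-6  # assumed green light (550 nm), in mm
SUBJECT_MM = 10_000.0   # assumed 10 m subject distance for both shots


def dof_mm(focal_mm: float, f_number: float, subject_mm: float, coc_mm: float) -> float:
    """Total DOF from the standard thin-lens hyperfocal formulas (subject closer than H)."""
    hyperfocal = focal_mm**2 / (f_number * coc_mm) + focal_mm
    near = subject_mm * (hyperfocal - focal_mm) / (hyperfocal + subject_mm - 2 * focal_mm)
    far = subject_mm * (hyperfocal - focal_mm) / (hyperfocal - subject_mm)
    return far - near


def ff_shot(focal_mm: float, f_number: float, crop: float) -> tuple[float, float]:
    """Hypothetical helper: DOF and Airy-disk fraction of frame width for an FF shot
    cropped by `crop`, with the result displayed at a fixed size."""
    # The crop is enlarged `crop` times more, so the acceptable blur on the sensor shrinks.
    coc_mm = FF_COC_MM / crop
    airy_mm = 2.44 * WAVELENGTH_MM * f_number
    airy_fraction = airy_mm / (FF_WIDTH_MM / crop)
    return dof_mm(focal_mm, f_number, SUBJECT_MM, coc_mm), airy_fraction


dof_a, airy_a = ff_shot(200.0, 5.6, crop=1.0)
dof_b, airy_b = ff_shot(100.0, 2.8, crop=2.0)
print(f"200mm f/5.6, uncropped:  DOF {dof_a:.0f} mm, Airy disk / width {airy_a:.3e}")
print(f"100mm f/2.8, cropped x2: DOF {dof_b:.0f} mm, Airy disk / width {airy_b:.3e}")
```

The Airy disk fractions match (up to floating-point rounding), and the DOFs come out at about 825 mm and 833 mm. The roughly 1% gap is there because, at a 10 m focus distance, lens extension makes the two framings differ by about 1%, and the thin-lens formulas reflect that.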

