From Fourier optics we know the maximum contrast is going to be at the center. So the MTF is normalized to the contrast at the center, which is why the values plotted on the y-axis range from 0 to 1. While it measures the impact of many distortions caused by lens aberrations, for the most part the primary thing it measures is field curvature. A lens converts between a spherical wavefront and a collimated beam of light (parallel rays); the MTF measures how well it does that at increasing distances from the center of the image.
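In Fourier-optics terms, the 0-to-1 scale comes from taking the Fourier transform of the lens's blur (its line spread function) and dividing by the zero-frequency value, so MTF(0) is always 1. The sketch below assumes a hypothetical Gaussian blur with a 5 micrometre standard deviation. That is not a measurement of any real lens, just a convenient shape whose MTF has the known closed form exp(-2π²σ²f²). The function name `mtf_from_lsf` is made up for illustration.

```python
import math

def mtf_from_lsf(lsf, dx, freq):
    """MTF at spatial frequency `freq` (lp/mm) from a sampled line spread function.

    lsf: list of (x_mm, intensity) samples; dx: sample spacing in mm.
    The Fourier magnitude is divided by its zero-frequency value, so MTF(0) == 1.
    """
    re = sum(v * math.cos(2 * math.pi * freq * x) for x, v in lsf) * dx
    im = sum(v * math.sin(2 * math.pi * freq * x) for x, v in lsf) * dx
    dc = sum(v for _, v in lsf) * dx  # zero-frequency term used for normalization
    return math.hypot(re, im) / dc

# Hypothetical blur: a Gaussian LSF with sigma = 5 micrometres (0.005 mm),
# sampled every 0.2 micrometres out to +/- 8 sigma.
sigma = 0.005
dx = 0.0002
lsf = [(i * dx, math.exp(-0.5 * (i * dx / sigma) ** 2)) for i in range(-200, 201)]

# 15 and 45 lp/mm are the two frequencies commonly shown on manufacturer charts.
for f in (0, 15, 45):
    print(f"{f:>2} lp/mm: MTF = {mtf_from_lsf(lsf, dx, f):.3f}")
```

The same blur passes most of the contrast at 15 lp/mm but much less at 45 lp/mm, which is the frequency-dependent attenuation the next paragraph refers to.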
What the MTF does not tell you is the center contrast at one frequency vs. another, such as 15 lp/mm vs. 45 lp/mm; that is, how the lens attenuates contrast as a function of frequency. Fujifilm does seem to make an attempt to supply it at two points, at least. The MTF also does not measure how the lens renders out-of-focus planes in front of and/or behind the focal plane. It is produced with the illuminated chart perpendicular to the lens axis. However, in the real world a lens focuses light from a three-dimensional space onto a plane. When used in the real world, not on the chart in the lab, it is in actuality a projective transform: all point sources of light on the same line of sight end up at the same point on the sensor.
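The projective part can be sketched with an ideal pinhole (central projection) model. This is an assumption that ignores focus and aberrations entirely, and the helper `project` is hypothetical. Depth is divided out, so points at 1 m, 2 m and 5 m along one line of sight land on the same sensor spot. The image inversion of a real camera is omitted for simplicity.

```python
def project(point, focal_length_mm):
    """Ideal pinhole (central) projection of a 3-D point onto the sensor plane.

    point: (X, Y, Z) in mm, with Z the distance along the lens axis (Z > 0).
    Returns sensor coordinates (x, y) in mm; the image inversion is ignored.
    Dividing by Z discards depth, so every point on one line of sight maps to one spot.
    """
    X, Y, Z = point
    if Z <= 0:
        raise ValueError("point must be in front of the lens (Z > 0)")
    return (focal_length_mm * X / Z, focal_length_mm * Y / Z)

# Hypothetical 50 mm focal length; three points on the same line of sight
# (scalar multiples of one direction) at 1 m, 2 m and 5 m.
f = 50.0
direction = (100.0, 50.0, 1000.0)
for scale in (1, 2, 5):
    p = tuple(scale * c for c in direction)
    print(scale, project(p, f))
```

A pinhole renders every depth equally sharp; a real lens does not, and how it renders the depths it cannot bring into focus is exactly what the MTF chart leaves out.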
The design choices that produce the best MTF may not be the best choice for rendering light point sources in the out-of-focus areas and in the transitional areas between the focal plane and the out-of-focus areas. The rendering in these areas is normally lumped together under the term Bokeh. This rendering can result in smooth transitions, choppy transitions (usually called nervous Bokeh), or downright ugly effects such as onion rings, cat's eyes, or sharp circles around out-of-focus point sources.
The clinically sharp lens is only about the one infinitesimally thin plane perpendicular to the lens axis, not the projective rendering of the volume of dispersed light in the cone defined by the field of view. Designing a lens for the best MTF involves trade-offs that impact Bokeh, or the volumetric rendering.
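The "one thin plane" point can be made concrete with the standard thin-lens blur-disc (circle of confusion) formula, c = A·f·|s − s_f| / (s·(s_f − f)), where A = f/N is the aperture diameter. The sketch assumes an idealized thin lens and a hypothetical 50 mm lens focused at 2 m, and `blur_disc_mm` is a made-up name. It gives only the size of each out-of-focus disc. It says nothing about how light is distributed inside the disc, which is where smooth, nervous, or onion-ring Bokeh shows up.

```python
def blur_disc_mm(focal_mm, f_number, focus_mm, subject_mm):
    """Diameter (mm) of the out-of-focus blur disc on the sensor for a point source.

    Thin-lens geometry: the aperture diameter is focal/f_number, the sensor sits where
    objects at focus_mm are sharp, and a point at subject_mm spreads into a disc of
    diameter A * f * |s - s_f| / (s * (s_f - f)). All distances are from the lens.
    """
    if min(focus_mm, subject_mm) <= focal_mm:
        raise ValueError("focus and subject must be farther than one focal length")
    aperture = focal_mm / f_number
    return aperture * focal_mm * abs(subject_mm - focus_mm) / (subject_mm * (focus_mm - focal_mm))

# Hypothetical 50 mm lens focused at 2 m; point sources at 1 m, 2 m, 5 m and 100 m.
for n in (2.0, 1.0):
    discs = [blur_disc_mm(50, n, 2000, s) for s in (1000, 2000, 5000, 100000)]
    print(f"f/{n:g}:", ", ".join(f"{d:.3f} mm" for d in discs))
```

Only the 2 m point renders as a point; everything in front of or behind it becomes a disc whose size grows with distance from the focal plane and with aperture.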
A pretty good description of the rendering of point sources off the focal plane.
https://jtra.cz/stuff/essays/bokeh/index.html
A decent lay discussion.
https://www.bhphotovideo.com/explora/photography/tips-and-solutions/understanding-bokeh
https://www.bhphotovideo.com/explor...ical-anomalies-and-lens-corrections-explained
Ray tracing is the bread and butter tool for lens design today. Modern high-speed computers allow very complex designs to be undertaken. However, geometric optics, which is the basis of ray tracing, is not adequate to accurately analyze how a lens renders light off the focal plane. That analysis becomes a simple task using Fraunhofer diffraction, which shows that a lens in fact performs a Fourier transform. This allows one to analyze the off-focal-plane volume of space where the bokeh lives, and some say "character" lives.
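A minimal sketch of that Fourier approach, under simplifying assumptions: a 1-D uniform slit pupil, scalar Fraunhofer diffraction, and defocus as the only aberration, modeled as a quadratic pupil phase 2π·W20·u² (W20 is the peak defocus error in wavelengths). The point spread function is the squared magnitude of the pupil's Fourier transform, computed here with a direct sum so only the standard library is needed. The functions `pupil_samples` and `psf_1d` are hypothetical names, not from any optics package.

```python
import cmath
import math

def pupil_samples(defocus_waves, n=128):
    """Sample a 1-D uniform pupil u in [-1, 1] with a quadratic defocus phase.

    P(u) = exp(i * 2*pi * W20 * u**2), with W20 = defocus_waves (peak wavefront
    error at the pupil edge, in wavelengths). W20 = 0 is perfect focus.
    """
    us = [-1 + (2 * k + 1) / n for k in range(n)]  # midpoints of n equal cells
    return us, [cmath.exp(2j * math.pi * defocus_waves * u * u) for u in us]

def psf_1d(defocus_waves, xs, n=128):
    """Fraunhofer PSF: |Fourier transform of the pupil|^2 at image coordinates xs.

    xs are in units where the in-focus PSF (a sinc^2) has its first zero at x = 0.5.
    Dividing by the sample count makes the in-focus peak exactly 1.
    """
    us, pupil = pupil_samples(defocus_waves, n)
    out = []
    for x in xs:
        amp = sum(p * cmath.exp(-2j * math.pi * u * x) for u, p in zip(us, pupil)) / n
        out.append(abs(amp) ** 2)
    return out

xs = [i / 16 for i in range(-48, 49)]  # image-plane samples from -3 to 3; xs[48] == 0
for w20 in (0.0, 0.25, 1.0):
    psf = psf_1d(w20, xs)
    print(f"W20 = {w20:4.2f} waves: peak at center = {psf[48]:.3f}")
```

Moving off the focal plane only changes the pupil phase; the same transform then gives the shape of the blurred point, including ripples that a pure geometric blur disc would not show.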
https://web.mit.edu/2.710/Fall06/2.710-wk10-a-sl.pdf
The 50 f2 is a clinically sharp Planar-design lens. The Planar design emphasizes a flat field and is based on the double Gauss lens. The other classic Zeiss design was the Sonnar.
These two designs are the starting point of most modern lenses.
So what clinically sharp gets you is the highest resolution in the focal plane and the flattest field (or sharpness in the corners), at the expense of the rendering of the out-of-focus areas. The beautiful rendering of the Sonnar lens design comes at the cost of some field curvature. Or, as Carl Zeiss put it, the eye is as important as mathematics in designing a camera lens. Interesting reading:
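One textbook way to quantify the flat-field side of that trade is the Petzval sum, which for thin lenses in air is Σ φᵢ/nᵢ (element power over refractive index) and does not depend on element spacing. A sum near zero means a flatter Petzval surface. The sketch below is illustrative only: the element values are hypothetical, not from any Fuji or Zeiss prescription, `petzval_sum` is a made-up name, and it captures only the Petzval part of field curvature (astigmatism also moves the best-focus surface).

```python
def petzval_sum(elements):
    """Petzval sum of a system of thin lenses in air: sum of power / index.

    elements: list of (power_per_mm, refractive_index) pairs, power = 1/focal length.
    For thin lenses the sum does not depend on the spacing between elements.
    A sum of zero means a flat Petzval surface; otherwise its radius is about 1/|sum|.
    """
    return sum(power / n for power, n in elements)

# Hypothetical 50 mm singlet (n = 1.5) ...
singlet = [(1 / 50, 1.5)]
# ... versus a high-index positive element plus a low-index negative one.
# In contact the pair also has a 50 mm focal length (1/25 - 1/50 = 1/50).
pair = [(1 / 25, 1.8), (-1 / 50, 1.5)]

for name, system in (("singlet", singlet), ("pos/neg pair", pair)):
    p = petzval_sum(system)
    print(f"{name}: Petzval sum = {p:.5f} /mm, field radius ~ {1 / abs(p):.0f} mm")
```

The pair has the same power but a smaller Petzval sum, so its field is flatter; designers buy that flatness with extra elements and constraints that also affect other aberrations.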
https://www.pencilofrays.com/double-gauss-sonnar-comparison/
The double Gauss (Planar) lenses tend to be sharper, while the Sonnar lenses produce more pleasing Bokeh and better "character." Of course, character is a subjective term based on how pleasing a person finds the rendering of the image.
In my view, the Fuji 50 f2 is a classic example of a highly corrected flat-field design, which is very sharp. It's a nice lens, but I would not use it to produce a flattering portrait. It lacks character compared to the 56 f1.2 and 50 f1.
Now others may have different objective views of the three lenses. You asked what I meant - so here it is. YMMV