Let me first state that I'm not an expert and I'm genuinely curious.
But here are the things I have noticed.
Sony leapfrogged everyone with hybrid CDAF/PDAF.
Fuji have just caught up.
And Oly M43 produce a beautiful new OM-D E-M10 III, but with NO PDAF or hybrid focus system. And still 16MP.
This is a question that will find the best answers in the Photographic Science forum.
Mirrorless cameras started out by using contrast detection autofocus (CDAF). This system involves continuous readout of the image sensor and an iterative process whereby the focus of the lens is racked back and forth until the contrast of the image peaks. Obviously, the time it takes to find that peak depends on how fast the captured image can be converted into a digital file, whether a portion of the frame or the entire frame is assessed, the precision with which the pixel data used for focusing is converted (AF doesn't require full image-rendering precision), how fast the lens can respond to focusing commands, how clean the signal is from the sensor, and the stability of the image on the frame. There are other factors. The usual CDAF focusing process takes several iterations to complete.
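To make the "rack back and forth until contrast peaks" idea concrete, here is a minimal Python sketch of a hill-climbing search. It is only an illustration, not any manufacturer's algorithm: the function names `contrast` and `cdaf_hill_climb`, the lens position units, the step sizes, and the toy contrast curve are all my own assumptions.

```python
def contrast(lens_pos, true_focus=37.0):
    """Toy contrast metric (an assumption): peaks at 1.0 when the lens is at true_focus."""
    return 1.0 / (1.0 + (lens_pos - true_focus) ** 2)


def cdaf_hill_climb(start, measure, step=8.0, min_step=0.25):
    """Move the lens until the contrast reading stops improving.

    Returns (final_position, number_of_contrast_readings).
    """
    pos = start
    best = measure(pos)
    readings = 1
    direction = 1  # initial guess: a contrast reading alone gives no direction
    while step >= min_step:
        candidate = pos + direction * step
        value = measure(candidate)
        readings += 1
        if value > best:
            # Contrast improved, so keep going the same way.
            pos, best = candidate, value
        else:
            # Got worse (or no better): reverse and take a finer step.
            direction = -direction
            step /= 2
    return pos, readings


final_pos, readings = cdaf_hill_climb(start=0.0, measure=contrast)
print(f"settled at {final_pos} after {readings} contrast readings")
```

Every reading in that loop stands for one sensor readout plus one lens move, which is why readout speed and lens response dominate how long CDAF takes.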
In recent years manufacturers have figured out how to implement a form of phase detection autofocus (PDAF) on the sensor itself. PDAF is basically an automation of the split-image focusing technique used in rangefinder cameras, in which two images of the subject are formed from slightly different viewpoints. The separation between the two images is directly related to the focus error, and in rangefinders the eye was used to overlay them. Modern SLR versions of this technique project the two images onto a dedicated line sensor, and a correlation between them is computed to determine where to focus the lens. Note that PDAF generates both a distance and a direction signal from an extremely small amount of data, so the lens theoretically requires no hunting to acquire focus, making it extremely fast and very effective at tracking moving subjects. CDAF generates only a "distance" signal (how good the contrast is, with no direction), so it has to guess initially which way to go, making it slower.
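As a rough illustration of how a single comparison of the two line images can yield both distance and direction, here is a small Python sketch. It finds the pixel shift that best aligns the two signals by minimizing the mean absolute difference, which is one simple stand-in for the correlation step. The names `shifted`, `best_shift`, and `pdaf_lens_move`, the `gain` calibration factor, and the sign convention are all hypothetical choices for the example, not taken from any real camera.

```python
def shifted(signal, s):
    """Return a copy of signal moved s pixels to the right (zero-padded)."""
    n = len(signal)
    return [signal[i - s] if 0 <= i - s < n else 0 for i in range(n)]


def best_shift(left, right, max_shift):
    """Return the shift s (in pixels) for which right[i + s] best matches left[i]."""
    best_s, best_err = 0, float("inf")
    for s in range(-max_shift, max_shift + 1):
        total, count = 0.0, 0
        for i in range(len(left)):
            j = i + s
            if 0 <= j < len(right):
                total += abs(left[i] - right[j])
                count += 1
        err = total / count  # mean absolute difference over the overlap
        if err < best_err:
            best_s, best_err = s, err
    return best_s


def pdaf_lens_move(left, right, gain=1.5, max_shift=4):
    """Convert the measured shift into a signed lens move.

    gain is a hypothetical calibration constant (lens units per pixel).
    """
    return gain * best_shift(left, right, max_shift)


edge = [0, 0, 0, 1, 5, 9, 5, 1, 0, 0, 0, 0]     # one line of "left-looking" data
print(pdaf_lens_move(edge, shifted(edge, 3)))   # positive: move one way
print(pdaf_lens_move(edge, shifted(edge, -2)))  # negative: move the other way
print(pdaf_lens_move(edge, edge))               # zero: already in focus
```

The size of the result says how far to drive the lens and the sign says which way, all from one pair of short line readings, which is the contrast with the repeated readings a contrast-based search needs.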
On-sensor PDAF (OSPDAF) masks or splits pixels on the imaging sensor so that they look in opposite directions. There are various ways to do this: on every pixel (Canon's DPAF, which splits each pixel in two), or in sparser arrays of masked pixels (Sony and others). Which method is used is a design decision, as there are tradeoffs to both general techniques, but either technique delivers the distance and direction signals that make for much faster focusing than CDAF.
In general, however, OSPDAF has several fundamental limitations compared with SLR-type PDAF that have restricted its ability to focus reliably in low light and in situations where extreme discrimination of the subject from the environment is required. This has led to the development of hybrid on-sensor AF systems - systems that combine PDAF, CDAF, and subject feature detection and tracking. Panasonic's DFD is an interesting mashup in that it is a CDAF system that knows the defocusing characteristics of the lenses attached to it (Panasonic only, of course), so it can determine where proper focus lies from only a couple of iterations. That makes it much faster than a pure CDAF system, but still slower than PDAF.
Lens mechanical focusing speed aside - as it is an important factor for both systems - the thing that makes both systems faster is the ability to get image data off the sensor and processed FAST. Typical sensors convert image signals one row at a time, then ship them off in serial fashion to a separate image processor chip. Sony, with its stacked sensor, RAM, and processor chips, is able to process data in parallel, which gives it its speed and subject tracking prowess.
In summary, then, the improving speed of mirrorless AF systems has been due to a steady progression of readout speed, algorithmic scope and efficiency, and lens mechanicals that have now put them - in the best mirrorless cameras - within spitting distance of DSLR PDAF system performance, and in certain situations actually exceeding it. The cost has been electronic circuit complexity and power dissipation.
So to answer your question, why doesn't your E-M10 III focus like an A9 - or even an E-M1? The reason is cost and market targeting. The E-M10 III is an entry-level camera and therefore doesn't include the sophisticated sensor architecture and processing chips that would be required to give you that performance.