I applaud the OP's desire to hypothesize an alternative technique and test it experimentally, but there's more work to be done, and more to consider, before the conclusion would be valid and the technique useful.

What the technique doesn't address is how many pixels the subject occupies, and how much resolution results, at the final magnification/print/display size.

Looking at it first from the perspective of pixel count, the "gold standard" for a top-quality print is 300-360 pixels per inch. So if one backs off on the distance so that the magnification ratio is 1:5 rather than 1:1, the item of interest now occupies (at most) only 1/5 of the linear dimension of the frame. Even on an A7R IV, it then takes up only 9504 ÷ 5 ≈ 1900 pixels. Printed at 300 ppi, that's a print with a long dimension of only 6.34 inches.
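For anyone who wants to try other numbers, here is a minimal Python sketch of the arithmetic above. The function name and the "fraction of the frame" parameter are my own illustration, and it assumes the subject would fill the long edge of the frame at 1:1.

```python
def subject_print_inches(sensor_px_long, subject_fraction, ppi):
    """Longest print dimension (in inches) the subject can fill at a given ppi.

    subject_fraction is the share of the frame's long edge the subject spans
    (1.0 if it fills the frame, 1/5 if magnification drops from 1:1 to 1:5).
    """
    subject_px = sensor_px_long * subject_fraction
    return subject_px / ppi


A7R_IV_LONG_PX = 9504  # long-edge pixel count quoted in the post

# Assumption: the subject fills the long edge of the frame at 1:1.
full_frame = subject_print_inches(A7R_IV_LONG_PX, 1.0, 300)
backed_off = subject_print_inches(A7R_IV_LONG_PX, 1 / 5, 300)

print(f"1:1 -> {full_frame:.2f} in, 1:5 -> {backed_off:.2f} in at 300 ppi")
```

Swapping 300 for 360 ppi shrinks both figures further, so the 300 ppi case is the more generous one for the backed-off shot.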

Similarly, looking at it from the resolution perspective, an optical system (lens and sensor) has, at any given magnification ratio and MTF threshold, a maximum resolution in line-pairs per millimeter. If, because one has backed off, the subject now occupies only one-fifth of the linear dimension on the sensor that it would have occupied if photographed closer, then it probably spans fewer line-pairs in the resulting image. (Unlike the pixel count, this isn't a straightforward calculation, because lens resolution varies with magnification and depends on what the lens is optimized for. But even using lenses optimized for each of those magnifications, I believe the number of line-pairs spanning the subject will almost certainly be significantly lower if one reduces on-sensor magnification by a factor of 5.)
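To put a rough shape on that argument, here is a small Python sketch. The subject width and the lp/mm figure are placeholders I made up for illustration, not measurements of any lens or sensor; the only point is the scaling: line-pairs across the subject equal its size on the sensor times the system resolution, so a 5x drop in magnification needs a 5x gain in lp/mm just to break even.

```python
def line_pairs_across_subject(subject_mm, magnification, system_lp_per_mm):
    """Line-pairs spanning the subject: its size on the sensor times lp/mm."""
    image_mm = subject_mm * magnification  # subject size as projected on the sensor
    return image_mm * system_lp_per_mm


SUBJECT_MM = 36.0         # hypothetical subject width, chosen only for illustration
ASSUMED_LP_PER_MM = 50.0  # placeholder system resolution, NOT a measured value

at_1_to_1 = line_pairs_across_subject(SUBJECT_MM, 1.0, ASSUMED_LP_PER_MM)
at_1_to_5 = line_pairs_across_subject(SUBJECT_MM, 1 / 5, ASSUMED_LP_PER_MM)

# Resolution the 1:5 setup would need to match the 1:1 result.
break_even_lp_per_mm = ASSUMED_LP_PER_MM * 5

print(f"1:1 -> {at_1_to_1:.0f} lp, 1:5 -> {at_1_to_5:.0f} lp, "
      f"break-even at 1:5 needs {break_even_lp_per_mm:.0f} lp/mm")
```

Real lenses won't hold the same lp/mm at both magnifications, which is exactly why this is only a sketch, but the factor-of-5 handicap is what the backed-off shot has to overcome.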

For a proper comparison against focus stacking, we would also need to know (either theoretically or experimentally) how much resolution may be lost in the stacking process itself. I don't know that, and the OP hasn't given any indication that he or she knows either.

There's no free lunch.

