But what about the stuff that’s actually in focus?
Camera manufacturers typically treat anything with a CoC of 0.02 mm or less as in focus. If you focus so that the CoC is at or below the pixel pitch, then you can’t get any sharper, but you’ll have a shallower DoF.
The key takeaway from this is that something visually in focus to a human with 20/20 vision still results in a point source of light covering more than one pixel.
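To put rough numbers on that, here is a minimal sketch; the 4 micron pixel pitch is an illustrative assumption (pitches vary by camera), not a figure from the discussion above:

```python
# Minimal back-of-the-envelope sketch: how many pixels a 0.02 mm blur circle
# spans, assuming a hypothetical pixel pitch of about 4 microns.
coc_mm = 0.02           # the "acceptably sharp" blur diameter mentioned above
pixel_pitch_mm = 0.004  # assumed pixel pitch; varies by camera

pixels_across = coc_mm / pixel_pitch_mm
print(f"A {coc_mm} mm blur circle spans about {pixels_across:.0f} pixels across")
# -> about 5 pixels across, so the blur covers many pixels, not just one
```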
The camera manufacturers don't choose a circle of confusion: you do. The circle of confusion is the largest out-of-focus blur size that is invisible to you under specific viewing conditions. But it is out-of-focus blur: the CoC most definitely does not refer to anything that is sharply in focus, only to things that are blurry because they are out of focus.
Also, I'm not sure you'd want a point light source focused onto precisely one pixel, especially with a Bayer camera. Things are so much easier to deal with if you have oversampling.
The depth of field equations assume a perfect lens and perfect sensor, with no aberrations and the ability to resolve detail to an infinitely fine degree. But even an ideal lens with a non-zero aperture will produce out-of-focus blur, and this blur from an ideal lens is what depth of field measures. Under this ideal, detail at the plane of focus is perfectly sharp, while objects before and behind the plane of focus are blurred by an amount that grows with their distance from it. By convention, we measure the diameter of the out-of-focus blur circle as projected on the sensor, and we call that the circle of confusion.
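For the ideal thin lens, that blur circle can be written down directly. The sketch below uses the standard thin-lens geometry; the 50 mm / f/2.8 / 2 m numbers are just illustrative assumptions:

```python
def blur_circle_mm(f_mm, f_number, focus_dist_mm, subject_dist_mm):
    """Ideal thin-lens blur-circle diameter at the sensor, in mm.

    f_mm             focal length
    f_number         aperture expressed as an f-number N (aperture diameter = f/N)
    focus_dist_mm    distance the lens is focused at
    subject_dist_mm  distance of the point light source being imaged
    """
    f, N, s, d = f_mm, f_number, focus_dist_mm, subject_dist_mm
    return (f * f * abs(d - s)) / (N * d * (s - f))

# A 50 mm lens at f/2.8 focused at 2 m:
print(blur_circle_mm(50, 2.8, 2000, 2000))  # 0.0 mm: perfectly sharp at the focus plane
print(blur_circle_mm(50, 2.8, 2000, 2500))  # ~0.09 mm for a point 0.5 m behind it
print(blur_circle_mm(50, 2.8, 2000, 1500))  # ~0.15 mm for a point 0.5 m in front of it
```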
It should be clear that there is no way a camera maker can 'choose' a value for the CoC, since the choice isn't theirs to make. The degree to which something is blurred, and whether that blur matters at all, is strongly conditioned on your decisions and on how you look at the image. If you look closely at an image, the acceptable circle of confusion gets smaller, and as you stand farther back it gets larger. People with excellent eyesight have smaller circles of confusion than people with poor eyesight.
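To make the dependence on viewing conditions concrete, here is a minimal sketch of one common way to derive an "acceptable" CoC from visual acuity; the 1 arcminute acuity, the print enlargement, and the viewing distances are all assumptions you would supply yourself:

```python
import math

def acceptable_coc_mm(viewing_distance_mm, enlargement, acuity_arcmin=1.0):
    """Largest on-sensor blur that stays invisible under the given viewing conditions.

    viewing_distance_mm  how far the viewer stands from the print
    enlargement          print size / sensor size (linear magnification of the print)
    acuity_arcmin        smallest angle the viewer can resolve (~1' for 20/20 vision)
    """
    blur_on_print_mm = viewing_distance_mm * math.tan(math.radians(acuity_arcmin / 60.0))
    return blur_on_print_mm / enlargement

# Same print (8x enlargement), different viewers and viewing distances:
print(acceptable_coc_mm(500, 8))                     # arm's length, 20/20  -> ~0.018 mm
print(acceptable_coc_mm(250, 8))                     # leaning in close     -> ~0.009 mm
print(acceptable_coc_mm(500, 8, acuity_arcmin=2.0))  # poorer eyesight      -> ~0.036 mm
```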
Since the equations are based on an ideal, the depth of field equations are useful only as a limit: they tell you the best possible results you can get; in particular, they tell you the minimum depth of field you can achieve under standard viewing conditions. Real lenses and real cameras tend to increase perceived depth of field over the ideal. As they say, "sharp lenses have a smaller depth of field".
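As a sketch of what "only a limit" means in practice, here are the standard ideal-lens near/far limits; again the focal length, aperture, and CoC values are illustrative assumptions:

```python
def dof_limits_mm(f_mm, f_number, coc_mm, focus_dist_mm):
    """Near and far limits of acceptable sharpness for an ideal thin lens, in mm.

    Real lenses can only add blur on top of the ideal out-of-focus blur,
    so these are best-case (minimum depth of field) figures.
    """
    f, N, c, s = f_mm, f_number, coc_mm, focus_dist_mm
    H = f * f / (N * c) + f                       # hyperfocal distance
    near = s * (H - f) / (H + s - 2 * f)
    far = s * (H - f) / (H - s) if s < H else float("inf")
    return near, far

# A 50 mm lens at f/2.8 focused at 2 m, with two different "acceptable" blur sizes:
print(dof_limits_mm(50, 2.8, 0.030, 2000))  # CoC 0.030 mm -> roughly 1.88 m to 2.14 m
print(dof_limits_mm(50, 2.8, 0.005, 2000))  # pixel-level CoC -> much shallower DoF
```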
Bringing pixels into a depth of field discussion is problematic because the existence of visible pixels tends to destroy the illusion of out-of-focus blur and even the very notion of depth of field, which after all is based on an ideal lens capable of resolving infinite detail at perfect sharpness.
You can't see the actual pixels in a digital image: they are only geometric points, an array of abstract numbers, so they have to be rendered on a physical medium, and how those points get rendered matters. If you have big square pixels on an output device, you need to stand back far enough that you can't see them, in order to restore the illusion of an image. If you use a smooth upsampling algorithm and peer very closely at the image, then everything will look blurry and out of focus anyway, and so again you have to stand farther back for something in the image to look sharp.
Sometimes you find someone who wants sharpness all the way down to the pixel level along with an extremely large depth of field. They may choose an excessively tiny circle of confusion, but they will end up with so much diffraction that it becomes quite noticeable. Zooming way into the image is then not pleasant, and so they end up stepping back or zooming out, ironically increasing their depth of field, which is what they wanted in the first place, just not in the manner they desired.
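A minimal sketch of why the excessively tiny CoC backfires: the diffraction blur from the aperture itself (the Airy disk, roughly 2.44·λ·N across) soon exceeds a pixel-sized CoC. The wavelength and apertures below are illustrative assumptions:

```python
def airy_disk_mm(f_number, wavelength_nm=550):
    """Approximate Airy-disk diameter at the sensor (to the first dark ring), in mm."""
    return 2.44 * (wavelength_nm * 1e-6) * f_number

# If the target CoC is a pixel-sized ~0.004 mm, diffraction alone exceeds it
# well before the small apertures you'd stop down to for a huge depth of field:
for N in (4, 8, 16, 22):
    print(f"f/{N}: Airy disk ~ {airy_disk_mm(N):.4f} mm")
# f/4  -> ~0.0054 mm (already wider than a ~4 micron pixel)
# f/22 -> ~0.0295 mm (visibly soft at the pixel level)
```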