What does CMOS have to do with this?
Nothing, really, other than the fact that the effect can occur in CMOS imaging chips - it doesn't have to, though.
The effect is actually a mismatch between imaging and display technologies; it is not really a problem with one or the other, but with the combination of the two.
Thirty years ago, when I first started in the sensor-design business, this same effect was actually a criticism of early CCD cameras. How come?
Well, the predominant display technology of the time (in fact, almost the only image display technology) was the CRT, and that had a sort of "rolling shutter" display: the image was written on the CRT in raster fashion from top to bottom. This hadn't been a problem in the previous 40 years of television, because until then all cameras had also been based on CRTs: the vidicon and its many derivatives, with the image being read off the faceplate in the same raster fashion. The image built up on the faceplate between each pass of the electron beam, so this was a rolling shutter with an exposure time equal to the frame period.
With both capture and display based on CRTs, the time delay between the top of the frame being captured and being displayed was exactly the same as the delay at the bottom of the frame, so everything "looked" right.
Of course, if you recorded the video and played it back in slow motion or freeze-framed it, you could see the time delay in the vidicon tube between the top of the image being read and the bottom being read: panning across verticals caused them to tilt backwards in still frames, because the top of the image was recorded before the bottom, which had panned a little further in the delay between the two. During normal playback, however, this was perfectly compensated on the CRT display, where the same top-to-bottom delay as the image was written to the screen made each vertical appear upright again.
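To put rough numbers on that, here is a little Python sketch of the geometry (all the figures in it are illustrative assumptions, not measurements from any real tube or sensor):

import math

rows = 480              # active lines in the frame (assumed)
frame_time = 1 / 30     # rolling read spans one frame period, in seconds
pan_speed = 600         # horizontal pan speed in pixels/second (assumed)

# The bottom row is read almost a full frame period after the top row,
# so the scene has panned further by the time it is captured.
row_delay = frame_time / rows
offset = pan_speed * row_delay * (rows - 1)    # horizontal shift, top vs bottom
lean = math.degrees(math.atan2(offset, rows))  # tilt of a vertical edge

print(f"top-to-bottom offset: {offset:.1f} px, lean: {lean:.1f} degrees")
# A CRT display re-applies the same top-to-bottom delay on playback, so the
# lean cancels in normal viewing and only shows up in a freeze frame.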
Then along came CCDs, in several different varieties: interline transfer, frame transfer, etc. One thing common to all of the CCD designs was that the image was captured across the whole frame simultaneously. So, when panning across a vertical edge, that edge stayed upright in every single still frame. However, when displayed on the raster-updated CRT, the top of the image was written an entire field period before the bottom. The result was that continuous video of pans appeared to tilt vertical lines forwards.
Early CMOS developments for video sensors attempted to correct this problem by reading the image in the same sequence as the old vidicon tubes: sequentially, with a rolling shutter scanning from top to bottom. That worked well when the images were shown in real time on displays that also updated in raster fashion, and since most video displays were CRTs, the "problem" went away for a while. As with the old vidicon images, freeze frames from mid-pan shots showed the same tilt, but the CRT display fixed that when the footage ran at normal speed.
Nowadays, many flat-panel displays, such as LCDs and plasmas, update the entire image simultaneously, with the data first being loaded into a frame store. That is why the rolling-shutter CMOS design, originally implemented to correct this very deficiency in CCDs, now causes the same visual effect - in fact the opposite effect, with the tilt reversed. And the effect gets worse as the frame or field rate is reduced, so verticals lean less with 60i than with 30p.
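Again as a rough sketch (the pan speed is just an assumed figure), the skew is simply the pan speed multiplied by the top-to-bottom read time:

pan_speed = 600          # horizontal pan speed in pixels/second (assumed)

for label, read_time in [("60i field", 1 / 60), ("30p frame", 1 / 30)]:
    skew = pan_speed * read_time   # horizontal offset, top row vs bottom row
    print(f"{label}: read time {read_time * 1000:.1f} ms -> skew {skew:.0f} px")

# 60i scans top-to-bottom twice as fast as 30p, so verticals lean half as much.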
So, when you combine the "problem" of the CCD with the "problem" of the flat-panel display, you get the best of both worlds: verticals that stay upright in still frames and in panned moving footage alike.
There is no reason why CMOS sensors cannot be designed with the same "snapshot" mode of operation as CCDs and, indeed, many are; some even support both modes. They can do this even when the exposure time is a small fraction of the frame time - all it needs is for each row's exposure-control line to be taken out to a shift register at the edge of the chip rather than connected globally. The little timing sketch below shows the difference.
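Here is a hypothetical Python illustration of the two modes (the eight-row "sensor" and the timing figures are made up purely for a readable printout):

rows = 8           # tiny sensor so the printout stays readable (assumed)
frame_time = 32.0  # ms per frame (assumed)
exposure = 4.0     # ms, a small fraction of the frame time (assumed)

print("row   rolling (ms)     snapshot (ms)")
for r in range(rows):
    # Rolling: each row's exposure window is staggered by one row period,
    # as if the exposure-control line were clocked down a shift register.
    roll_start = r * frame_time / rows
    # Snapshot: the exposure-control lines are driven globally, so every
    # row integrates over the same interval.
    snap_start = 0.0
    print(f"{r:>3}   {roll_start:5.1f}-{roll_start + exposure:5.1f}"
          f"      {snap_start:5.1f}-{snap_start + exposure:5.1f}")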
However, to operate in low light, both sensor types need to capture the image for as large a proportion of the frame time as possible, and neither type can image while it is being read out. With CCDs this was resolved using interline- or frame-transfer structures: blind areas of the sensor to which the image data was quickly transferred, to be read out while the next frame was being captured in the active area. CCDs without these storage areas are suitable for still cameras but not for video. With interline-transfer CCDs, the blind areas sit between the active columns, which limits the resolution achievable along that axis. With frame-transfer CCDs, the blind area is an adjacent region the same size as the active area, which doubles the size of the chip. To implement snapshot operation and still achieve low-light sensitivity, a CMOS sensor would have to follow similar architectures to video CCDs, which means the same additional circuitry and restrictions.
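A quick back-of-envelope calculation shows why the storage area matters so much in low light (the timings here are assumptions for illustration, not specifications of any real device):

frame_time = 33.3  # ms per frame (assumed)
readout = 30.0     # ms to read the whole array (assumed)
transfer = 0.5     # ms to dump the image into a blind storage area (assumed)

# Without storage, the sensor can only expose between readouts.
no_storage = (frame_time - readout) / frame_time
# With an interline or frame-transfer store, a fast dump frees the active
# area to integrate light for almost the entire frame period.
with_storage = (frame_time - transfer) / frame_time

print(f"exposure duty cycle without storage: {no_storage:.0%}")
print(f"exposure duty cycle with storage:    {with_storage:.0%}")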
So it isn't so much a problem with CMOS. It's more that a problem has gone away with CCDs as display technology has evolved.
--
Its RKM