Are high-end camera processors better than pc processors?

Satyaa

Seeing how many images from stacked sensors are processed per second in modern cameras, I am guessing that their processors are more powerful at image processing than mainstream PC processors. Is that accurate?

The buffer is generally the limitation; otherwise, the stacked-sensor mirrorless cameras capture, process, and save RAW + JPEG at an insane speed. Sony's A1, the latest Fuji X-H2S, Canon's R3, Nikon's Z9, the Olympus OM-1, etc., come to mind.

They also run the camera's OS (whatever that is) and support Wi-Fi, HDMI, Ethernet, Bluetooth, external recording, and a variety of video formats.

If a minimalist Linux system were built on a fairly recent laptop and the fastest available image-processing application were run on it, I don't think it would process anything close to 20 images per second.

Am I missing something?

Thanks

--
See my profile (About me) for gear and my posting policy.
 
Cameras have incredibly slow processors. Standard desktops have been able to process hundreds of images per second for a long time.
 
Seeing how many images from stacked sensors are processed per second in modern cameras, I am guessing that their processors are more powerful at image processing than mainstream PC processors. Is that accurate?

The buffer is generally the limitation; otherwise, the stacked-sensor mirrorless cameras capture, process, and save RAW + JPEG at an insane speed. Sony's A1, the latest Fuji X-H2S, Canon's R3, Nikon's Z9, the Olympus OM-1, etc., come to mind.

They also run the camera's OS (whatever that is) and support Wi-Fi, HDMI, Ethernet, Bluetooth, external recording, and a variety of video formats.

If a minimalist Linux system were built on a fairly recent laptop and the fastest available image-processing application were run on it, I don't think it would process anything close to 20 images per second.

Am I missing something?

Thanks
It is hard to make an apples-to-apples comparison. Cameras and smartphones tend to contain a system-on-a-chip that includes generic compute cores (e.g. ARM) and specialized hardware (an image signal processor, a machine-learning accelerator). For tasks that are known at the time of silicon manufacture, these specialized compute devices can be highly efficient (in terms of battery, heat, die area, and unit cost).

A PC tends to have 4-16 fairly generic and fast cores and fewer specialized components. Granted, a dedicated GPU may do semi-specialized tasks at great speed, but PCs typically don't include ISPs and, until recently, ML accelerators.

If you need just the kind of processing that your camera/smartphone does today, its specialized hardware can likely do it with less battery drain, less heat, and (in some ways) at lower cost than a generic computer doing the exact same task. But if you want to evolve that algorithm over the years, or do more complex work, the PC may scale somewhat "proportionately" (i.e. an algorithm that contains twice as many multiplies may be on the order of half as fast), while the same complexity increase in your camera may be practically impossible.
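To make the "proportionate scaling" remark concrete, here is a rough CPU-only sketch (the image size, the filter, and the pass counts are arbitrary, chosen only for illustration): doubling the arithmetic work roughly doubles the runtime on a general-purpose processor, whereas fixed-function camera hardware has no such knob to turn.

```python
# Minimal sketch of "proportionate scaling" on a general-purpose CPU.
# Sizes and pass counts are assumed for illustration only.
import time
import numpy as np

def filter_pass(img, passes):
    """Apply a simple neighbour-averaging filter 'passes' times; work scales with 'passes'."""
    out = img
    for _ in range(passes):
        out = (out + np.roll(out, 1, axis=0) + np.roll(out, 1, axis=1)) / 3.0
    return out

img = np.random.rand(4000, 6000).astype(np.float32)  # roughly a 24 MP single channel

for passes in (1, 2, 4):
    t0 = time.perf_counter()
    filter_pass(img, passes)
    print(f"{passes} pass(es): {time.perf_counter() - t0:.2f} s")
```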
 
Seeing how many images from stacked sensors are processed per second in modern cameras, I am guessing that their processors are more powerful at image processing than mainstream PC processors. Is that accurate?

The buffer is generally the limitation; otherwise, the stacked-sensor mirrorless cameras capture, process, and save RAW + JPEG at an insane speed. Sony's A1, the latest Fuji X-H2S, Canon's R3, Nikon's Z9, the Olympus OM-1, etc., come to mind.

They also run the camera's OS (whatever that is) and support Wi-Fi, HDMI, Ethernet, Bluetooth, external recording, and a variety of video formats.

If a minimalist Linux system were built on a fairly recent laptop and the fastest available image-processing application were run on it, I don't think it would process anything close to 20 images per second.

Am I missing something?

Thanks
It is hard to make an apples-to-apples comparison. Cameras and smartphones tend to contain a system-on-a-chip that includes generic compute cores (e.g. ARM) and specialized hardware (an image signal processor, a machine-learning accelerator). For tasks that are known at the time of silicon manufacture, these specialized compute devices can be highly efficient (in terms of battery, heat, die area, and unit cost).

A PC tends to have 4-16 fairly generic and fast cores and fewer specialized components. Granted, a dedicated GPU may do semi-specialized tasks at great speed, but PCs typically don't include ISPs and, until recently, ML accelerators.

If you need just the kind of processing that your camera/smartphone does today, its specialized hardware can likely do it with less battery drain, less heat, and (in some ways) at lower cost than a generic computer doing the exact same task. But if you want to evolve that algorithm over the years, or do more complex work, the PC may scale somewhat "proportionately" (i.e. an algorithm that contains twice as many multiplies may be on the order of half as fast), while the same complexity increase in your camera may be practically impossible.
Yup. The big limitation here is storage and interface bandwidth - feeding the entire stack of images from a burst anywhere requires a very high-bandwidth interface.

You just don't see interfaces like that going from one device to another.

For example, Jim Kasson recorded a readout for one of the earlier A7R bodies (R2 or R3, I can't remember which) that was consistent with over 1 GPixel/second at 12+ bits/pixel (I don't remember the exact number). That's over 12 Gbit/second, more than many USB interfaces can handle, and it came from the aging BIONZ X in cameras that only exposed USB 2.0 externally. Note that the moment you hit the scaling/demosaicing engine you run into a 500-600 MPixel/sec bottleneck in the BIONZ X, and even narrower bottlenecks by the time you reach storage. Newer cameras have been clocked at much higher readout rates.

If you could get that raw sensor bandwidth straight to a PC, you'd have some interesting options open - but that's the realm of very expensive dedicated industrial cameras that interface the sensor to a minimal bridge chip and then high-lane-count PCI-Express.
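A quick back-of-the-envelope check of those readout numbers (the pixel rate and bit depth are the rough figures quoted above, not measurements; the interface speeds are the usual nominal ratings):

```python
# Sensor readout rate in Gbit/s vs. common interface speeds (illustrative).
pixel_rate = 1.0e9          # ~1 GPixel/s readout, as quoted above (assumed)
bits_per_pixel = 12         # 12-bit raw samples

sensor_gbps = pixel_rate * bits_per_pixel / 1e9
print(f"Raw readout: {sensor_gbps:.1f} Gbit/s")   # ~12 Gbit/s

interfaces_gbps = {
    "USB 2.0": 0.48,
    "USB 3.2 Gen 1 (5 Gbps)": 5.0,
    "USB 3.2 Gen 2 (10 Gbps)": 10.0,
    "PCIe 3.0 x4 (approx.)": 31.5,
}
for name, gbps in interfaces_gbps.items():
    verdict = "can" if gbps >= sensor_gbps else "cannot"
    print(f"{name}: {gbps} Gbit/s -> {verdict} keep up")
```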

--
Context is key. If I have quoted someone else's post when replying, please do not reply to something I say without reading text that I have quoted, and understanding the reason the quote function exists.
 
Seeing how many images from stacked sensors are processed per second in modern cameras, I am guessing that their processors are more powerful at image processing than mainstream PC processors. Is that accurate?
hjulenissen addressed this quite well, so there's no need to expand on it.
The buffer is generally the limitation; otherwise, the stacked-sensor mirrorless cameras capture, process, and save RAW + JPEG at an insane speed. Sony's A1, the latest Fuji X-H2S, Canon's R3, Nikon's Z9, the Olympus OM-1, etc., come to mind.
Keep in mind that "buffer" can be limited either by DRAM capacity / flash speed or by the imaging processors themselves. For example, the original Nikon Z bodies were heavily limited by their single Expeed processor, which was rectified by adding a second processor. Here's a deep dive into how Expeed limited the buffer on the original Z bodies:

https://www.dpreview.com/forums/post/63645424
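As a rough illustration of how the buffer math works out (every number below is assumed purely for the example, not taken from any particular body), either the DRAM pool or the card write speed can cap the burst regardless of how fast the image processor is:

```python
# Toy buffer-depth estimate; all figures are assumed for illustration.
frame_mb = 60.0            # assumed size of one raw frame, MB
fps = 20                   # assumed burst rate, frames/s
dram_mb = 4096             # assumed DRAM available for buffering, MB
card_mb_per_s = 400        # assumed sustained card write speed, MB/s

incoming_mb_per_s = frame_mb * fps              # data generated per second
fill_rate = incoming_mb_per_s - card_mb_per_s   # net buffer growth per second

if fill_rate > 0:
    seconds_to_full = dram_mb / fill_rate
    print(f"Buffer fills in ~{seconds_to_full:.1f} s "
          f"(~{seconds_to_full * fps:.0f} frames) before the camera slows down")
else:
    print("Card keeps up; burst length is limited by other factors")
```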
 
If a minimalist Linux system were built on a fairly recent laptop and the fastest available image-processing application were run on it, I don't think it would process anything close to 20 images per second.
Much depends on how the GPU is used. Even without tight optimization for the onboard GPU, FastRawViewer processes 10 to 20 Nikon Z 9 raw images per second on a fairly recent MacBook Pro, and that's with raw histogram calculation and 4K display of the converted images.
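For a very rough CPU-only point of comparison (this is not FRV's pipeline; it's a sketch using rawpy, LibRaw's Python binding, on an assumed folder of Z 9 raw files), a decode-plus-histogram loop looks like this:

```python
# CPU-only throughput sketch using rawpy; file paths are assumed.
import glob
import time

import numpy as np
import rawpy

files = sorted(glob.glob("burst/*.NEF"))  # assumed folder of Z 9 raw files

t0 = time.perf_counter()
for path in files:
    with rawpy.imread(path) as raw:
        bayer = raw.raw_image_visible              # undemosaiced sensor data
        hist, _ = np.histogram(bayer, bins=256)    # per-frame raw histogram
        rgb = raw.postprocess(half_size=True,      # quick demosaic for display
                              use_camera_wb=True,
                              output_bps=8)
elapsed = time.perf_counter() - t0
print(f"{len(files)} frames in {elapsed:.1f} s "
      f"({len(files) / elapsed:.1f} frames/s)")
```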

--
http://www.libraw.org/
 
Wow. That's impressive.

Apple's M1, M2 systems have definitely improved this a lot.

Intel/Windows machines probably require dedicated GPUs.
 
Wow. That's impressive.

Apple's M1, M2 systems have definitely improved this a lot.
It's fortunate Iliah saw this thread, because he's the best person to ask on this subject: he has firsthand experience with the performance differences between general-purpose CPUs and dedicated logic for image processing, such as a GPU.
Intel/Windows machines probably require dedicated GPUs.
The line between integrated and dedicated GPUs is becoming blurry. Technically speaking, integrated GPUs can be considered dedicated to varying degrees. Apple certainly has the performance lead, owing to the tight integration of their SoC GPU cores and memory architecture.
 
If a minimalist Linux system were built on a fairly recent laptop and the fastest available image-processing application were run on it, I don't think it would process anything close to 20 images per second.
Much depends on how the GPU is used. Even without tight optimization for the onboard GPU, FastRawViewer processes 10 to 20 Nikon Z 9 raw images per second on a fairly recent MacBook Pro, and that's with raw histogram calculation and 4K display of the converted images.
Hi Iliah, if you could share, how much of the imaging pipeline were you able to offload onto the GPU inside FRV? How difficult was the effort for each of the major elements?
 
Seeing how many images from stacked sensors are processed per second in modern cameras, I am guessing that their processors are more powerful at image processing than mainstream PC processors. Is that accurate?

The buffer is generally the limitation; otherwise, the stacked-sensor mirrorless cameras capture, process, and save RAW + JPEG at an insane speed. Sony's A1, the latest Fuji X-H2S, Canon's R3, Nikon's Z9, the Olympus OM-1, etc., come to mind.

They also run the camera's OS (whatever that is) and support Wi-Fi, HDMI, Ethernet, Bluetooth, external recording, and a variety of video formats.

If a minimalist Linux system were built on a fairly recent laptop and the fastest available image-processing application were run on it, I don't think it would process anything close to 20 images per second.

Am I missing something?

Thanks
ASICs and PC processors are similar but not the same thing.

ASICs do one job and do it well; PC processors are designed for general use.
Keep in mind the size and heat limitations of an enclosed, task-specific tool like a camera.

Of course a PC could process images like a camera does - but at the size of a camera?
 
how much of the imaging pipeline were you able to offload onto the GPU inside FRV?
About 90% "by volume". More is possible, but the more one offloads the more the app becomes sensitive to bugs in video drivers (which are countless and in many cases are introduced with updates).
How difficult was the effort for each of the major elements?
That varies; some elements took years to offload.

--
http://www.libraw.org/
 
Hi Iliah, if you could share, how much of the imaging pipeline were you able to offload onto the GPU inside FRV? How difficult was the effort for each of the major elements?
We have been told for a decade or more that GPUs offer a tremendous speedup over CPUs.

My (granted, limited) experience has been that for _some_ tasks GPUs are vastly more efficient than CPUs. If you want to perform a floating-point multiply of large matrices, gigantic FFTs, or other "numerical, HPC/scientific-compute, single-precision float tasks that can be somewhat trivially parallelized", then GPUs are the obvious choice. Further, if you do such standard operations, you can probably use a pre-built library that someone else has hand-tuned for a particular GPU.

However, if your code is less inherently parallel, or if it contains integer math (or double-precision float), branching, "bit fiddling" and the like, GPUs may not be such an obvious choice.

Any software (commercial or open) needs to trade developer effort and willingness against functionality and speed for one or many users. Given an optimization expert and possibly an algorithm/domain expert, my experience has been that most code can be made to run 2x to 10x faster on a given set of hardware with no or minimal reduction in quality. The problem is that if this costs 3 or 12 months of full-time effort, it might not be worth it. If the number of affected users is small, they could just buy new hardware and be done with it. If the number of users is large, there is still the question of whether the speedup is worth enough to them.
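A toy illustration of that split (the matrix size and the subset length are arbitrary): a dense single-precision matrix multiply is dispatched to a tuned BLAS and parallelizes trivially, while branchy integer bit fiddling benefits far less from GPUs or SIMD.

```python
# Parallel-friendly float math vs. branchy integer code (sizes assumed).
import time
import numpy as np

n = 2048
a = np.random.rand(n, n).astype(np.float32)
b = np.random.rand(n, n).astype(np.float32)

t0 = time.perf_counter()
c = a @ b                        # handed to a tuned BLAS; trivially parallel
print(f"float32 matmul: {time.perf_counter() - t0:.3f} s")

values = np.random.randint(0, 2**16, size=200_000)

def branchy_popcount(vals):
    # Data-dependent branching plus bit manipulation: the kind of code that
    # maps poorly onto wide parallel hardware.
    total = 0
    for v in vals:
        v = int(v)
        while v:
            total += v & 1
            v >>= 1
    return total

t0 = time.perf_counter()
branchy_popcount(values)
print(f"branchy bit fiddling: {time.perf_counter() - t0:.3f} s")
```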

-h
 
Yup. The big limitation here is storage and interface bandwidth - feeding the entire stack of images from a burst anywhere requires a very high-bandwidth interface.

You just don't see interfaces like that going from one device to another.

For example, Jim Kasson recorded a readout for one of the earlier A7R bodies (R2 or R3, I can't remember which) that was consistent with over 1 GPixel/second at 12+ bits/pixel (I don't remember the exact number). That's over 12 Gbit/second, more than many USB interfaces can handle, and it came from the aging BIONZ X in cameras that only exposed USB 2.0 externally. Note that the moment you hit the scaling/demosaicing engine you run into a 500-600 MPixel/sec bottleneck in the BIONZ X, and even narrower bottlenecks by the time you reach storage. Newer cameras have been clocked at much higher readout rates.

If you could get that raw sensor bandwidth straight to a PC, you'd have some interesting options open - but that's the realm of very expensive dedicated industrial cameras that interface the sensor to a minimal bridge chip and then high-lane-count PCI-Express.
A general observation is that ASIC/FPGA processing tends to be highly pipelined. Also, if you have a semi-custom SoC wired to an image sensor, a memory-card writer, and a viewfinder, you would probably try to set things up so that bursts of images are streamed across the SoC exactly once, keeping a tight working-set memory footprint during processing. This lets you maximize the use of the physical bandwidth, but it may limit algorithmic possibilities (want to compare pixel #2 of frame #1 to pixel #3445566 of frame #10? Tough luck).

Software written for a PC may benefit from some of the same techniques, but there you might have tens or even hundreds of MB of relatively fast cache, and a really clever system working behind the scenes to make it appear as if all memory runs at cache speed. So you get to use memory more liberally and re-access the same data point several times if that is useful for the problem you want to solve.
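A minimal sketch of the "stream each frame across the pipeline exactly once" idea (the frame source and processing stages below are stand-ins, not any camera's actual firmware): the working set stays at roughly one frame no matter how long the burst is, but no stage can look back at earlier frames.

```python
# Single-pass streaming pipeline with a bounded working set (toy stages).
from typing import Iterator
import numpy as np

def frames(n: int, h: int = 1024, w: int = 1536) -> Iterator[np.ndarray]:
    """Stand-in for sensor readout: yields one Bayer frame at a time."""
    for _ in range(n):
        yield np.random.randint(0, 4096, size=(h, w), dtype=np.uint16)

def black_level(frame: np.ndarray) -> np.ndarray:
    # Subtract an assumed black level and clip at zero.
    return np.clip(frame.astype(np.int32) - 256, 0, None)

def tone_map(frame: np.ndarray) -> np.ndarray:
    # Crude gamma-ish mapping to 8 bits for display/encoding.
    return (np.sqrt(frame / frame.max()) * 255).astype(np.uint8)

# Each frame flows through every stage once and is then dropped.
for raw in frames(20):
    out = tone_map(black_level(raw))
    # ...write `out` to the card here...
```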
 
If a minimalist Linux system were built on a fairly recent laptop and the fastest available image-processing application were run on it, I don't think it would process anything close to 20 images per second.
Much depends on how the GPU is used. Even without tight optimization for the onboard GPU, FastRawViewer processes 10 to 20 Nikon Z 9 raw images per second on a fairly recent MacBook Pro, and that's with raw histogram calculation and 4K display of the converted images.
And at this point PCI Express bandwidth comes into play - the more you can do on the GPU without transferring between GPU and CPU, the better. CPU-GPU transfers are quite time-consuming.
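Some rough, assumed numbers to show why that matters at burst rates (the frame size is based on the Z 9's 45.7 MP sensor at an assumed 16 bits per sample; the effective PCIe throughput is also assumed):

```python
# Back-of-the-envelope cost of CPU<->GPU copies per frame (assumed figures).
frame_bytes = 45.7e6 * 2          # ~45.7 MP frame at an assumed 16 bits/sample
pcie_gb_per_s = 12.0              # assumed effective PCIe 3.0 x16 throughput

one_way_ms = frame_bytes / (pcie_gb_per_s * 1e9) * 1e3
print(f"One CPU->GPU copy: ~{one_way_ms:.1f} ms per frame")
print(f"Round trip per stage: ~{2 * one_way_ms:.1f} ms "
      f"(against a 50 ms budget at 20 frames/s)")
```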
 
If a minimalist Linux system were built on a fairly recent laptop and the fastest available image-processing application were run on it, I don't think it would process anything close to 20 images per second.
Much depends on how the GPU is used. Even without tight optimization for the onboard GPU, FastRawViewer processes 10 to 20 Nikon Z 9 raw images per second on a fairly recent MacBook Pro, and that's with raw histogram calculation and 4K display of the converted images.
And at this point PCI Express bandwidth comes into play - the more you can do on the GPU without transferring between GPU and CPU, the better. CPU-GPU transfers are quite time-consuming.
True. Going back and forth between CPU and GPU isn't really a good option, at least in the present state of things.
 
Any software (commercial or open) needs to trade developer effort and willingness against functionality and speed for one or many users. Given an optimization expert and possibly an algorithm/domain expert, my experience has been that most code can be made to run 2x to 10x faster on a given set of hardware with no or minimal reduction in quality. The problem is that if this costs 3 or 12 months of full-time effort, it might not be worth it. If the number of affected users is small, they could just buy new hardware and be done with it. If the number of users is large, there is still the question of whether the speedup is worth enough to them.

-h
It's interesting to push the envelope, and it's useful to look at programming (especially when it is for art) as an art in itself. What's 3 months or 3 years if one is trying to create art?

Also, programming skill and perseverance come into question more and more.
 
Hi Iliah, if you could share, how much of the imaging pipeline were you able to offload onto the GPU inside FRV? How difficult was the effort for each of the major elements?
We have been told for a decade or more that GPUs offer a tremendous speedup over CPUs.

My (granted, limited) experience has been that for _some_ tasks GPUs are vastly more efficient than CPUs. If you want to perform a floating-point multiply of large matrices, gigantic FFTs, or other "numerical, HPC/scientific-compute, single-precision float tasks that can be somewhat trivially parallelized", then GPUs are the obvious choice. Further, if you do such standard operations, you can probably use a pre-built library that someone else has hand-tuned for a particular GPU.

However, if your code is less inherently parallel, or if it contains integer math (or double-precision float), branching, "bit fiddling" and the like, GPUs may not be such an obvious choice.

Any software (commercial or open) needs to trade developer effort and willingness against functionality and speed for one or many users. Given an optimization expert and possibly an algorithm/domain expert, my experience has been that most code can be made to run 2x to 10x faster on a given set of hardware with no or minimal reduction in quality. The problem is that if this costs 3 or 12 months of full-time effort, it might not be worth it. If the number of affected users is small, they could just buy new hardware and be done with it. If the number of users is large, there is still the question of whether the speedup is worth enough to them.

-h
Agreed. This applies more generally to parallelizing algorithms and isn't specific to GPUs (e.g., multi-threading and OpenMP). It takes a lot of expertise, effort, and time, which most companies think is better spent on coding new features.
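As a small sketch of that kind of parallelization using Python's standard library (the moral equivalent of an OpenMP parallel-for; process_one and the input glob are placeholders, not anyone's actual pipeline):

```python
# Fan per-image work out across CPU cores; each image is independent.
from concurrent.futures import ProcessPoolExecutor
import glob

def process_one(path: str) -> str:
    # Placeholder for real per-image work (decode, resize, export, ...).
    with open(path, "rb") as f:
        data = f.read()
    return f"{path}: {len(data)} bytes"

if __name__ == "__main__":
    files = glob.glob("images/*.jpg")  # assumed input set
    # One worker per core; the hard part in real pipelines is usually the
    # code that doesn't split up this cleanly.
    with ProcessPoolExecutor() as pool:
        for result in pool.map(process_one, files):
            print(result)
```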
 
