Hi Iliah, if you could share, how much of the imaging pipeline were you able to offload onto the GPU inside FRV? How difficult was the effort for each of the major elements?
We have been told for a decade or more that GPUs offer tremendous speedups over CPUs.
My (granted, limited) experience has been that for _some_ tasks, GPUs are vastly more efficient than CPUs. If you want to do floating-point multiplies of large matrices, gigantic FFTs, or some other numerical HPC/scientific-compute, single-precision-float task that can be more or less trivially parallelized, then GPUs are the obvious choice. Further, for such standard operations you can probably use a pre-built library that someone else has already hand-tuned for a particular GPU.
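To make the "trivially parallel, single-precision, pre-built library" case concrete, here is a minimal sketch of my own (nothing to do with FRV, matrix size and fill values arbitrary) that hands a large SGEMM to cuBLAS instead of writing a kernel by hand:

```c
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    const int n = 4096;                        /* square single-precision matrices */
    const float alpha = 1.0f, beta = 0.0f;
    size_t bytes = (size_t)n * n * sizeof(float);

    /* host buffers with trivial contents so the result is easy to check */
    float *hA = (float*)malloc(bytes), *hB = (float*)malloc(bytes), *hC = (float*)malloc(bytes);
    for (size_t i = 0; i < (size_t)n * n; ++i) { hA[i] = 1.0f; hB[i] = 2.0f; }

    float *dA, *dB, *dC;
    cudaMalloc((void**)&dA, bytes);
    cudaMalloc((void**)&dB, bytes);
    cudaMalloc((void**)&dC, bytes);
    cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);
    /* C = alpha * A * B + beta * C, column-major, single precision;
       the actual kernel selection and tuning is cuBLAS's problem, not ours */
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                n, n, n, &alpha, dA, n, dB, n, &beta, dC, n);
    cudaDeviceSynchronize();

    cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);
    printf("C[0] = %f (expect %f)\n", hC[0], 2.0f * n);

    cublasDestroy(handle);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    free(hA); free(hB); free(hC);
    return 0;
}
```

(Build with something like `nvcc sgemm_demo.cu -lcublas`. The point is how little of it is hand-written GPU code.)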
However, if your code is less inherently parallel, or if it contains integer math (or double-precision floats), branching, "bit fiddling" and the like, GPUs may not be such an obvious choice.
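As a toy illustration of the branching point (again my own sketch, not anyone's real pipeline): threads in the same warp that take different sides of a branch execute both paths one after the other, so branchy integer code gives up some of the parallel advantage. For a branch this tiny the compiler may simply predicate it away, but the pattern is the thing:

```c
#include <cuda_runtime.h>
#include <stdio.h>

/* Odd inputs take one path, even inputs the other. Lanes of the same warp
   that disagree on the branch are serialized (warp divergence), so per-warp
   throughput drops on branch-heavy code. */
__global__ void branchy(const int *in, int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        if (in[i] & 1)
            out[i] = in[i] * 3 + 1;   /* "odd" path */
        else
            out[i] = in[i] >> 1;      /* "even" path, serialized against the other */
    }
}

int main(void) {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(int);
    int *h = (int*)malloc(bytes);
    for (int i = 0; i < n; ++i) h[i] = i;

    int *din, *dout;
    cudaMalloc((void**)&din, bytes);
    cudaMalloc((void**)&dout, bytes);
    cudaMemcpy(din, h, bytes, cudaMemcpyHostToDevice);

    branchy<<<(n + 255) / 256, 256>>>(din, dout, n);
    cudaDeviceSynchronize();

    cudaMemcpy(h, dout, bytes, cudaMemcpyDeviceToHost);
    printf("out[3] = %d, out[4] = %d\n", h[3], h[4]);  /* 10 and 2 */

    cudaFree(din); cudaFree(dout); free(h);
    return 0;
}
```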
Any piece of software (commercial or open) has to trade developer effort and willingness against functionality and speed for one or many users. Given an optimization expert, and possibly an algorithm/domain expert as well, my experience is that most code can be made to run 2x to 10x faster on a given set of hardware with little or no reduction in quality. The problem is, if that takes 3 or 12 months of full-time effort, it may not be worth it. If the number of affected users is small, they could just buy new hardware and be done with it. If the number of users is large, there is still the question of whether the speedup is worth enough to them.
-h