How fast does your computer process with SPP 6?

 
I bet a lot of the so-called matrix math Foveon needs to do would accelerate dramatically with GPU use. I realize GPU support is not trivial given the different platforms and configurations, but it would sure speed things up if done well.
Really, SPP only has to worry about two platforms, Mac and Windows. On the Mac side, I'm pretty sure just replacing a lot of stdlib calls to math functions with the Mac accelerated versions (which use the GPU as appropriate) would give it quite a boost, never mind really reworking the entire conversion process to fully work with the GPU.

I'm not as sure what libraries Windows has that automatically move matrix calculations to the GPU...
Direct3D libraries support this. The work for the GPU is not that trivial to do right. I don't think you can run the std lib you are mentioning on the GPU.
If you use one of the libraries that make use of the GPU, it's pretty easy to integrate. GPUImage is one such example for the Mac, which is directly tailored to images.

http://nshipster.com/gpuimage/
Sorry, I misinterpreted what you said. I thought for a moment you were talking about a port of stdlib or the like to the GPU, which I believe is no trivial task.
Then there's Apple's own Accelerate framework, which is lower level but can speed up basic operations like matrix multiplication or transformation.
Yes, that's the equivalent of Direct3D's DirectCompute.
Most (about half) of the processing time for Quattro images appears to be spent in a form of bilateral filter, if you run time profiling tools against it. The GPU is flatlined, no activity, while all cores are maxed out... the GPU would help a ton there.
Yes, which is why I brought it up in the first place. The GPU is simply not used for the RAW conversion.
 
So if neither CPU nor drive speed is the culprit for the bottleneck, I can only think it's down to SPP itself.
CPU speed is the culprit in a certain sense. The calculations in SPP are not as trivial as they seem to be. Maybe it's possible to improve things by utilizing the graphics card, but at the moment Sigma isn't going that way.
 
I bet a lot of the so-called matrix math Foveon needs to do would accelerate dramatically with GPU use. I realize GPU support is not trivial given the different platforms and configurations, but it would sure speed things up if done well.
Really, SPP only has to worry about two platforms, Mac and Windows. On the Mac side, I'm pretty sure just replacing a lot of stdlib calls to math functions with the Mac accelerated versions (which use the GPU as appropriate) would give it quite a boost, never mind really reworking the entire conversion process to fully work with the GPU.

I'm not as sure what libraries Windows has that automatically move matrix calculations to the GPU...
Direct3D libraries support this. The work for the GPU is not that trivial to do right. I don't think you can run the std lib you are mentioning on the GPU.
If you use one of the libraries that make use of the GPU, it's pretty easy to integrate. GPUImage is one such example for the Mac, which is directly tailored to images.

http://nshipster.com/gpuimage/

Then there's Apple's own Accelerate framework, which is lower level but can speed up basic operations like matrix multiplication or transformation.

Most (about half) of the processing time for Quattro images appears to be spent in a form of bilateral filter, if you run time profiling tools against it. The GPU is flatlined, no activity, while all cores are maxed out... the GPU would help a ton there.
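
To illustrate why a bilateral filter is the kind of work a GPU is good at, here is a minimal pure-Python sketch. It is a textbook bilateral filter, not SPP's actual code (which isn't public); the function names, the radius and the two sigma values are all assumptions made up for the example. The point is that each output pixel is computed only from read-only input, so every pixel could in principle be handled by its own GPU thread.

```python
import math


def bilateral_pixel(img, y, x, radius=2, sigma_space=2.0, sigma_range=0.1):
    """Return the filtered value of one pixel of a 2-D grayscale image.

    Only reads from `img`; nothing shared is written, so calls for
    different (y, x) are independent of each other.
    """
    h, w = len(img), len(img[0])
    center = img[y][x]
    acc = 0.0
    norm = 0.0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Clamp coordinates so border pixels reuse the nearest edge value.
            ny = min(max(y + dy, 0), h - 1)
            nx = min(max(x + dx, 0), w - 1)
            value = img[ny][nx]
            # Spatial weight: nearby pixels count more.
            w_space = math.exp(-(dx * dx + dy * dy) / (2.0 * sigma_space ** 2))
            # Range weight: pixels with similar intensity count more,
            # which is what keeps edges sharp.
            w_range = math.exp(-((value - center) ** 2) / (2.0 * sigma_range ** 2))
            acc += w_space * w_range * value
            norm += w_space * w_range
    return acc / norm


def bilateral_filter(img, **kwargs):
    """Apply bilateral_pixel to every pixel (hypothetical helper)."""
    return [
        [bilateral_pixel(img, y, x, **kwargs) for x in range(len(img[0]))]
        for y in range(len(img))
    ]


# Tiny example: a hard step edge from 0.0 to 1.0 survives the smoothing.
step = [[0.0, 0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(4)]
filtered = bilateral_filter(step)
```

On a CPU the two outer loops in `bilateral_filter` run one pixel after another (or a few at a time across cores); on a GPU the body of `bilateral_pixel` would be the kernel and the outer loops would disappear into the thread grid.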

--

---> Kendall
http://www.flickr.com/photos/kigiphoto/
http://www.pbase.com/kgelner
http://www.pbase.com/sigmadslr/user_home
Nice! I would think Sigma should read that . . . and this page:

 
FYI converting an X3F Merrill to 16-bit *.tif took 12 seconds under Windows 10 with SPP 6.06 (as opposed to 16 seconds under Windows Server 2012 R2 Datacenter)


resource monitor i7-4770S mSATA SSD

Chris
 

cquarksnow said:
FYI converting an X3F Merrill to 16-bit *.tif took 12 seconds under Windows 10 with SPP 6.06 (as opposed to 16 seconds under Windows Server 2012 R2 Datacenter)


resource monitor i7-4770S mSATA SSD

Chris
I wasn't aware that Windows 10 was available to anyone yet.
 
FYI converting an X3F Merrill to 16-bit *.tif took 12 seconds under Windows 10 with SPP 6.06 (as opposed to 16 seconds under Windows Server 2012 R2 Datacenter)

resource monitor i7-4770S mSATA SSD

Chris
Thank you for that info, Chris! It's amazing how much difference an OS can make! I checked out your CPU, and there looks to be a lot of room for improvement with the new processors that are coming out. Your 4770S, while pretty fast, is nowhere near as fast as the newer X processors (that's what I call them anyway).

This is your processor, I believe, which costs about $300 today:


This is a fast new 5,000 series X processor, which costs about $1,000:


Of course, the computer would no doubt cost about twice as much for one with that octo-core processor vs. the quad-core processor, so that level of performance will be only for those who are willing to dish out an extra $1,000 or so, I guess.

One thing I don't understand is the expense of computers with the i7-5820K processor. It's not much more expensive, but I can't find computers with that thing in them for less than about $2,000. I guess that will change quickly though, as more motherboards become available.



It looks like these new machines, with more cache and faster, DDR4 RAM, will really scream. My guess is that by Christmas, or maybe some time in the Spring we will see some sick new computers running Windows 10, which will be able to export Quattro images to JPEG or 16 bit TIFF in less than 10 seconds, for $1,500 or less . . . or do you think Windows 10 will stay in beta for a long time?

What do you think Chris?

I just read this, but it gives no indication about beta timing:

 
Hi, Scott -

I got this i7-4770S in June last year, as it has integrated 4K graphics, so I could feed my 2160p screen from a tiny desktop. With a µATX motherboard and 16GB of DDR3, I spent about $700 on hardware. If I could have waited for the Broadwell 14nm shrink with DDR4 support on a µATX board like the ASRock FATAL1TY X99M, it would probably have been north of $1100 given DDR4 pricing, yet having 15MB of L3 instead of 8MB might be appreciable. In my case, I'll wait for the Skylake redesign, hoping integrated graphics will do 4320p, as more than likely 8K screens will become more affordable by then, hence we'll be able to view a Merrill or Quattro picture.
Maybe the corresponding Xeon iterations will have more cores and handle SPP faster, yet by then some sections of SPP might be rewritten for the Xeon Phi or other GPUs.

As far as memory speed goes, Cisco bought Nuova so as to get the patents they used in some of the UCS blades to quadruple (speed and capacity) the ECC DIMMs per socket, and a few years back they had 2TB on a dual-socket motherboard.

Also last year, SanDisk grabbed Diablo of Canada, who realized everyone's dream of putting NAND on memory chips with about 5µs write latency (unlike these miserable PCIe SSD drives like Fusion-io): http://smartstoragesys.com/products/ulltradimm.asp
So they have 240GB or 400GB with high provisioning on an ECC DIMM, so you have the DDR3 bandwidth. As it's scalable, two sockets will be twice as fast, so on a motherboard with 96 DIMM sockets, we are talking about a theoretical 73GB/s, which would still take about 9 minutes to write the 38TB.
That would be the hardware limitation if SPP were able to churn a Quattro X3F to a 16-bit S-HI *.tif (~236MB) in 0.002 seconds each, taking 9 minutes for the 190000 pictures that would fill up the UlltraDIMMs.
Since SPP would probably do better on a Xeon Phi, that 73GB/s would be the bottleneck.
IBM was the first licensee of SanDisk last year, so they have Intel servers demonstrating the concept with SAP HANA, and after the demo, some of the IBM eXFlash (their name for Not Invented Here UlltraDIMMs) were for sale second hand:
http://www.memory4less.com/m4l_item...Iyw4PJYttWpurqeY2dIm9nEzwZD07HGECwaAmnA8P8HAQ

Beware that you need substantial UEFI/BIOS mods to handle the UlltraDIMM ASICs, as they have to present themselves as block devices.

Meanwhile, I have tried the fastest RAM disks, writing 10GB/s in Windows with very low latency, so the 236MB *.tif would be written in 0.02 s rather than about a second on a PCIe SSD. So unless SPP takes 3 seconds to export a 16-bit S-HI *.tif, it's really the number of cores and the L3 cache that will help. That's why I sent the graphs, so you could imagine that with 15MB of L3, the i7-5* you mentioned would not be twiddling its thumbs, and a 16-core version might allow SPP 6 to spit out a Merrill X3F as a 16-bit TIFF in about 7 seconds.
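
For anyone who wants to redo the back-of-envelope numbers above, here is a small Python sketch. It assumes decimal units (1 GB = 1000 MB) and takes the 236MB TIFF size, the 10GB/s RAM disk, the 400GB modules, the 96 sockets and the 73GB/s aggregate figure straight from the post; the variable names are just for illustration.

```python
# Figures quoted in the post (decimal units assumed: 1 GB = 1000 MB).
TIFF_MB = 236          # size of one 16-bit S-HI TIFF
RAMDISK_GB_S = 10      # RAM disk write speed
DIMM_GB = 400          # capacity of one UlltraDIMM
DIMM_SOCKETS = 96      # sockets on the hypothetical motherboard
ARRAY_GB_S = 73        # theoretical aggregate write bandwidth

tiff_gb = TIFF_MB / 1000

# Time to write a single TIFF to the RAM disk.
ramdisk_write_s = tiff_gb / RAMDISK_GB_S

# Total capacity and the time needed to fill it at full bandwidth.
total_gb = DIMM_GB * DIMM_SOCKETS
fill_minutes = total_gb / ARRAY_GB_S / 60

# How many TIFFs fit, and how fast each one would have to be produced
# for the storage bandwidth to become the limit.
files_to_fill = total_gb / tiff_gb
seconds_per_file = tiff_gb / ARRAY_GB_S

print(f"RAM disk write per TIFF: {ramdisk_write_s:.4f} s")
print(f"Capacity: {total_gb / 1000:.1f} TB, fill time: {fill_minutes:.1f} min")
print(f"TIFFs to fill: {files_to_fill:,.0f}, budget per TIFF: {seconds_per_file:.4f} s")
```

Under these assumptions the single-file write comes out at about 0.02 s and the fill time at a little under 9 minutes. The file count and per-file budget depend on the TIFF size plugged in: 236MB gives roughly 163,000 files, while a size nearer 200MB gives about 190,000.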

Chris
 
400 GB of RAM per module? lol Nice!

Interesting stuff Chris, but I think I'll stick with the more conventional systems. It will be interesting to watch how fast SPP gets over the next year. Hopefully I'll have the money to buy a Quattro DSLR and then a new computer to handle the files by then.
 
You might in fact make the jump to Skylake (6th Gen core) if you can wait for the end of 2015, as it will be a new design. Not to say that the Broadwell that shrinks the 4th generation to 14nm does not have advantages besides power consumption, such as DDR4 & bigger L3 cache, but if you can deal with SPP taking 12 seconds to export a Merrill X3F to 16bpc tiff, you might be better off. The errata on new designs are not as scary as 20 years ago.

Chris
 
OK, from this thread so far, it looks like I need to make a template for people, so we can get a clearer idea of system speed and results with a more consistent method of comparison. So I decided to make a list of things that would be great to know. Please use it if you can. I realize it is a lot of information, but it could have been a lot more extensive. (Please remove the i.e. lines, the lines in parentheses.) Here:

My computer is/has:

?.? GHz ?-core ? processor made by ?
(i.e. 2.5 GHz quad-core processor made by Intel or 3.0 GHz processor made by AMD)
My processor is a ?
(i.e. Intel i7 4700HQ or AMD A8-6410)
My processor was made in 201?
? GB of DDR? RAM running at ? MHz
(i.e. 16 GB of DDR3 RAM running at 1600 MHz)
? GB ? Hard Drive
(i.e. 512 GB SSD Hard Drive or 3000 GB 7200 RPM Hard Drive)
? Video Card with ? GB of ? video memory
(i.e. NVIDIA GeForce GT 650M Video Card with 2 GB of DDR5 video memory)
? OS version ?.?.?
(i.e. Mac OS version 10.7.5 or Windows OS version 8.1)

SPP does the following (with no other major applications/processes running):

? seconds to start Sigma Photo Pro
? seconds to open a Quattro raw file
? seconds to open a Merrill raw file
? seconds to view Quattro raw file at 100% size
? seconds to view Merrill raw file at 100% size
? seconds to save Quattro raw file as JPG with best quality (level 12)
? seconds to save Merrill raw file as JPG with best quality (level 12)
? seconds to adjust white balance preset when viewing Quattro raw file at 100%
? seconds to adjust white balance preset when viewing Merrill raw file at 100%
? seconds to change exposure slider and see result with Quattro raw file at 100%
? seconds to change exposure slider and see result with Merrill raw file at 100%
? seconds to save adjusted Quattro raw file as JPG with best quality (level 12)
? seconds to save adjusted Merrill raw file as JPG with best quality (level 12)
? seconds to save adjusted Quattro raw file as 16 bit TIFF
? seconds to save adjusted Merrill raw file as 16 bit TIFF
? seconds to save 10 raw Quattro files as JPG with best quality (level 12)
? seconds to save 10 raw Merrill files as JPG with best quality (level 12)
? seconds to save 10 raw Quattro files as 16 bit TIFF
? seconds to save 10 raw Merrill files as 16 bit TIFF
 
If it helps with the first item, SPP 6 produces a log, which on Windows is located by default in %appdata%\Local\SIGMA\SIGMA_PhotoPro6\Log

Below is a sample of it starting up on an Intel i7-4770S (max 3.10GHz) with 16GB of DDR3 running at 1600MHz, where startup takes less than half a second:

22:55:56:526[App] ThreadID:1 : : Line_0 : Main_Main() start
22:55:56:537[App] ThreadID:1 : : Line_0 : .ctor_MainWindow Constructor
22:55:56:541[App] ThreadID:1 : : Line_0 : initWindow_MainWindow initWindow
22:55:56:559[App] ThreadID:1 : : Line_0 : .ctor_MainProcess constructor
22:55:56:566[App] ThreadID:1 : : Line_0 : .ctor_complete to create FileIOManager
22:55:56:573[App] ThreadID:1 : : Line_0 : .ctor_complete to create ClassVersionCheckManager
22:55:56:575[App] ThreadID:1 : : Line_0 : .ctor_complete to create ClassParameterManager
22:55:56:577[App] ThreadID:1 : : Line_0 : .ctor_complete to create ClassImageProcessingManager
22:55:56:582[App] ThreadID:1 : : Line_0 : initializeProcess_start initializeProcess()
22:55:56:585[App] ThreadID:1 : : Line_0 : initialize_start PreferenceManager.initialize
22:55:56:621[App] ThreadID:1 : : Line_0 : initializeProcess_---startup dir---:Desktop\This PC\Documents\Magic Briefcase\Win10\input transducers\formats\Sigma\hybrid
22:55:56:623[App] ThreadID:1 : : Line_0 : initialize_start initialize
22:55:56:638[App] ThreadID:1 : : Line_0 : .ctor_complete initializeProcess()
22:55:56:641[App] ThreadID:1 : : Line_0 : initMainProcess_try MainProcess.getInstance()
22:55:56:669[App] ThreadID:1 : : Line_0 : initWindow_complete initMainProcess()
22:55:56:672[App] ThreadID:1 : : Line_0 : initWindow_SplashScreen.ShowSplash()
22:55:56:677[App] ThreadID:1 : : Line_0 : initMainWindow_start initMainWindow()
22:55:56:874[App] ThreadID:1 : : Line_0 : initWindow_complete initMainWindow()
22:55:56:884[App] ThreadID:1 : : Line_0 : initToolStrip_start initToolStrip()
22:55:56:933[App] ThreadID:1 : : Line_0 : setMainWindowPreferenceParam_start setMainWindowPreferenceParam()
22:55:56:941[App] ThreadID:1 : : Line_0 : initListView_start initListView()
22:55:56:968[App] ThreadID:1 : : Line_0 : initializeContextMenuStripEventHandlerListView_start initializeContextMenuStripEventHandlerListView()
22:55:56:971[App] ThreadID:1 : : Line_0 : initializeContextMenuStripEventHandlerTreeView_start initializeContextMenuStripEventHandlerTreeView()
22:55:56:973[App] ThreadID:1 : : Line_0 : GetInstance_complete initWindow()
22:55:56:975[App] ThreadID:1 : : Line_0 : Main_run application
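
If you'd rather not subtract timestamps by hand, a few lines of Python can do it. This is only a sketch: the `HH:MM:SS:mmm[App]` line format is inferred from the sample above, not from any Sigma documentation, and the function names are made up for the example. It also assumes the log does not cross midnight.

```python
import re
from datetime import timedelta

# Timestamp prefix as it appears in the sample log: HH:MM:SS:mmm[App]
LINE_RE = re.compile(r"^(\d{2}):(\d{2}):(\d{2}):(\d{3})\[App\]")


def parse_timestamp(line):
    """Return the line's timestamp as a timedelta, or None if it has none."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    hours, minutes, seconds, millis = map(int, match.groups())
    return timedelta(hours=hours, minutes=minutes,
                     seconds=seconds, milliseconds=millis)


def startup_seconds(lines):
    """Seconds elapsed between the first and last timestamped lines."""
    stamps = [t for t in map(parse_timestamp, lines) if t is not None]
    if len(stamps) < 2:
        raise ValueError("need at least two timestamped lines")
    return (stamps[-1] - stamps[0]).total_seconds()


# First and last lines of the sample log above.
sample = [
    "22:55:56:526[App] ThreadID:1 : : Line_0 : Main_Main() start",
    "22:55:56:975[App] ThreadID:1 : : Line_0 : Main_run application",
]
print(startup_seconds(sample))  # 0.449
```

To use it on a real log, read the file into a list of lines (for example with `open(path).read().splitlines()`) and pass that to `startup_seconds`.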

I stumbled upon this while swapping raw data between X3F files without updating the last 76 bytes after the raw image (the final SECd), and then looking at the error log in the same directory:

APP Version:6.0.6.2142
ProcessVersion:2.1.1.0
ProcessVersion:3.0.1.0
ProcessVersion:ffff.0.0.0
ProcessVersion:fffe.0.0.0
Date:20141006_225246
Process Memory:1302MB
Error code:C-14
StackTrace:
1: MainProcess::logOut
2: MainProcess::getThumbnailImage
3: CtmListViewCacheThumbnail::bw_DoWork
4: QueuedBackgroundWorker::Run
5: ExecutionContext::RunInternal
6: ExecutionContext::Run
7: ExecutionContext::Run
8: ThreadHelper::ThreadStart

Chris
 
Thanks for the info, Chris. "APP Version:6.0.6.2142" - that's cool. That's a LOT of versions!

;)
 
