Deep dive into Z XQD vs CFE performance differences

Horshack

I just received a Sony 128GB CFE card. I was curious about the slight decrease in burst performance vs XQD others have reported and wanted to take a look and better characterize the performance differences. Using my Z6 I shot a 20-second burst in Continuous High+ with the lens cap on, releasing the shutter button after 20 seconds and then analyzing all the resulting frames stored on the card [note I'm allowing the camera to take as long as it wants to finish flushing images at the end of the burst, which has implications for comparing the total images shot - I was more interested in characterizing the buffer flow and performance in the middle of the burst]. The XQD card used was a Sony 64GB 400 MB/s XQD 2.0.

Using only raw I compared uncompressed, lossless compressed, and lossy compressed. Here are some cursory results:
  • Uncompressed raw: XQD = 105 images, CFE = 93 images
  • Lossless raw: XQD = 121 images, CFE = 119 images
  • Lossy raw: XQD = 123, CFE = 119 images
So things look about the same except for uncompressed, where XQD shoots about 13% more raws (105 vs 93). In fact I'm able to catch outliers where CFE's performance drops to 82 images, which I'm investigating separately.

To better characterize what's going on in the management of the frame buffer and media card I examined and charted out the sub-second EXIF values of each image of the burst. I used the uncompressed raw test since it shows the largest performance delta.

The sub-second values show the time of each image in units of 1/100 of a second - for example, a sub-second value of "25" means 1/4 of the way into the integral second. The most useful way to chart this is as the delta in sub-second values between consecutive images. For example, if one image has a sub-second value of "39" and the next image has a sub-second value of "72", the delta time between these images is 33/100 of a second (0.33 seconds). Here is that data charted (direct link):

XQD vs CFE, 20-second burst, sub-second delta time between frames

Some observations of this data:
  • CFE records 4 more images at full-speed before slowing down vs XQD (images #31-34)
  • The worst-case and average inter-shot latencies are much higher for CFE than XQD. There are 8 shots with a latency near or above 0.60 seconds on CFE, whereas the worst-case latency on XQD is usually around 0.40 seconds.
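The sub-second delta calculation described above can be sketched in a few lines of Python. This is only an illustration: the function name `subsec_deltas` and the sample values are hypothetical, and it assumes every gap between frames is shorter than one second, so a value that goes "backwards" is treated as having wrapped into the next second.

```python
def subsec_deltas(subsecs, units_per_second=100):
    """Return the gap in seconds between consecutive frames.

    subsecs: sub-second EXIF values in capture order (0..99 for 1/100 s units).
    Assumes each gap is shorter than one second, so a smaller value following
    a larger one is treated as a wrap into the next second.
    """
    deltas = []
    for prev, cur in zip(subsecs, subsecs[1:]):
        ticks = (cur - prev) % units_per_second  # handles the 99 -> 0 wrap
        deltas.append(ticks / units_per_second)
    return deltas


# Hypothetical sub-second values for illustration, not measured data.
example = [39, 72, 83, 94, 5, 65]
print(subsec_deltas(example))
```

Gaps of a full second or more cannot be recovered from the sub-second field alone; for those the whole-second part of the timestamp would have to be combined with it.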
I think with some more experiments and comparisons we might be able to tease out where the performance difference is happening at the media-card write interface. For example, is it simply a difference of sustained write speed? Or is it an abnormal/inconsistent write completion time?

This will likely be a slow-moving thread as I find time to work on this.
 
Thanks, good stuff!

A quick question: How reliable do you think the EXIF timestamps are--eg. if they're calculated at exposure time and there's lag in the writes vs. if they're written at the time the file is completed writing to the card?

An alternate method might be recording audio of the shutter sounds and overlaying them, which would take out this unknown.
 
It's not clear to me yet if the sub-second values are encoded when the image is taken or when the write of the image data has been submitted to the media card queue. In practice I think the difference would be muddied by the fact that the I/O submitted to the card might be delayed anyway (depending on position in queue). Nonetheless I plan to shoot the digital clock on my iPhone and compare the delta times between the visual clock times and the sub-second values to see if they match up.
 
Thank You.
 
Very interesting; thank you for doing this. It will be even more interesting to see the difference between timing methods. I suspect there's more margin for error with the EXIF sub-second timing, but we'll see. Most interesting would be finding clues for the apparent periodic write slowdowns.

It might also be informative to compare the Sony CFE card with a 'non-complying' CFE card, e.g., SanDisk (assuming you could get your hands on one).
 
Reran the same comparison but used an iPhone clock as the photo subject instead of black frames. The purpose was to compare the sub-second values in the EXIF vs the real time of the captured images, to see if the gaps in the sub-second values that correspond to frame rate slowdowns are related to capture-time delays vs write delays (ie, to see if the sub-second value corresponds to time of capture vs time of write). Based on this experiment it appears to be time of capture.
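One way to make that comparison concrete is to compare frame-to-frame intervals rather than absolute times, since the camera clock and the photographed clock are not synchronized. The sketch below assumes the clock readings have been transcribed by hand from each frame; the function names and all sample values are hypothetical.

```python
def intervals(times):
    """Frame-to-frame gaps in seconds, rounded to the 1/100 s EXIF resolution."""
    return [round(b - a, 2) for a, b in zip(times, times[1:])]


def max_interval_mismatch(exif_times, clock_times):
    """Largest disagreement (seconds) between EXIF gaps and photographed-clock gaps.

    If the EXIF stamp were applied at write time rather than capture time,
    the gaps would be expected to diverge whenever the card stalls.
    """
    pairs = zip(intervals(exif_times), intervals(clock_times))
    return max(abs(e - c) for e, c in pairs)


# Hypothetical values: EXIF times (seconds) and clock readings from the frames.
exif = [10.39, 10.50, 10.61, 11.21]
clock = [3.12, 3.23, 3.34, 3.94]
print(max_interval_mismatch(exif, clock))
```

A mismatch that stays within a hundredth or two of a second across the slow frames would be consistent with the sub-second value being stamped at capture.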

For this test I caught one of the intermittent cases where CFE slows down even further - 81 frames in 20 seconds vs the average of 93.

First the sub-second graph (direct link). Note the even greater disparity of when XQD starts slowing down vs CFE:

XQD vs CFE, 20-second burst, sub-second delta time between frames (iPhone clock subject)

Animated GIF of XQD (direct link). Click "original size" to animate.

Animated GIF with XQD capture (100 images in 20 seconds)

Animated GIF of CFE (direct link). Click "original size" to animate.

Animated GIF with CFE capture (81 images in 20 seconds)
 
CFE intermittently beats XQD for lossless compressed on certain runs, whereas it always loses on uncompressed, which means the compressed differences are likely within the margin of error. It also suggests the CFE slowdown is I/O-size dependent, which I'll investigate further by varying the shots a bit more (12-bit vs 14-bit, crop modes, with and without companion JPGs). For reference here are the average sizes of the black frames:
  • 14-bit uncompressed: 44,896 KB
  • 14-bit lossless compressed: 25,425 KB
  • 14-bit lossy compressed: 21,453 KB
Possible causes of I/O-size performance dependencies include SLC cache differences between the cards, and/or NAND page management at various I/O sizes, and/or differences in the CFE queue logic of the BSD implementation Nikon is using. It would be helpful if I had a USB CFE reader, which would allow me to test various workloads on the cards directly using a tool like Iometer.
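For a rough sense of the data rates involved, the average file sizes above can be combined with the image counts from the first post. This is a back-of-the-envelope sketch only: it assumes 1 MB = 1,000 KB, and because the camera keeps flushing its buffer after the shutter is released, dividing by the 20-second burst overstates the rate actually sustained within that window.

```python
BURST_SECONDS = 20

# (format, average file size in KB, XQD image count, CFE image count),
# taken from the figures quoted earlier in the thread.
RUNS = [
    ("14-bit uncompressed", 44_896, 105, 93),
    ("14-bit lossless compressed", 25_425, 121, 119),
    ("14-bit lossy compressed", 21_453, 123, 119),
]


def avg_write_rate_mb_s(size_kb, images, seconds=BURST_SECONDS):
    """Approximate average data rate in MB/s, treating 1 MB as 1,000 KB."""
    return size_kb * images / seconds / 1000


for name, size_kb, xqd, cfe in RUNS:
    print(f"{name}: XQD ~{avg_write_rate_mb_s(size_kb, xqd):.0f} MB/s, "
          f"CFE ~{avg_write_rate_mb_s(size_kb, cfe):.0f} MB/s")
```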

Here's one of the lossless compressed runs, CFE = 125 images, XQD = 121 images (direct link):

XQD vs CFE, 20-second burst, sub-second delta time between frames (lossless compressed)

And here is lossy compressed, where XQD usually beats CFE by a few frames, CFE = 120 images, XQD = 126 images (direct link):

XQD vs CFE, 20-second burst, sub-second delta time between frames (lossy compressed)
 
If you have a device that can log voltages with high time resolution, the hot shoe with a pullup resistor would give some extremely clean data.
 
Card write speeds are such multidimensional critters. In a data logger I built (in this case there were small data records of about 250 bytes that were captured at 10 Hz) the SD cards with higher specified rates had cyclical, comb-like patterns of 1-4 missed cycles. A lot of data went into the bit bucket while I was waiting for the write to complete. However the most modest card I was able to obtain had no gaps at all. My use case was very different from the manufacturer's test case.

I wouldn't have expected your use cases to be very diverse from the point of view of the cards though. The writing hardware may have some odd characteristics, which is especially possible in this first version of CFexpress compatibility.
 
Can I interpret the results to mean that I can shoot at least 30 images without seeing a difference? (I would never shoot uncompressed, only lossless compressed.) I also use a Z7, and it would be great to see the effect there as well because the file size is much larger!

Anyhow, as I seldom shoot bursts in 'H' of more than 10 pictures, I would be safe using CFE, especially with the new Sony CFE reader (which can also read XQD) but at twice the speed!

So as soon as CFE gets cheaper than XQD we can jump on the train!
 
I ran a bunch of permutations of image area size and 14-bit vs 12-bit, all uncompressed. The purpose here was to further define the relationship between image size and CFE slowdown vs XQD. In previous tests, CFE would slow down vs XQD based on the apparent size of the raw - 14-bit uncompressed on CFE was materially slower than XQD, whereas other raw formats such as compressed showed CFE and XQD performing similarly. Based on the additional permutations below it appears that it's not raw size which is the issue, at least not in isolation. For example, FX area 12-bit uncompressed shows CFE performing the same as XQD even though the raw size averages 38,848 KB, whereas CFE is materially slower than XQD for DX area 14-bit uncompressed, where the raw size averages 19,956 KB. This means the situations where CFE under-performs vs XQD are more nuanced than just size. More tests will be needed to see if those nuances can be discovered.

One thing is clear for the test permutations where CFE under-performs - very large air-bubbles in fps can be seen, where sub-second gaps between frames sometimes approach a full second, which means the fps drops to near 1. Most times however the gaps are not as extreme but they're still higher than XQD. Look at the last two graphs at the bottom of this post, which show 4 runs of XQD vs CFE in a single configuration, to see this demonstrated more clearly.

Image area size of Full FX (36x24 mm), 12-bit, Uncompressed (direct link). XQD = 98 images, CFE = 100 images:

XQD vs CFE, 20-second burst, sub-second delta time between frames, FX area size, 12-bit Uncompressed

Image area size of 1:1 (24x24 mm), 14-bit, Uncompressed (direct-link). XQD = 122 images, CFE = 109 images:

XQD vs CFE, 20-second burst, sub-second delta time between frames, 1:1 area size 14-bit Uncompressed

Image area size of 1:1 (24x24 mm), 12-bit, Uncompressed (direct link). XQD = 135 images, CFE = 131 images:

XQD vs CFE, 20-second burst, sub-second delta time between frames, 1:1 area size 12-bit Uncompressed

Image area size of 16:9 (36x20 mm), 12-bit, Uncompressed (direct link). XQD = 101 images, CFE = 106 images:

XQD vs CFE, 20-second burst, sub-second delta time between frames, 16:9 area size 12-bit Uncompressed

Image area size of DX (24x16 mm), 14-bit, Uncompressed (direct link). XQD = 153 images, CFE = 140 images:

XQD vs CFE, 20-second burst, sub-second delta time between frames, DX area size 14-bit Uncompressed

Four separate runs of the above DX (24x16mm), 14-bit, Uncompressed (direct link), first XQD (direct link) and then CFE (direct link):

Four runs of DX (24x16mm), 14-bit, Uncompressed (XQD)

Four runs of DX (24x16mm), 14-bit, Uncompressed (CFE)
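To line up the uncompressed permutations side by side, the image counts reported in this thread can be reduced to a percentage difference of CFE relative to XQD. This is a minimal sketch with a hypothetical helper name; each entry is a single run, and since run-to-run variation on CFE alone was as large as 93 vs 81 images, small differences should be treated as noise.

```python
# (configuration, XQD images, CFE images) for 20-second uncompressed bursts,
# as reported in this thread. Each entry is a single run.
RESULTS = [
    ("FX 14-bit", 105, 93),
    ("FX 12-bit", 98, 100),
    ("1:1 14-bit", 122, 109),
    ("1:1 12-bit", 135, 131),
    ("16:9 12-bit", 101, 106),
    ("DX 14-bit", 153, 140),
]


def cfe_vs_xqd_percent(xqd, cfe):
    """Percentage by which the CFE count differs from the XQD count."""
    return (cfe - xqd) / xqd * 100


for name, xqd, cfe in RESULTS:
    print(f"{name:12s} XQD={xqd:3d} CFE={cfe:3d} ({cfe_vs_xqd_percent(xqd, cfe):+.1f}%)")
```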
 
I also own a Panasonic S1, which also received a firmware update with CFE support. This is a 24MP body like the Z6, with an average raw black frame size of 34,479 KB, vs 44,892 KB for the Z6. Using the same XQD/CFE cards as the Z6 test, here are the results for the S1. I had to use a burst time of 25 seconds on the S1 vs 20 seconds on the Z6 due to the S1 having a much deeper effective buffer. The S1 has a continuous rate of 9fps, the same as the Z6.

Here are the results (direct link). For XQD the S1's fps doesn't start slowing down until photo #128 in the burst, for a total 25-second image count of 200. For CFE the S1's fps never drops below the rated 9fps, for a total 25-second image count of 242. The clear conclusion is that the S1 performs materially better with CFE vs XQD. I'll try to test this on an S1R (47 megapixels) later this week to see what buffer limit, if any, it has with CFE.

XQD vs CFE, 25-second burst, sub-second delta time between frames, Panasonic S1
 
Thanks. I was able to answer the above question by shooting an iPhone clock instead. Also, I'm using the electronic shutter for all my tests, which doesn't trigger the hot shoe.
 
I haven't compared the cards on a Z7 yet but for the Z6 it's safe to say there's no material difference in buffer capacity/write rate for bursts of <= 30 images.
 
Interesting Horshack. Do you know what the variables are here? Knowing nothing about these cards I would guess:
  • Camera image encoding speed
  • Camera continuous write speed
  • Camera random write speed
  • Camera buffer
  • Camera drivers
  • Card buffer (if any)
  • Card drivers
  • Card continuous write speed
  • Card random write speed
  • What else?
Are we to assume that the hiccups with the otherwise unhindered S1 plots are due to write/read/signal/rewrite cycles due to errors? Or perhaps the card has run out of contiguous space and needs to move to a different free area? How are you formatting the cards between tests?

Very good info.

Jack
 
Hi Jack, I'm formatting the cards in-camera before every test, which clears out the logical structure of the filesystem but won't actually erase the underlying NAND pages, which means runs have the potential to run into page allocation/erase cycles. I'm not sure of the exact NAND structure of these XQD/CFE cards but I would suspect each has some amount of SLC-based cache, which is used to absorb bursts of writes, which are then propagated to the slower but more cost-effective MLC/TLC cells in a context outside the original write completion to the host controller/camera (ie, the writes are posted, which means they're marked as completed before being committed to their final at-rest location in NAND). As the SLC cache is exhausted there will be backpressure applied in the form of slower write completions, which might be seen as air bubbles in throughput/fps. All that said, it's not clear if these cameras are fast enough to exhaust the SLC cache or even the throughput of the slower MLC/TLC cells.
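The posted-write and backpressure behaviour described above can be illustrated with a toy model. Everything here is an assumption made for illustration: the function name, the cache size, the drain rate and the write size are hypothetical and are not measurements of either card, and the model ignores the time needed to transfer data into the cache.

```python
def simulate_posted_writes(n_writes, write_mb, interval_s, cache_mb, drain_mb_s):
    """Return how long each write stalls (seconds) in a toy write-cache model.

    A write is "posted": it completes as soon as it fits in the cache.
    The cache drains to slower storage at drain_mb_s. When the cache is
    full, the write waits for enough space to drain (the backpressure).
    """
    waits = []
    used = 0.0  # MB currently held in the cache
    now = 0.0   # simulation clock in seconds
    for i in range(n_writes):
        # The host issues write i on schedule, or later if it is still stalled.
        arrival = max(now, i * interval_s)
        used = max(0.0, used - (arrival - now) * drain_mb_s)  # drain while idle
        wait = max(0.0, (used + write_mb - cache_mb) / drain_mb_s)
        used = max(0.0, used - wait * drain_mb_s) + write_mb
        now = arrival + wait
        waits.append(wait)
    return waits


# Hypothetical numbers: 45 MB frames at 9 fps, 1,000 MB cache, 250 MB/s drain.
waits = simulate_posted_writes(100, 45, 1 / 9, 1000, 250)
first_stall = next(i for i, w in enumerate(waits) if w > 0)
print(f"first stalled write: #{first_stall}, steady-state stall: {waits[-1]:.2f} s")
```

In this model the early writes complete with no stall, and once the cache fills every write waits for the drain, which is the kind of pattern that would show up as air bubbles in fps.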

There are actually a large number of variables that determine I/O throughput in any system. Apart from the speed of the medium (discussed above), it starts with the I/O bus, which in this case is PCIe 2.0/3.0, which should be plenty fast for this configuration (PCIe 2.0 is 500 MB/s per lane per direction, 3.0 is 1 GB/s), with both XQD and CFE likely implementing two lanes. Then there is the I/O protocol overhead, which is small for XQD and even smaller for CFE since it's based on simple submission/completion queues with a posted-register-write architecture that eliminates most memory/bus read latencies.
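The "plenty fast" claim can be checked with simple arithmetic using the per-lane figures above, the two-lane guess, the 9fps continuous rate and the 44,896 KB average uncompressed frame size mentioned earlier in the thread. This is a rough sketch that treats 1 MB as 1,000 KB and ignores protocol overhead; the function names are hypothetical.

```python
# Per-lane, per-direction rates in MB/s as quoted in the post.
PCIE_MB_S_PER_LANE = {"2.0": 500, "3.0": 1000}
LANES = 2  # the post's guess for both XQD and CFE


def link_bandwidth_mb_s(gen, lanes=LANES):
    """Raw link bandwidth in MB/s, before any protocol overhead."""
    return PCIE_MB_S_PER_LANE[gen] * lanes


def required_mb_s(frame_kb, fps):
    """Data rate needed to write every frame as it is shot (1 MB = 1,000 KB)."""
    return frame_kb * fps / 1000


need = required_mb_s(44_896, 9)
for gen in ("2.0", "3.0"):
    link = link_bandwidth_mb_s(gen)
    print(f"PCIe {gen} x{LANES}: {link} MB/s link vs ~{need:.0f} MB/s needed "
          f"({need / link:.0%} utilization)")
```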

Next is the memory subsystem, which affects not only the performance of memory transfers to/from the PCIe interface but more importantly the total aggregate performance of the system since they're all interconnected and each uses the shared bandwidth of the memory. The largest consumers of memory bandwidth in the camera would be the sensor complex writing raw data to frame buffers, the imaging ASIC reading/writing those same frame buffers to do any raw-based image processing and to generate the embedded JPGs of the raws, the I/O controller writing data to the card, and the embedded processor running the firmware doing memory fetches for the instruction cache and memory fetches/stores for data accesses that miss DCACHE. I used to model memory systems for high-performance storage adapters so I know it can get pretty complex determining what the effective throughput and utilization are, especially when all these memory consumers are running concurrently. It'll also depend on the presence and depth of FIFOs at each interface, the multi-ported design of the memory, etc.

The biggest variable is how the camera is managing its frame buffers and structuring and ordering its I/Os to the card, since performance is highly dependent on queue depth and I/O size. A raw file actually has a lot of variable-length parts, many of which won't be contiguous in physical memory. There are all the headers, EXIF data, multiple embedded JPGs and the raw data itself. The most efficient way to handle this in terms of memory usage and throughput is to construct each of these areas in separate memory pools, then use Scatter/Gather lists for the media writes so that the discontiguous memory areas can be written as a whole to a contiguous set of logical blocks on the media. The alternative to that is to sequence multiple writes to the media, one for each of these discontiguous memory areas. Without being able to instrument the I/O interface on the cameras I can only guess as to how they actually implement their writes.
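As a host-side analogy for the two approaches, a POSIX gather write (`os.writev` in Python, available on Unix-like systems) takes a list of separate buffers and writes them to one contiguous file region in a single call, while the alternative issues one write per buffer. This is only an analogy: camera firmware would hand a scatter/gather list to its storage controller for DMA, and the buffer contents and function names below are hypothetical stand-ins.

```python
import os
import tempfile


def write_gathered(path, parts):
    """Write separate buffers as one contiguous region with a single gather call."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        # May in principle write fewer bytes than requested; returns the count.
        return os.writev(fd, parts)
    finally:
        os.close(fd)


def write_sequenced(path, parts):
    """Write the same buffers with one call per buffer, in order."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        return sum(os.write(fd, part) for part in parts)
    finally:
        os.close(fd)


# Hypothetical stand-ins for header, EXIF, embedded JPG and raw data buffers.
parts = [b"HDR", b"EXIF", b"JPEG", bytes(16)]
with tempfile.TemporaryDirectory() as tmp:
    gathered = os.path.join(tmp, "gathered.bin")
    sequenced = os.path.join(tmp, "sequenced.bin")
    print(write_gathered(gathered, parts), write_sequenced(sequenced, parts))
```

Both paths produce the same bytes on the media; the difference is how many separate I/O requests are issued to get them there.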
 
Great work.

CFE update provides NO measurable improvement in everyday use at this time and there is no reason to change from XQD unless one wants to incur an expense and try out this new technology?

Am I correct?

Kinda sounds like technology upgrade for the sake of technology upgrade, don't fix it if it ain't broke?
 
I'm using the electronic shutter for all my tests, which doesn't trigger the hot shoe.
Do you think that the slow scans might have some effect on the results?
 
It's a reasonable statement for the Z6, and probably for the Z7 if I see the same results there. But not for the Panasonic S1 I tested, which showed a noticeable improvement in buffer capacity with CFE. Hopefully future Z bodies will take better advantage of the additional performance of CFE cards.
 
