Nikon Z9 - Buffer Issue Solved | Work Around ????

Yes, the people reporting the problem state it happens while shooting. Therefore that means the camera is writing to the buffer so yes, there will be something in the buffer.
Okay, let me explain this step by step.

1. The problem happens when there's something in the buffer, and you start shooting again (very very very rarely).

2. Slow cards empty the buffer at a slower pace.
You're making the assumption that the card has something to do with the buffer. It doesn't. The buffer is sitting outside of EXPEED7, and EXPEED7 is taking fully formed images out of the buffer to send to the card write mechanism. If the diagram I have for EXPEED internals is still true—and I believe it is—the decision on when to move more information to the write mechanism is all EXPEED's. It can pause for the write mechanism.
3. If things are in the buffer longer, it increases the odds you shoot and put new things in the buffer behind the stuff that isn't on the card yet.
Again, that also is handled by EXPEED. Basically EXPEED is grabbing information from the memory that just consists of a block of DNs, processing them into a file, writing that file back out to the memory, then as the card can take more images, moving that file from memory to the card.
4. The slower card leaves things in the buffer longer because they can't write quickly.

5. Therefore, using a slower card is more likely to trigger the issue than a fast card, even though it's still extremely extremely rare.
As I've written several times, we've eliminated card brand, card size, and card speed as a trigger variable for the issue.
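
For anyone trying to follow the buffer-versus-card argument above, here is a minimal sketch of the data flow as it is described in this thread. It is a toy tick-based model, not Nikon's firmware: `simulate_burst`, the slot count, and all timings are made-up assumptions for illustration.

```python
from collections import deque

def simulate_burst(frames, shoot_interval_ms, write_interval_ms, buffer_slots):
    """Toy model of sensor -> buffer -> card. Returns (ms until all frames are written, peak buffer depth)."""
    buffer = deque()
    shot = written = peak = 0
    t = 0  # elapsed time in milliseconds
    while written < frames:
        t += 1
        # Sensor side: a new frame lands in the buffer if the burst is still running and a slot is free.
        if shot < frames and t % shoot_interval_ms == 0 and len(buffer) < buffer_slots:
            buffer.append(shot)
            shot += 1
            peak = max(peak, len(buffer))
        # Card side: the processor hands over the oldest frame only when the card can take one.
        if buffer and t % write_interval_ms == 0:
            buffer.popleft()
            written += 1
    return t, peak

fast = simulate_burst(frames=20, shoot_interval_ms=50, write_interval_ms=50, buffer_slots=50)
slow = simulate_burst(frames=20, shoot_interval_ms=50, write_interval_ms=200, buffer_slots=50)
print(fast)  # (1000, 1)
print(slow)  # (4000, 16)
```

In this model a slower card does leave frames in the buffer longer, but the decision about when to hand a frame over sits entirely on the processor side of the loop. Both points made above can be true at once.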

I spent most of my career in Silicon Valley in charge of organizations that had to deal with problems like this. Indeed, my first such job was managing software at Osborne, and on my very first day there I discovered that the software engineers had done something wrong: instead of waiting for the disc controller to send the operation complete flag, the code simply did NOP, NOP, NOP assuming that this was a long enough pause. (NOP is a code for No Operation, or just don't do anything with the CPU for a cycle.)
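
The difference between padding with NOPs and waiting for the completion flag can be sketched as follows. This is a hypothetical simulation, not the Osborne code: `FakeController` and its tick-based latency are invented stand-ins for a real disk controller.

```python
class FakeController:
    """Hypothetical device: an operation takes `latency` ticks, then sets `done`."""

    def __init__(self, latency):
        self.latency = latency
        self.remaining = 0
        self.done = True

    def start(self):
        self.remaining = self.latency
        self.done = self.latency == 0

    def tick(self):
        if self.remaining > 0:
            self.remaining -= 1
            if self.remaining == 0:
                self.done = True


def write_with_fixed_delay(dev, nops):
    """The fragile approach: burn a fixed number of cycles and hope that was long enough."""
    dev.start()
    for _ in range(nops):  # NOP, NOP, NOP ...
        dev.tick()
    return dev.done  # False means the code would have carried on too early


def write_with_flag(dev, max_ticks=10_000):
    """The robust approach: wait until the device itself reports completion."""
    dev.start()
    ticks = 0
    while not dev.done:
        dev.tick()
        ticks += 1
        if ticks > max_ticks:
            raise TimeoutError("controller never reported completion")
    return True


print(write_with_fixed_delay(FakeController(latency=3), nops=3))  # True
print(write_with_fixed_delay(FakeController(latency=5), nops=3))  # False
print(write_with_flag(FakeController(latency=5)))                 # True
```

The fixed delay works only for as long as the device is never slower than the guess. The flag version keeps working when the latency changes.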

CFexpress is wildly complex, as it's not a sequential write and has a load balancer (in theory) trying to make sure you don't fry just one page on the NAND.
 
Yes, the people reporting the problem state it happens while shooting. Therefore that means the camera is writing to the buffer so yes, there will be something in the buffer.
Okay, let me explain this step by step.

1. The problem happens when there's something in the buffer, and you start shooting again (very very very rarely).

2. Slow cards empty the buffer at a slower pace.
You're making the assumption that the card has something to do with the buffer. It doesn't. The buffer is sitting outside of EXPEED7, and EXPEED7 is taking fully formed images out of the buffer to send to the card write mechanism. If the diagram I have for EXPEED internals is still true—and I believe it is—the decision on when to move more information to the write mechanism is all EXPEED's. It can pause for the write mechanism.
The card has everything to do with how long images sit in the buffer. If the card can get 1 image per second written to it, then the buffer will contain images until however many seconds pass, and the card will be written to. I don't understand why this is controversial at all.
3. If things are in the buffer longer, it increases the odds you shoot and put new things in the buffer behind the stuff that isn't on the card yet.
Again, that also is handled by EXPEED. Basically EXPEED is grabbing information from the memory that just consists of a block of DNs, processing them into a file, writing that file back out to the memory, then as the card can take more images, moving that file from memory to the card.
Yes, but the images/data has to be retained somewhere...which is the buffer.
4. The slower card leaves things in the buffer longer because they can't write quickly.

5. Therefore, using a slower card is more likely to trigger the issue than a fast card, even though it's still extremely extremely rare.
As I've written several times, we've eliminated card brand, card size, and card speed as a trigger variable for the issue.
Then what exactly is the issue at hand?
 
Like Thom keeps saying the problem appears to be between the processor and the buffer. Not between the buffer and memory card.
And yet, the problem only happens when the buffer isn't clear. Qed...
The "buffer" is not in the card. EXPEED7 is in between the buffered images and the card.
Yes, I understand this. The buffer is where the images are held prior to being dumped off to the card. Some other people do not understand this.

Where is the failure point? In plain English, explain what happens at the point of failure. To my understanding, images remain in buffer, you start another burst, lock up (in extremely rare cases). Is that incorrect?
I described it above. EXPEED is handling new data input (from the image sensor), processing that data and creating a final file from it, then later managing the process of pulling those files and sending them to the card write mechanism. My professional bet is that the issue is in the EXPEED process handling. It's losing a pointer somewhere.
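
To make "losing a pointer" concrete, here is a minimal sketch of how a write pointer can go wrong when two contexts update it without coordination. Everything here is hypothetical (`FrameRing`, the slot count, the two-step write), and nobody outside Nikon knows whether the real bug looks anything like this.

```python
class FrameRing:
    """Hypothetical frame buffer with a single write pointer (`head`)."""

    def __init__(self, slots):
        self.slots = [None] * slots
        self.head = 0  # index of the next free slot

    def begin_write(self):
        return self.head  # step 1: read the pointer

    def finish_write(self, slot, frame):
        self.slots[slot % len(self.slots)] = frame  # step 2: store the frame
        self.head = slot + 1                        # step 3: publish the new pointer


def ordered():
    """Each writer completes all steps before the next one starts."""
    ring = FrameRing(4)
    a = ring.begin_write()
    ring.finish_write(a, "A")
    b = ring.begin_write()
    ring.finish_write(b, "B")
    return ring.slots, ring.head


def interleaved():
    """A second writer (say, an interrupt handler) reads the pointer before the first one publishes."""
    ring = FrameRing(4)
    a = ring.begin_write()
    b = ring.begin_write()  # sees the same stale pointer as `a`
    ring.finish_write(a, "A")
    ring.finish_write(b, "B")  # overwrites frame A and leaves head one short
    return ring.slots, ring.head


print(ordered())      # (['A', 'B', None, None], 2)
print(interleaved())  # (['B', None, None, None], 1)
```

In the interleaved case one frame is gone and the pointer disagrees with the number of frames submitted. Whatever later code trusts that pointer is now working from bad bookkeeping, and it only happens when the timing lines up.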
 
Like Thom keeps saying the problem appears to be between the processor and the buffer. Not between the buffer and memory card.
And yet, the problem only happens when the buffer isn't clear. Qed...
The "buffer" is not in the card. EXPEED7 is in between the buffered images and the card.
Yes, I understand this. The buffer is where the images are held prior to being dumped off to the card. Some other people do not understand this.

Where is the failure point? In plain English, explain what happens at the point of failure. To my understanding, images remain in buffer, you start another burst, lock up (in extremely rare cases). Is that incorrect?
I described it above. EXPEED is handling new data input (from the image sensor), processing that data and creating a final file from it, then later managing the process of pulling those files and sending them to the card write mechanism. My professional bet is that the issue is in the EXPEED process handling. It's losing a pointer somewhere.
Embedded Expeed Memory controller issue? I hope that can be fixed with firmware.


--
jamesgrove.photography
 
The card has everything to do with how long images sit in the buffer. If the card can get 1 image per second written to it, then the buffer will contain images until however many seconds pass, and the card will be written to. I don't understand why this is controversial at all.
It shouldn't make any difference how long it takes for a card to complete its last action. EXPEED should be handling the wait. In electronics there's long been the notion of flags and interrupts. Something in the flag/interrupt sequences is going amiss.

And again, I believe I've eliminated card brand, size, and speed from the cause.
As I've written several times, we've eliminated card brand, card size, and card speed as a trigger variable for the issue.
Then what exactly is the issue at hand?
If I knew that, I'd go to Tokyo, fix some code, and fly home. Or in lieu of that, send Nikon some code via the Internet. Or maybe we should just have them consult ChatGPT? ;~)
 
Like Thom keeps saying the problem appears to be between the processor and the buffer. Not between the buffer and memory card.
And yet, the problem only happens when the buffer isn't clear. Qed...
The "buffer" is not in the card. EXPEED7 is in between the buffered images and the card.
Yes, I understand this. The buffer is where the images are held prior to being dumped off to the card. Some other people do not understand this.

Where is the failure point? In plain English, explain what happens at the point of failure. To my understanding, images remain in buffer, you start another burst, lock up (in extremely rare cases). Is that incorrect?
I described it above. EXPEED is handling new data input (from the image sensor), processing that data and creating a final file from it, then later managing the process of pulling those files and sending them to the card write mechanism. My professional bet is that the issue is in the EXPEED process handling. It's losing a pointer somewhere.
Embedded Expeed Memory controller issue? I hope that can be fixed with firmware.
I would guess not. The problem surfaced with the Z9, which pioneered that dual data stream. I'd guess, given the viewfinder freeze, that it's a real time interrupt type issue.
 
Correct. I can't force the camera to fail.
I've now made it fail twice in two years and many terabytes of data.
Which makes me believe this isn't a Z9-wide issue, because if it were easily replicable we'd see more reports.
I'm not sure where you're going here. We've got over 100k Z9s in the field and the list of people who've encountered the issue at some point is growing rapidly as it gets discussed. Since the problem is so intermittent, you don't think anything about it the first time it happens to you; you think maybe you did something to trigger it.
They know it exists, but frankly if you're running out of buffer
Stop characterizing the issue incorrectly. The buffer is not full, so you're not "running out of buffer."
 
Like Thom keeps saying the problem appears to be between the processor and the buffer. Not between the buffer and memory card.
And yet, the problem only happens when the buffer isn't clear. Qed...
The "buffer" is not in the card. EXPEED7 is in between the buffered images and the card.
Yes, I understand this. The buffer is where the images are held prior to being dumped off to the card. Some other people do not understand this.

Where is the failure point? In plain English, explain what happens at the point of failure. To my understanding, images remain in buffer, you start another burst, lock up (in extremely rare cases). Is that incorrect?
I described it above. EXPEED is handling new data input (from the image sensor), processing that data and creating a final file from it, then later managing the process of pulling those files and sending them to the card write mechanism. My professional bet is that the issue is in the EXPEED process handling. It's losing a pointer somewhere.
Embedded Expeed Memory controller issue? I hope that can be fixed with firmware.
I would guess not. The problem surfaced with the Z9, which pioneered that dual data stream. I'd guess, given the viewfinder freeze, that it's a real time interrupt type issue.
I hope Nikon can get to the bottom of it. Clearly it must be complex; otherwise I am sure it would have been fixed by now. Let's hope Nikon have listened and can replicate it and then troubleshoot.


--
jamesgrove.photography
 
Given how fast the cards write, and how long it takes to start actually hitting the buffer (to where it doesn't clear instantly), I have a lot of questions about how people are shooting that they run into this problem.

As stated, I have tried to replicate the issue. I have not. In normal shooting, I physically can't outrun the buffer. Even with raw plus jpg, it takes 3 seconds to slow down, and then I stop and the buffer is clear again.

So my question is, exactly how are people hitting this situation, given using a fast enough card (in my experience) prevents the issue?
Intermittent timing-related firmware bugs don't have structure or rules, like "shouldn't fail if the buffer doesn't run out". If the conditions leading to the camera hang were easily discoverable then it would have been disclosed by now.
Moreover, I'm going to quote one of the wisest things I've ever heard about digital electronics from my lead engineer back in 1982 (Lee Felsenstein): "At some level all digital becomes analog again." He was referring to the time it takes a signal to change from 0 to 1, during which it is some unknown "intermediary" value. My guess is that it will eventually come down to something like this: there's a brief moment when the data is not in a known state or the clock is not correct. This could then in some cases produce an "out of range" value and trigger an incorrect execution. That's exactly the type of thing that hackers look for, by the way.
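
A software-level analogue of "data not in a known state" is the torn read, sketched below. This is only an illustration of how an in-between moment can produce an out-of-range value. It is not a claim about what happens inside the Z9, and `read_counter_torn`, `is_valid_slot`, and the slot count are all hypothetical.

```python
def read_counter_torn(before, after):
    """Read a 16-bit value one byte at a time while it changes from `before` to `after`."""
    low = before & 0xFF         # low byte sampled just before the update lands
    high = (after >> 8) & 0xFF  # high byte sampled just after the update lands
    return (high << 8) | low


BUFFER_SLOTS = 300  # hypothetical number of valid slot indices


def is_valid_slot(index):
    return 0 <= index < BUFFER_SLOTS


torn = read_counter_torn(0x00FF, 0x0100)  # counter ticking from 255 to 256
print(torn)                 # 511
print(is_valid_slot(255))   # True
print(is_valid_slot(256))   # True
print(is_valid_slot(torn))  # False
```

Both the old value and the new value are perfectly valid. Only the reader that arrives mid-change sees 511, which is neither.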
 
The buffer is pretty darn quick to clear. How big/long of a burst are we talking about?
I currently don't believe length has anything to do with the problem. I triggered it in less than four seconds.
Sounds like the buffer clears a shot before the jpg processing is done, then the camera can't find the remaining file to finish and locks up.
 
Yes, the people reporting the problem state it happens while shooting. Therefore that means the camera is writing to the buffer so yes, there will be something in the buffer.
Okay, let me explain this step by step.

1. The problem happens when there's something in the buffer, and you start shooting again (very very very rarely).

2. Slow cards empty the buffer at a slower pace.
You're making the assumption that the card has something to do with the buffer. It doesn't. The buffer is sitting outside of EXPEED7, and EXPEED7 is taking fully formed images out of the buffer to send to the card write mechanism. If the diagram I have for EXPEED internals is still true—and I believe it is—the decision on when to move more information to the write mechanism is all EXPEED's. It can pause for the write mechanism.
The card has everything to do with how long images sit in the buffer. If the card can get 1 image per second written to it, then the buffer will contain images until however many seconds pass, and the card will be written to. I don't understand why this is controversial at all.
3. If things are in the buffer longer, it increases the odds you shoot and put new things in the buffer behind the stuff that isn't on the card yet.
Again, that also is handled by EXPEED. Basically EXPEED is grabbing information from the memory that just consists of a block of DNs, processing them into a file, writing that file back out to the memory, then as the card can take more images, moving that file from memory to the card.
Yes, but the images/data has to be retained somewhere...which is the buffer.
4. The slower card leaves things in the buffer longer because they can't write quickly.

5. Therefore, using a slower card is more likely to trigger the issue than a fast card, even though it's still extremely extremely rare.
As I've written several times, we've eliminated card brand, card size, and card speed as a trigger variable for the issue.
Then what exactly is the issue at hand?
We've already eliminated a card issue with the group of professional photographers who are all using proper cards on the link I provided.

Can we please move on from stating that this group of professional photographers and me as the OP who brought this to the forum are having ANY related card issues. WE ARE NOT.

You're really sounding like a Nikon commercial.

NPS is aware of the issue. They do not know yet if a firmware or enhanced internal buffer will solve the issue. HOWEVER IT IS NOT A CARD ISSUE.
 
You're really sounding like a Nikon commercial.
No, I'm making a point. Which apparently everyone here missed, which isn't particularly surprising to me.
NPS is aware of the issue. They do not know yet if a firmware or enhanced internal buffer will solve the issue. HOWEVER IT IS NOT A CARD ISSUE.
We'll all find out I suppose.
 
Given how fast the cards write, and how long it takes to start actually hitting the buffer (to where it doesn't clear instantly), I have a lot of questions about how people are shooting that they run into this problem.

As stated, I have tried to replicate the issue. I have not. In normal shooting, I physically can't outrun the buffer. Even with raw plus jpg, it takes 3 seconds to slow down, and then I stop and the buffer is clear again.

So my question is, exactly how are people hitting this situation, given using a fast enough card (in my experience) prevents the issue?
Intermittent timing-related firmware bugs don't have structure or rules, like "shouldn't fail if the buffer doesn't run out". If the conditions leading to the camera hang were easily discoverable then it would have been disclosed by now.
Moreover, I'm going to quote one of the wisest things I've ever heard about digital electronics from my lead engineer back in 1982 (Lee Felsenstein): "At some level all digital becomes analog again." He was referring to the time it takes a signal to change from 0 to 1, during which it is some unknown "intermediary" value. My guess is that it will eventually come down to something like this: there's a brief moment when the data is not in a known state or the clock is not correct. This could then in some cases produce an "out of range" value and trigger an incorrect execution. That's exactly the type of thing that hackers look for, by the way.
Yep. Chasing critical timing bugs / race conditions in high-speed embedded systems is something I had to do my entire career. I'll tell you a story about one of them. We had a RAID system with embedded 2.5" disk drives from a well-known Japanese manufacturer. We had a customer reporting a very intermittent data corruption issue - about once a month they'd get a single instance of filesystem corruption out of a large pool of installations. The proverbial needle in the haystack. After many iterations of adding layers of debug tracing and history logic I finally discovered an I/O pattern that would induce the disk drive to return the wrong data for a single read I/O out of thousands of attempts with that same I/O pattern. I literally was issuing the same set of I/Os over and over again and only once out of maybe 100k attempts it produced the wrong data.

After working with the drive manufacturer we finally root caused the issue. During certain seek timings in which the heads were moving across the platters, the drive would pick up the wrong embedded servo information that it reads while the heads are still moving into position. It was an optimization to reduce seek and rotational latency but by not allowing the heads to fully settle the drive was intermittently picking up the servo information on the adjacent track, causing its seek to settle on the wrong track and read the wrong data.

That one took a few weeks of testing and a three month trip to Japan to figure out :)
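
The "same I/O pattern over and over until it fails once" approach can be sketched as a simple hunting loop. This is a hypothetical harness, not the tooling used in the story: `flaky_read` is a stand-in that misbehaves on one hard-coded attempt so the example is repeatable.

```python
def hunt_intermittent(operation, expected, attempts):
    """Run the same operation repeatedly and return the attempt numbers that gave wrong data."""
    failures = []
    for attempt in range(1, attempts + 1):
        if operation(attempt) != expected:
            failures.append(attempt)
    return failures


def flaky_read(attempt):
    # Hypothetical stand-in for a drive: returns wrong data on exactly one attempt.
    return b"BAD!" if attempt == 73_519 else b"GOOD"


failures = hunt_intermittent(flaky_read, b"GOOD", attempts=100_000)
print(failures)  # [73519]
```

A real harness would also record timing and device state around each failure, since the failing attempt number alone rarely points at the cause.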
 
Nikon knows it's an issue
They know it exists, but frankly if you're running out of buffer with high end cards that often I can't imagine why you're shooting that way given how quickly the buffer clears for me in real world use, even in hot conditions shooting constantly for long periods of time.
Help me understand.

Does this mean that shooting too long a burst is an acceptable reason for the problem?
Given how fast the cards write, and how long it takes to start actually hitting the buffer (to where it doesn't clear instantly), I have a lot of questions about how people are shooting that they run into this problem.

As stated, I have tried to replicate the issue. I have not. In normal shooting, I physically can't outrun the buffer. Even with raw plus jpg, it takes 3 seconds to slow down, and then I stop and the buffer is clear again.

So my question is, exactly how are people hitting this situation, given using a fast enough card (in my experience) prevents the issue?
Intermittent timing-related firmware bugs don't have structure or rules, like "shouldn't fail if the buffer doesn't run out". If the conditions leading to the camera hang were easily discoverable then it would have been disclosed by now.
Then why does the issue only seem to happen (based on reports here) when there are images in the buffer that haven't cleared?

I'd like someone to actually explain (in plain English) the circumstances in which this happens.
If Nikon knew the why then the fix would already be developed, tested, and released by now. This is a critical bug that I'm sure has Nikon's full attention and priority.
 
I've similar experiences here in Japan with the Z9. And it might be related to heat issues. It's 36 Celsius here. No difference which type/brand card is used. I'm not recording video.

- Camera responds very slowly when switched on
- Focusing screen refresh rate extremely slow
- Camera hangs, black back screen

Removing and inserting the battery solves it.

As a side note, I've used the camera before under very hot circumstances but this is the first time the camera completely hangs.

Michel

On a number of shoots when working with full size RAW & JPEGS at 15fps, 20fps and 30fps
  1. the Z9 would freeze requiring a battery pull or card pull,
  2. come back with a buffer error,
  3. or the screen would go black and come back after 3-5 seconds.
This was very frustrating on shoots and caused a loss of critical shots, so I tried different makes and sizes of cards, ranging from:
  • ProGrade 650GB - Read 1700 / Write 1500 - 2 Different Cards
  • AV Pro 1TB - Read 1785 / Write 1300 - 2 Different Cards
I stopped shooting RAW & JPEG and switched to just RAW, and the issue has disappeared.

Just wondering if anyone else has had this issue and used the same workaround, and if there is a solution. Perhaps I'm not using good enough cards? The ProGrades were not cheap.

Thank you,

Tony
 
The buffer is pretty darn quick to clear. How big/long of a burst are we talking about?
I currently don't believe length has anything to do with the problem. I triggered it in less than four seconds.
Sounds like the buffer clears a shot before the jpg processing is done, then the camera can't find the remaining file to finish and locks up.
One of the "clues" I gave Nikon was this: when it happened to me, others around me were photographing the same action. I could compare the frozen viewfinder to the last recorded image on the card (several frames were missing on the card) and get a real sense of the actual time frame by both counting the missing frames and comparing it to the timings of the images from the photographer behind me.
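
The frame-counting estimate described above is simple arithmetic: missing frames divided by frame rate gives the length of the gap. A minimal sketch follows, with made-up numbers since the actual counts were not given in the thread.

```python
def gap_seconds(missing_frames, fps):
    """Estimate how much time a run of missing frames covers at a given burst rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return missing_frames / fps


# Hypothetical example: 6 frames missing from a 20 fps burst.
print(gap_seconds(6, 20))  # 0.3
```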
 
Yep. Chasing critical timing bugs / race conditions in high-speed embedded systems is something I had to do my entire career. I'll tell you a story about one of them. We had a RAID system with embedded 2.5" disk drives from a well-known Japanese manufacturer. We had a customer reporting a very intermittent data corruption issue - about once a month they'd get a single instance of filesystem corruption out of a large pool of installations. The proverbial needle in the haystack. After many iterations of adding layers of debug tracing and history logic I finally discovered an I/O pattern that would induce the disk drive to return the wrong data for a single read I/O out of thousands of attempts with that same I/O pattern. I literally was issuing the same set of I/Os over and over again and only once out of maybe 100k attempts it produced the wrong data.

After working with the drive manufacturer we finally root caused the issue. During certain seek timings in which the heads were moving across the platters, the drive would pick up the wrong embedded servo information that it reads while the heads are still moving into position. It was an optimization to reduce seek and rotational latency but by not allowing the heads to fully settle the drive was intermittently picking up the servo information on the adjacent track, causing its seek to settle on the wrong track and read the wrong data.

That one took a few weeks of testing and a three month trip to Japan to figure out :)
I'll tell another story that's a little different. When Seagate produced their first hard drive, they gave the first assembled unit to me to test and write about (I was editor of the primary Silicon Valley newspaper at the time). Every day (first hint) I'd find a file that was corrupted. They at first had me trying to do all the usual sequence-related things trying to find an error in the disk controller to drive communication scheme. No dice.

Eventually, they came out to my house with some test gear. Since this wasn't a drive I was using critically, I was formatting it every day after detecting an error when I turned the computer back on. So they looked at the drive when they arrived, no errors. They asked me to turn off my computer so they could hook up some intermediary test cables. Hooked everything back up and lo and behold, drive error already (second clue).

Eventually, through a variety of tests, it was discovered that when my computer's power supply powered down, the hard drive was writing randomly to the platter as the drive head was retreating back to its rest position at the edge of the platter. These unplanned writes were a real surprise to Seagate, as they were assuming that power off was a true on/off event that happened in zero time. That wasn't the case on my machine; it powered down very slowly, and the drive electronics were getting mixed signals because of that.
 
I've similar experiences here in Japan with the Z9. And it might be related to heat issues. It's 36 Celsius here. No difference which type/brand card is used. I'm not recording video.

- Camera responds very slowly when switched on
- Focusing screen refresh rate extremely slow
- Camera hangs, black back screen
Those are different symptoms to the buffer freeze issue as I know it. You wouldn't get a black screen.
 
On a number of shoots when working with full size RAW & JPEGS at 15fps, 20fps and 30fps
  1. the Z9 would freeze requiring a battery pull or card pull,
  2. come back with a buffer error,
  3. or the screen would go black and come back after 3-5 seconds.
This was very frustrating on shoots and caused a loss of critical shots, so I tried different makes and sizes of cards, ranging from:
  • ProGrade 650GB - Read 1700 / Write 1500 - 2 Different Cards
  • AV Pro 1TB - Read 1785 / Write 1300 - 2 Different Cards
I stopped shooting RAW & JPEG and switched to just RAW, and the issue has disappeared.

Just wondering if anyone else has had this issue and used the same workaround, and if there is a solution. Perhaps I'm not using good enough cards? The ProGrades were not cheap.

Thank you,

Tony
Just a crazy thought but, both you and tvstaff were shooting in close proximity to a group of other users with the same camera. 🧐
You're suggesting RFI (radio frequency interference). Given all the Wi-Fi/Bluetooth/GPS testing the camera would have been through, I'd tend to not think it would be something from other cameras. However, RFI comes from other sources than cameras.
 