How to avoid replacing good files with corrupt files during backup

...As a result of that experience I now outfit every new desktop computer I buy with ECC memory to detect and correct errors on the fly.
What is the (approximate) % increase in memory cost for that type of memory?
In the Intel world you need to buy a Xeon CPU - there are Xeon equivalents for the regular "Core iX" chips that cost slightly more. For example, I use a Xeon E-2278G CPU, which is equivalent to a Core i9-9900K and costs around 3% more.

Memory is around 10% more expensive because you need extra storage for the ECC information.

Probably the most significant potential increase in cost is for the motherboard - because you need to use a Xeon CPU it means you need to find a motherboard that supports one, which often limits you to using a workstation board. I paid about Cdn$300 for my Asus WS C246 Pro Motherboard.

There's no difference for all the other stuff (case, disks, SSDs, power supply, etc.), which means the net overall difference for an entire system is often only around 5-10%.

I don't use AMD CPUs, but it's my understanding that most or all of them support ECC memory. I have no idea whether that support extends to AMD motherboards.
 
I understand versioning and use it but I'm not clear how a person is supposed to tell the difference between a file that's been changed intentionally vs. one that has become corrupt. Examining all backed up files periodically would be tedious.
I use versioning at the time I discover that a file is missing or corrupted and I wish to recover a version of the file before it was deleted or corrupted. There is no other way to use it. No backup program knows the difference between an intentional change to a file vs. a corruption of that file.

The classic case for this would be accidental deletion that isn't immediately noticed, or when a document you've been working on somehow gets corrupted for whatever reason (power outage, disk problem, OS bug, whatever).

A more serious case would be when ransomware encrypts a bunch of your files and your backup system may have run automatically after the ransomware started doing its dirty deed.

All of these situations can benefit from having some level of versioning so the only backup you have isn't just the last copy of the file that your backup software saw.
 
I currently use ASCOMP Synchredible to backup my photos from my hard drive to my Unraid NAS and also to an external hard drive.

I love the ease of use of Synchredible; however, it will misidentify a corrupt file as a modified file and replace the backup copy of a good file with the corrupt file.

I cannot find an option in Synchredible to avoid this.

Does anyone know how to prevent good backup files from being replaced with corrupt files? Or can you recommend another application that avoids this?

Thanks kindly...

Kenmo
Add to that -- how would you identify (detect) corrupt files even before backup(s)?
For a specific file format (like a Word document or a JPEG image or a Nikon RAW file), one could write code that would examine the file for corruption. This would involve a detailed knowledge of each file format and would be looking to see if the bytes in this file all follow the legal format of the file.

But a file-checker algorithm would have to be independently written for every type of file you cared about.

This would likely identify disk corruption (since disk corruption is unlikely to produce an orderly file format). It would probably identify mass ransomware encryption of your files. It might or might not identify small bit rot, depending upon which bit rotted and where it was in the file structure.

This would not identify files being "edited" where the content is changed but is still legal content for the file format.

For files with little structure to them (like plain text files), this algorithm would be less useful, because it's harder to tell the difference between a valid and a corrupted file when the rules for what counts as uncorrupted are so broad.
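To make that concrete, here is a rough Python sketch of the kind of per-format check described above, using JPEG as the example. The folder path is hypothetical, and the test only looks at the mandatory start/end markers rather than walking every marker segment:

from pathlib import Path

def looks_like_valid_jpeg(path):
    # A JPEG must begin with the SOI marker (FF D8) and normally ends with EOI (FF D9).
    data = path.read_bytes()
    return len(data) > 4 and data[:2] == b"\xff\xd8" and data[-2:] == b"\xff\xd9"

for f in Path("D:/Photos").rglob("*.jpg"):  # hypothetical photo folder
    if not looks_like_valid_jpeg(f):
        print("Possible corruption:", f)

A check this shallow will catch truncation and wholesale encryption, but some perfectly good files carry extra data after the EOI marker, so a hit should prompt a closer look rather than be treated as proof of corruption.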

FYI, virus checking programs are doing something like this already. While they aren't necessarily checking for a fully valid file, they are looking at different types of files to see if they've been modified in nefarious ways.

FYI, disk bit rot could be detected by storing a CRC value for every file along with the last modified timestamp, then routinely scanning every file and recomputing the CRC. If the CRC value has changed but the timestamp has not, then a non-normal change has occurred to the file. It could be disk bit rot, or it could be a program changing the contents of the file but then setting the last modified timestamp back.

If the timestamp has changed (because some program modified the file), then the CRC method wouldn't know whether the file was legitimately changed or not, so it would really only be useful for catching disk bit rot in files whose timestamps don't change.
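As a sketch of that idea (there's no ready-made tool here; the manifest file name and the watched folder are placeholders), a script along these lines would keep a CRC32-plus-timestamp manifest and flag files whose contents changed while the timestamp stayed the same:

import json, os, zlib
from pathlib import Path

MANIFEST = Path("crc_manifest.json")  # placeholder location for the stored CRCs
ROOT = Path("D:/Photos")              # placeholder folder to scan

def crc32_of(path):
    # Compute the CRC32 in chunks so large files don't have to fit in memory.
    crc = 0
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            crc = zlib.crc32(chunk, crc)
    return crc

old = json.loads(MANIFEST.read_text()) if MANIFEST.exists() else {}
new = {}
for f in ROOT.rglob("*"):
    if f.is_file():
        key = str(f)
        new[key] = {"crc": crc32_of(f), "mtime": os.path.getmtime(f)}
        prev = old.get(key)
        if prev and prev["crc"] != new[key]["crc"] and prev["mtime"] == new[key]["mtime"]:
            print("Contents changed but timestamp didn't (possible bit rot):", key)
MANIFEST.write_text(json.dumps(new))

If this is run on a schedule (say, just before each backup), anything it flags is a candidate for restoring from an older version rather than letting the backup overwrite the good copy.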

--
John
 
I use versioning at the time I discover that a file is missing or corrupted and I wish to recover a version of the file before it was deleted or corrupted. There is no other way to use it. No backup program knows the difference between an intentional change to a file vs. a corruption of that file.
Another way to do it is to use checksums - either via a robust file system that has them built in, or manually. Robust file systems will detect and correct corruption that occurs on disk but not corruption that happens elsewhere (e.g., accidentally deleted files or memory errors).
 
I use versioning at the time I discover that a file is missing or corrupted and I wish to recover a version of the file before it was deleted or corrupted. There is no other way to use it. No backup program knows the difference between an intentional change to a file vs. a corruption of that file.
Another way to do it is to use checksums - either via a robust file system that has them built in, or manually. Robust file systems will detect and correct corruption that occurs on disk but not corruption that happens elsewhere (e.g., accidentally deleted files or memory errors).
To detect such errors, and ransomware as well, you could run the md5sum program on all your critical files before backing them up and compare the output to its previous result. This page

https://askubuntu.com/questions/318530/generate-md5-checksum-for-all-files-in-a-directory

describes how to do this on Linux, but W10 has md5sum also (Enterprise edition anyway).
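If you'd rather not depend on md5sum being available, a few lines of Python can produce the same style of manifest on either OS (the folder and manifest names below are just examples); the old and new manifests can then be compared with diff, or checked with md5sum -c on Linux:

import hashlib
from pathlib import Path

root = Path("D:/Photos")  # example folder of critical files

def md5_of(path):
    h = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

# Write an md5sum-style manifest: "<hash>  <relative path>", one file per line.
lines = sorted(f"{md5_of(f)}  {f.relative_to(root).as_posix()}"
               for f in root.rglob("*") if f.is_file())
Path("photos.md5").write_text("\n".join(lines) + "\n")

Keeping the previous manifest around and diffing it against the new one just before a backup run shows exactly which files have changed since the last time you looked.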
 
Another way to do it is to use checksums - either via a robust file system that has them built in, or manually. Robust file systems will detect and correct corruption that occurs on disk but not corruption that happens elsewhere (e.g., accidentally deleted files or memory errors).
To detect such errors, and ransomware as well, you could run the md5sum program on all your critical files before backing them up and compare the output to its previous result. This page

https://askubuntu.com/questions/318530/generate-md5-checksum-for-all-files-in-a-directory

describes how to do this on Linux, but W10 has md5sum also (Enterprise edition anyway).
You can also get checksum values with the certutil command, which I think is included with all editions of Windows. Here's an example of using it on an executable for one of Microsoft's Sysinternals utilities:

C:\Program Files\Utils\Sysinternals\Autoruns>certutil -hashfile autoruns.exe MD5
MD5 hash of autoruns.exe:
583ed542be17b83f3c102d49fe984e26
CertUtil: -hashfile command completed successfully.

C:\Program Files\Utils\Sysinternals\Autoruns>

The program will do checksums for MD2, MD4, MD5, SHA1, SHA256, SHA384, and SHA512.

Admittedly, it would require a healthy amount of scripting skill to incorporate this into a backup regimen of any sort, let alone the resources to run it. It’s still a good thing to be aware of though simply for verifying downloads that have checksum values available.
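As a small illustration of that kind of scripting (the download path is hypothetical, and the expected value is just the MD5 shown above), something like this would call certutil and compare the result against a published checksum:

import subprocess

def certutil_hash(path, algo="SHA256"):
    out = subprocess.run(["certutil", "-hashfile", path, algo],
                         capture_output=True, text=True, check=True).stdout.splitlines()
    # Line 0 is the "... hash of <file>:" header; line 1 is the hash itself.
    # Older Windows builds print the hash with spaces between the byte pairs.
    return out[1].replace(" ", "").strip().lower()

expected = "583ed542be17b83f3c102d49fe984e26"                 # checksum published for the download
actual = certutil_hash(r"C:\Downloads\autoruns.exe", "MD5")   # hypothetical download location
print("OK" if actual == expected else "MISMATCH", actual)

PowerShell's Get-FileHash cmdlet does much the same job without the text parsing, if you'd rather stay entirely within Windows tooling.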
 
You can also get checksum values with the certutil command, which I think is included with all editions of Windows. Here's an example of using it on an executable for one of Microsoft's Sysinternals utilities:

C:\Program Files\Utils\Sysinternals\Autoruns>certutil -hashfile autoruns.exe MD5
MD5 hash of autoruns.exe:
583ed542be17b83f3c102d49fe984e26
CertUtil: -hashfile command completed successfully.

C:\Program Files\Utils\Sysinternals\Autoruns>

The program will do checksums for MD2, MD4, MD5, SHA1, SHA256, SHA384, and SHA512.

Admittedly, it would require a healthy amount of scripting skill to incorporate this into a backup regimen of any sort, let alone the resources to run it. It’s still a good thing to be aware of though simply for verifying downloads that have checksum values available.
True, I guess. PowerShell makes some sense to me but old DOS bat scripts never did.

On Linux, you could write the checksums into a file, and diff the old and new results before running FreeFileSync.

This method would require that new photos (or whatever) be placed in a new folder. I'm not sure whether and how it would work with a Lightroom database.
 
I use versioning at the time I discover that a file is missing or corrupted and I wish to recover a version of the file before it was deleted or corrupted. There is no other way to use it. No backup program knows the difference between an intentional change to a file vs. a corruption of that file.
Another way to do it is to use checksums - either via a robust file system that has them built in, or manually. Robust file systems will detect and correct corruption that occurs on disk but not corruption that happens elsewhere (e.g., accidentally deleted files or memory errors).
But a checksum by itself doesn't know anything about whether a change is intentional or not. I've already written a post (in this thread) about storing CRC values for each file and using them in combination with a last modified date to detect pure bit rot (unintentional change). But that requires custom programming, as I'm not aware of any already-built tool that does it.

--
John
 
I use versioning at the time I discover that a file is missing or corrupted and I wish to recover a version of the file before it was deleted or corrupted.
Another way to do it is to use checksums
But, a checksum by itself doesn't know anything about whether a change is intentional or not.
True, but it will tell you if a file changed since the last time you checksummed it. That rules out corruption over time, as long as you checksum modified files on a reasonably timely basis.

If you really, truly want 100% assurance then checksum your important document after editing and saving it, then reopen it to confirm it's still OK. Just be sure to not close it again in a way that will make any changes that invalidate the checksum.

I'm about as paranoid as anyone about data integrity, but with ECC memory in my system I feel confident enough that I don't go to that extent.
 
...As a result of that experience I now outfit every new desktop computer I buy with ECC memory to detect and correct errors on the fly.
What is the (approximate) % increase in memory cost for that type of memory?
In the Intel world you need to buy a Xeon CPU - there are Xeon equivalents for the regular "Core iX" chips that cost slightly more. For example, I use a Xeon E-2278G CPU, which is equivalent to a Core i9-9900K and costs around 3% more.

Memory is around 10% more expensive because you need extra storage for the ECC information.

Probably the most significant potential increase in cost is for the motherboard - because you need to use a Xeon CPU it means you need to find a motherboard that supports one, which often limits you to using a workstation board. I paid about Cdn$300 for my Asus WS C246 Pro Motherboard.

There's no difference for all the other stuff (case, disks, SSDs, power supply, etc.), which means the net overall difference for an entire system is often only around 5-10%.

I don't use AMD CPUs, but it's my understanding that most or all of them support ECC memory. I have no idea whether that support extends to AMD motherboards.
Sean, thank you. I appreciate such a thorough description of what's required, since it's not just a matter of buying different memory; there are other things to consider.
 
I currently use ASCOMP Synchredible to backup my photos from my hard drive to my Unraid NAS and also to an external hard drive.

I love the ease of use of Synchredible; however, it will misidentify a corrupt file as a modified file and replace the backup copy of a good file with the corrupt file.

I cannot find an option in Synchredible to avoid this.

Does anyone know how to prevent good backup files from being replaced with corrupt files? Or can you recommend another application that avoids this?

Thanks kindly...

Kenmo
Add to that -- how would you identify (detect) corrupt files even before backup(s)?
In DpReview Forums, how can we place a comment post in the proper place, which (in my opinion) is behind the post we are responding to?

I was responding to the originator of this thread, and my post (above this one) was placed at the bottom of ALL posts.
In dpreview as in all other forums I've participated in, when a post is submitted it is always the last post in the thread. However, viewing in threaded mode will show one's response under the post one is responding to.
As far as I know -- I can't keep up with all the modes and descriptions that apply to various elements of computer applications -- I always use Threaded mode, the one that lists each post title along with the person who made the post.

I don't use the other mode, which displays posts one after the other like successive articles in a newspaper.

As I'm sure you know, but for newbies, the proper way to indicate which post one is responding to is to use "Reply with quote" and include all or at least a part of that post in one's response.
I always use "Reply with quote". However, as is illustrated in the above posts, my post does not appear after the OP post.
 
I use versioning at the time I discover that a file is missing or corrupted and I wish to recover a version of the file before it was deleted or corrupted.
Another way to do it is to use checksums
But, a checksum by itself doesn't know anything about whether a change is intentional or not.
True, but it will tell you if a file changed since the last time you checksummed it. That rules out corruption over time, as long as you checksum modified files on a reasonably timely basis.

If you really, truly want 100% assurance then checksum your important document after editing and saving it, then reopen it to confirm it's still OK. Just be sure to not close it again in a way that will make any changes that invalidate the checksum.

I'm about as paranoid as anyone about data integrity, but with ECC memory in my system I feel confident enough that I don't go to that extent.
This is not very practical. I back up hundreds of thousands of files, many of which are modified regularly. You want me to sort through a list of every file that changed since the last backup and figure out which ones were changed on purpose and which ones weren't? The whole point of automated backup is that you don't have to do some manual process to have a decent backup, and the reason for that is that people don't do manual things regularly or reliably. So ANY manual review of a backup just won't happen regularly for most people.
 
If you really, truly want 100% assurance then checksum your important document after editing and saving it, then reopen it to confirm it's still OK. Just be sure to not close it again in a way that will make any changes that invalidate the checksum.

I'm about as paranoid as anyone about data integrity, but with ECC memory in my system I feel confident enough that I don't go to that extent.
This is not very practical. I back up hundreds of thousands of files, many of which are modified regularly. You want me to sort through a list of every file that changed since the last backup and figure out which ones were changed on purpose and which ones weren't?
I never said it was practical for large numbers of files; I said it's what's necessary if you really, truly want absolute end-to-end protection. Obviously, for large volumes of files you're just going to have to take your chances after having mitigated as many risks as possible through self-checking hardware and automated processes. If you do that, then I don't think it's fair for anybody to point a finger at you and claim negligence.

Checksumming is still a useful tool, IMHO, if you regularly update the checksums for files that were changed since the last checksum update (you can use the file system modification dates for this). Doing it as part of a backup cycle makes sense. You're still at risk for problems between modification/saving and the checksumming operation, but you're then covered for anything after that.
 
If you really, truly want 100% assurance then checksum your important document after editing and saving it, then reopen it to confirm it's still OK. Just be sure to not close it again in a way that will make any changes that invalidate the checksum.

I'm about as paranoid as anyone about data integrity, but with ECC memory in my system I feel confident enough that I don't go to that extent.
This is not very practical. I back up hundreds of thousands of files, many of which are modified regularly. You want me to sort through a list of every file that changed since the last backup and figure out which ones were changed on purpose and which ones weren't?
I never said it was practical for large numbers of files; I said it's what's necessary if you really, truly want absolute end-to-end protection. Obviously, for large volumes of files you're just going to have to take your chances after having mitigated as many risks as possible through self-checking hardware and automated processes. If you do that, then I don't think it's fair for anybody to point a finger at you and claim negligence.

Checksumming is still a useful tool, IMHO, if you regularly update the checksums for files that were changed since the last checksum update (you can use the file system modification dates for this). Doing it as part of a backup cycle makes sense. You're still at risk for problems between modification/saving and the checksumming operation, but you're then covered for anything after that.
At scale, it would only be practical for files that are never supposed to change (like RAW files). Otherwise, it's really only practical for a small number of files you can manually track and make decisions about whether a changed CRC is a legitimate edit (something you remember making) or a problem. A regular backup can easily contain many, many files that are routinely modified as you run the variety of programs you run. Are you going to painstakingly go through that list every day to try to "catch" something out of the ordinary?

Take something like your Lightroom or Capture One database. How do you tell the difference between bit rot and normal use of those programs?
 
Checksumming is still a useful tool, IMHO, if you regularly update the checksums for files that were changed since the last checksum update (you can use the file system modification dates for this). Doing it as part of a backup cycle makes sense. You're still at risk for problems between modification/saving and the checksumming operation, but you're then covered for anything after that.
A regular backup can easily contain many, many files that are routinely modified as you run the variety of programs you run. ... Take something like your Lightroom or Capture One database. How do you tell the difference between bit rot and normal use of those programs?
True, routinely updated databases or other documents aren't a great fit for checksumming. One would hope that the software which drives those kinds of files would have some sort of integrity check that you could run on a regular basis to try to root out any problems.
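For what it's worth, Lightroom catalogs (.lrcat files) are SQLite databases under the hood, so SQLite's own integrity check is one such test you can run yourself. A minimal sketch, assuming a hypothetical catalog path and that Lightroom is closed so the file isn't locked:

import sqlite3

catalog = r"C:\Users\me\Pictures\Lightroom\Catalog.lrcat"  # hypothetical catalog path
db = sqlite3.connect(f"file:{catalog}?mode=ro", uri=True)  # open read-only
result = db.execute("PRAGMA integrity_check;").fetchone()[0]
db.close()
print("Catalog OK" if result == "ok" else "Problems found: " + result)

This only proves the database structure is intact; it says nothing about whether the edits inside it are the ones you intended.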
 
How about using PAR2/Multipar?
 
To avoid good backup files from being replaced with corrupt files, you can consider using a backup application that has versioning functionality. Versioning allows the backup application to keep multiple copies of your files, including previous versions, which can help prevent data loss if a file becomes corrupt.

Some backup applications that support versioning include:
  • Backblaze
  • CrashPlan
  • Acronis True Image
It's also a good practice to have multiple backups, stored on different devices or in different locations, to provide additional protection against data loss.
 
To avoid good backup files from being replaced with corrupt files, you can consider using a backup application that has versioning functionality. Versioning allows the backup application to keep multiple copies of your files, including previous versions, which can help prevent data loss if a file becomes corrupt.

Some backup applications that support versioning include:
  • Backblaze
  • CrashPlan
  • Acronis True Image
It's also a good practice to have multiple backups, stored on different devices or in different locations, to provide additional protection against data loss.
And FreeFileSync, as we've discussed.
 
To avoid good backup files from being replaced with corrupt files, you can consider using a backup application that has versioning...
Did you even bother to read the thread? Versioning was already mentioned a half-dozen times—seven months ago.

--
Sometimes I look at posts from people I've placed on my IGNORE list. When I do, I'm quickly reminded of why I chose to ignore them in the first place.
 