DVD data verification algorithms?

CAcreeks

Certain software (e.g. Nero) reads data from the just-burned CD/DVD and compares it to the source data. This takes a long time.

Other software (e.g. Brasero) computes a checksum of the data while writing and then verifies that checksum after the burn (MD5 by default, with SHA-1 or SHA-256 optional). My recollection is that SHA* is less likely than MD5 to produce two different inputs with the same checksum.

Is a checksum algorithm as reliable as a full data compare? I assume not. But if the beginning and end of the CD/DVD are readable, the result seems trustworthy.
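For reference, this is roughly what the checksum step amounts to - a minimal Python sketch using hashlib (the image path is only an example):

    import hashlib

    # Stream the source image through MD5 and SHA-256 without loading it all
    # into memory (the path below is only an example).
    def digests(path, chunk=1024 * 1024):
        md5, sha256 = hashlib.md5(), hashlib.sha256()
        with open(path, "rb") as f:
            while block := f.read(chunk):
                md5.update(block)
                sha256.update(block)
        return md5.hexdigest(), sha256.hexdigest()

    print(digests("/tmp/backup.iso"))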
 
Solution

Sean Nelson

In terms of the likelihood of detecting a burn error, it really doesn't matter whether you read back all the original data to compare it to the CD/DVD or whether you use a checksum. I suppose a checksum has a very small chance of missing a legitimate error, but the likelihood is so small (literally 1 chance in billions) that it's really not a practical worry.

There's no difference between MD5 and SHA for this type of use. MD5 is considered unsafe for cryptographic uses (such as digital signatures) because it's been shown that it's possible to construct a second set of data which generates the same checksum. But a CD/DVD error isn't trying to "spoof" your data, so that really isn't a concern here.
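For a sense of scale, a back-of-envelope figure (treating the hash output as effectively random, which is an idealizing assumption) puts the odds of a random error slipping past the checksum far lower than "one in billions":

    # Chance that a random corruption happens to produce the same digest,
    # under the idealized "random hash output" assumption.
    print(2 ** -128)   # MD5 (128-bit):     ~2.9e-39
    print(2 ** -256)   # SHA-256 (256-bit): ~8.6e-78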

But IMHO there's a bigger issue at stake when checking a DVD burn. You want to make sure that the burn itself is good. You can have a disk with a pretty marginal burn which still appears to read OK because the ECC codes on the disk are able to correct the errors on the fly. To guard against that you need to use a DVD burner which can report the raw error rates (before ECC correction) that the drive sees, and software (such as Nero DiscSpeed) which can show you what they are.

I've had poorly burned discs which appeared to read perfectly but which showed a ton of correctable ECC errors. I reburned those to better discs, but kept them around to see what would happen and found that they degraded quite considerably (the error rates went up by a factor of over 4X) over the first six months alone. I only keep burns which have an outer parity rate of no more than 4 errors per sector - my experience, rechecking their error rates periodically over 5+ years, is that there is no tendency for them to worsen at all.

Re-reading the data and comparing it to the original or using a checksum is a sensible thing to do, but it really only gives you a very superficial idea of whether your burn is good, IMHO.
 
Thanks, Sean! I marked your reply as the answer.

Here is a StackOverflow discussion of whether MD5 is good enough to uniquely identify files. The bottom-line answer is yes for non-adversarial data like this, even though no hash can be literally guaranteed unique.

Is MD5 still good enough to uniquely identify files?

Regarding ECC (and CRC) errors, Brasero writes some errors to /var/log/syslog, or you can install the Linux xorriso program to examine disc error correction after burning.
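As a rough illustration of the xorriso route (the device path and the exact -check_media options are assumptions - check the xorriso man page on your system; note this reports readability and recorded-checksum mismatches after the drive's own error correction, not the raw pre-ECC error rates discussed above):

    import subprocess

    # Run xorriso's media check on an already-burned disc.  The device path and
    # the exact -check_media options are assumptions; consult the xorriso man
    # page for the options your version supports.
    result = subprocess.run(
        ["xorriso", "-indev", "/dev/sr0", "-check_media", "--"],
        capture_output=True, text=True)
    print(result.stdout)
    print(result.stderr)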
 
If (and this is a big if) the on-the-fly checksum is actually compared to the data actually on the disc, then all the data has to be read from the disc - meaning it will take exactly as long as writing the disc first and then reading it back. There is no time gained by doing it on the fly. In fact, it will probably take longer, since it is not possible to spool the data from the DVD when doing it that way.

Also, when doing a whole-disc comparison, the normal way to do it is by checksum. So there is no difference in the actual algorithm used to make the comparison, only in the way the data is read back.

Bottom line: if you want to make sure the data was written properly, the data will have to be read back and examined. The fastest way to do this is to burn, then verify. Any other method which is faster is cheating and is not actually verifying the data on the disc.
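A rough sketch of that approach, assuming the disc was burned from a single ISO image (paths and device node are only examples; the comparison stops at the image length because the disc may be padded beyond it):

    # Burn-then-verify by direct byte comparison against the source image.
    CHUNK = 1024 * 1024

    def compare_image_to_disc(image_path, device="/dev/sr0"):
        with open(image_path, "rb") as img, open(device, "rb") as disc:
            offset = 0
            while True:
                a = img.read(CHUNK)
                if not a:               # end of image; ignore any padding on the disc
                    return True
                b = disc.read(len(a))
                if a != b:
                    print("mismatch near byte offset", offset)
                    return False
                offset += len(a)

    ok = compare_image_to_disc("/tmp/backup.iso")
    print("verify OK" if ok else "verify FAILED")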

Jesper
 
theswede wrote:

If (and this is a big if) the on-the-fly checksum is actually compared to the data actually on the disc, then all the data has to be read from the disc - meaning it will take exactly as long as writing the disc first and then reading it back. There is no time gained by doing it on the fly.
The use of a checksum doesn't mean you don't re-read the data that was just burned. It's not really "verification" if you don't do that. It means that you re-read the data just burned, recompute the checksum from it, and compare the checksum to that originally generated (and saved in RAM) when the data was written to the drive. This can be done on a file-by-file basis or using a single checksum for the entire burn session.

The advantage of using the checksum is that you don't have to re-read the hard drive a second time to compare it against the burned data. That frees up the hard drive for other work. It also means that you don't get false verify errors because a data file changed after it was burned. Many backup programs use this technique for file verification, both on backup and restore.

If you have enough RAM then the hard drive file data will probably still be in the filesystem cache, but when you're burning DVDs and especially BluRay discs the amount of data can easily be more than the cache can hold, hence the advantage of using checksums.
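A minimal sketch of that flow, again assuming the burn came from a single ISO image (paths and device node are only examples): the digest is computed once from the source, kept in memory, and later compared with a digest computed by re-reading the burned disc, so the hard drive only gets read once.

    import hashlib, os

    CHUNK = 1024 * 1024

    # Hash up to `length` bytes of a file or block device (length=None reads to EOF).
    def sha256_of(path, length=None):
        h = hashlib.sha256()
        remaining = length
        with open(path, "rb") as f:
            while remaining is None or remaining > 0:
                want = CHUNK if remaining is None else min(CHUNK, remaining)
                block = f.read(want)
                if not block:
                    break
                h.update(block)
                if remaining is not None:
                    remaining -= len(block)
        return h.hexdigest()

    image = "/tmp/backup.iso"
    expected = sha256_of(image)                             # computed at burn time, kept in RAM
    actual = sha256_of("/dev/sr0", os.path.getsize(image))  # re-read only the burned data
    print("verify OK" if actual == expected else "verify FAILED")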
 