RAW storm in a teacup? Dave Coffin interviewed
If anyone understands the ins and outs of RAW, it's Dave Coffin. He has reverse-engineered the RAW formats of almost every digital camera on the market and provides his code (dcraw.c) freely for anyone to use. He recently posted a note on his web page pointing out that the encryption of metadata (in the current Nikon vs. Adobe situation) is nothing new and that it's fairly common for manufacturers to apply some kind of protection to their RAW formats. We decided to ask him some of the questions this information raises, along with those asked by our readers.
From Dave's page (talking about metadata encryption):
A note about metadata encryption
A firestorm of controversy recently erupted when Thomas Knoll of Adobe accused Nikon of encrypting the white balance data in the D2X and D2Hs cameras, thus preventing Adobe from fully supporting these cameras.
I cracked this encryption on April 15, and updated dcraw.c and parse.c on April 17. So "dcraw -w" now works correctly with all Nikon cameras.
This is not a new problem. Phase One, Sony, Foveon, and Canon all apply some form of encryption to their RAW files. Dcraw decodes them all -- you can easily find decryption code by searching for the ^ operator.
Compression is not encryption. Phase One and Sony do encryption only. Kodak does compression only. Canon, Nikon, and Foveon compress the image data and encrypt some of the metadata.
dpreview.com interview with Dave Coffin
We decided to try to get a bit more background on this and conducted a brief interview with Dave to discuss his work and the encryption / obfuscation of RAW data. It seems clear to us that while it's a concern that manufacturers are making life harder for the authors of RAW decoders, this isn't something new and (at the moment, at least) certainly nothing that can't be cracked.
1. Can you just give us a short history of dcraw and how it got started?
It started in February 1997, when I bought a Canon PowerShot 600. Decoding the RAW data was more difficult than I had expected, knowing nothing about filter arrays, colorspace conversion, etc. But in August 1997, I found a decent interpolation technique, and finally was able to create images comparable in quality to Canon's.
Word slowly spread, and people asked me to do other cameras, sending me sample images to decode. I added support for the PowerShot A5 in May 1999 and the PowerShot A50 and Pro70 in May 2000.
In late September 2001, after months of effort, I finally figured out the lossless compression algorithm used by the PowerShots Pro90, G1, G2, S30, S40, and EOS D30/D60 cameras.
I solved the Canon EOS-1D on Jan 28, 2002 and the Nikon compressed NEF format on March 24, 2002. Olympus ORF format is not compressed, so it's much easier to decode.
On November 19, 2002, I was laid off. During that month, I added nineteen Kodak cameras, the PowerShot G3/S45, the Canon EOS-1DS, the Fuji S2, and the Minolta DiMAGE 7. In early December, I replaced the whole color-interpolation system, yielding sharper images for all cameras.
On December 10, I attacked the Sigma SD9. I solved the compression algorithm on December 31, then spent another six weeks constructing a Foveon-specific interpolation routine to enhance color and reduce noise.
2. As we know, none of the manufacturers openly document their RAW formats. How long does it typically take for you to reverse-engineer a format?
It can take minutes or months, depending on the complexity of the format.
3. Are you ever concerned about the legal implications of reverse-engineering proprietary file formats?
If anyone sued me, I'd be the biggest free software hero since Jon Johansen. It's better for the camera makers to ignore me and hope I lose interest.
4. I take it that reverse-engineering the metadata out of the RAW file is just as complicated as (if not more so than) the actual sensor data itself. Is this correct?
Yes, the metadata is much more complicated. That's why dcraw reads only metadata necessary to decode the image, and ignores the rest.
5. Which RAW format was the first you worked on which showed signs of having its metadata deliberately encrypted / obfuscated? Can you give us examples of other formats which have been made 'hard to decode' by the manufacturers?
The Canon PowerShot G6, S60, S70, and Pro1 apply a trivial XOR to the metadata related to color balance. Phase One encrypts the entire image in a slightly more complicated way.
6. I understand that Sony's SRF file format is encrypted, does this include the actual RAW data or just metadata?
Both are encrypted with a hard cipher. My sony_clear program decrypts the entire SRF file.
7. Do you believe manufacturers are doing this to protect their own RAW converters or simply as a method of compressing the metadata?
Encryption is not compression. XOR'ing cleartext with a key does not change the size of a file -- it only makes the contents harder to read.
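To illustrate Dave's point, here is a minimal Python sketch of XOR scrambling. It is our own illustration, not code from dcraw.c: the `xor_bytes` helper, the sample metadata string, and the key are all made up for the example. It shows that XOR'ing with a key leaves the length unchanged and that applying the same key a second time restores the original.

```python
from itertools import cycle


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR every byte of data with a repeating key (hypothetical helper)."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


# Made-up metadata and key, purely for illustration.
cleartext = b"WhiteBalance: R=1.95 B=1.42"
key = b"\x5a\xc3"

scrambled = xor_bytes(cleartext, key)
restored = xor_bytes(scrambled, key)  # XOR with the same key undoes itself

print(len(scrambled) == len(cleartext))  # size is unchanged
print(restored == cleartext)             # round trip recovers the original
```

Because the operation is its own inverse, anyone who recovers the key can read the data with the same routine that scrambled it.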
8. It's clear that many photographers are concerned over the current situation between Adobe and Nikon because they feel it may be an indicator of worse to come (harder encryption, more 'locking down' of file data). So is this a storm in a teacup or a sign of more to come?
Photographers have reason to feel scared. Not being computer hackers, they feel powerless to stop Nikon from asserting property rights over their images.
I'm not so worried. Whatever scheme Nikon tries next, I'll just reverse-engineer it.
9. Is there a place for a standard 'Open' RAW format or does that raise too many issues to do with the sharing of proprietary image processing between competitive manufacturers?
Adobe Digital Negative (DNG) is a great format -- I totally redesigned dcraw for maximum DNG compatibility. But you won't see much enthusiasm from the camera makers. This Joel essay explains why:
Photoshop and digital cameras are complements. Adobe wants to commoditize the digital camera, and the camera makers want to stop them.
10. Manufacturers (Canon, for example) claim that only they know how to use the RAW data, along with their knowledge of the sensor's characteristics, to squeeze the best possible quality out of their cameras. Our tests indicate this often isn't the case, with third-party converters often getting better results. Is there any advantage a manufacturer might have when producing a RAW converter?
Whatever advantage the manufacturer has, it disappears when a camera reaches the market. Then anyone is free to buy the camera, shoot test patterns, and analyze the RAW data.
11. Are you aware of significant differences in the way the various manufacturers' converters process RAW files, as we see huge differences in the quality of output, or are they all basically the same thing, just some better optimized?
I don't know -- I usually trace the manufacturer's code just far enough to extract the RAW data.
12. How raw is a RAW file, are there any formats in which the sensor data has actually been modified before it's recorded?
Some Nikon cameras have gaps or spikes in the raw histogram, indicating that the colors were multiplied before being saved to the RAW file. Most cameras leave the RAW data alone, and write color multipliers into the metadata.
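The gaps Dave describes can be reproduced with synthetic numbers. The Python sketch below is our own illustration under stated assumptions: the input levels (0 to 255), the 1.5 multiplier, and the `scale` helper are hypothetical and do not describe any particular camera. It shows only that multiplying integer sensor values by a factor greater than one and truncating leaves some output levels that can never occur.

```python
from collections import Counter


def scale(values, multiplier):
    """Multiply each integer level and truncate back to an integer (hypothetical helper)."""
    return [int(v * multiplier) for v in values]


# Synthetic data: every level from 0 to 255 occurs exactly once.
raw = list(range(256))
scaled = scale(raw, 1.5)  # assumed multiplier, for illustration only

histogram = Counter(scaled)
gaps = [level for level in range(min(scaled), max(scaled) + 1)
        if level not in histogram]

print(gaps[:3])   # first few output levels that no input can produce
print(len(gaps))  # number of empty histogram bins
```

With these inputs every third output level is empty, which is the kind of regular comb pattern that suggests the values were scaled before being written.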