Why are RAW files so big?

... not lossless compression such as ZIP.
Well if
there's a way to do that without "losing" info, I'd love to see it.
Have you ever used a zip file? Doesn't it compress your text files?
Are any errors produced in the process?

Yes, you really can compress any real data and retrieve it without
any error! Only perfectly random data cannot be compressed.

Check this:

http://en.wikipedia.org/wiki/Information_theory

http://en.wikipedia.org/wiki/Data_compression
I guess RAWs are so large because they don't toss any redundant
info. It may be redundant by the individual parameters set by
Canon, but in the finished product... well, we don't need yet another
RAW vs. (insert your file format here) argument...
I know it has to be a lossless compression
Why? Although that is desirable, there is no law that says it has to
be a lossless compression.

--
Ray Chen
http://www.arrayphoto.com
--
http://www.meucciphotographic.com
--
Luciano Oliveira
--
--------------------------------------------
Ante Vukorepa
 
First, Canon raw files are compressed in the camera.

Second, bzip2 (which is a fine file compressor) has almost no effect on their file sizes.

There is nothing to compress that isn't already compressed.
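
If you want to check that for yourself, here is a minimal sketch using Python's built-in bz2 module; the .CR2 filename below is just a placeholder for one of your own raw files.

    import bz2

    path = "IMG_0001.CR2"   # placeholder - point this at one of your own raw files

    with open(path, "rb") as f:
        raw = f.read()

    squeezed = bz2.compress(raw, compresslevel=9)

    print("original:", len(raw), "bytes")
    print("bzip2'd: ", len(squeezed), "bytes")
    print("ratio:   ", round(len(squeezed) / len(raw), 3))
    # On camera raws that are already compressed, the ratio stays close to 1.0.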

Get a clue.

-joseph
 
... and thermal noise is random. Thermal noise (what you find in sensor images) doesn't come from a weak generator.

-joseph
There's not much that you can do to get these files any smaller.
You need a certain file size to store a certain amount of
information.
That is only true for random data. Any data that is not random
(truly random, not that pseudo cr@p from a weak generator) can be
compressed somewhat, assuming there is a smart guy to find a function
to describe such data. ;)
--
Canon.
 
If it's 20% smaller.... are they really unnecessary??? Have you tried "pushing" the exposures of the original RAW file and DNG file to the extremes? I would think that if you lose 20% of the data... you might find that there will be more severe clipping or something of that sort when you really play around with a RAW file. Purely speculation of course.. but I would be worried.

Kiran
 
And Nikon's RAW files are even bigger.. aren't they??? And making compressed RAW files takes lots of computing power.. no? I might be referring to an older model, but I could have sworn I read that somewhere.

RAW files are a godsend. I have plenty of files too.. and the trick is to figure out what you will never need and delete those.. as opposed to trying to compromise the quality of the good ones.
 
Personally, I would not. Adobe are clear that the format remains lossless. There are varying protocols of lossless compression. I have not seen any evidence at all to suggest that the DNG format results in any permanent loss of picture quality compared with the huge and very varied range of RAW compression formats.

Read their background information at

http://www.adobe.com/products/dng/main.html
 
Assuming DNG preserves all the data from the original RAW file, the only conceivable explanations are that it is a more efficient file format (i.e., it stores the exact same data using fewer bits), or that its compression algorithm is better.

Again, assuming all data is preserved, it will not alter/clip your highlights in the least. You will have absolutely the same amount of data to work with.

-Rob

--
http://www.pinciuc.com/photos/
You see, the thing is, I'm an absolutist. Well, kind of... in a way...
 
In addition to what all the others have said...

A pool of data (such as a RAW file) with a lot of repetitive data can be compressed more than one full of randomness. Lossless compression algorithms find ways to identify redundancy and patterns and store them only once. So a solid-colour image could be highly compressed, while a photographic image with enormous amounts of subtle variation is more of a challenge. This is also why you get fewer shots on your CF card with high-ISO images (they are compressed less): the noise makes the data more random and harder to find patterns and repetition in, making the compression less efficient.
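
You can see the same effect without a camera at all. Here is a rough sketch using Python's zlib as a stand-in for an in-camera lossless codec (which it is not): a solid-colour frame collapses to almost nothing, while a frame of random 12-bit values barely shrinks.

    import zlib
    import numpy as np

    n = 1_000_000  # one million 16-bit "pixels"

    flat  = np.full(n, 2048, dtype=np.uint16)                # solid colour
    noisy = np.random.randint(0, 4096, n, dtype=np.uint16)   # high-ISO-style randomness

    for name, data in (("flat", flat), ("noisy", noisy)):
        packed = zlib.compress(data.tobytes(), 9)
        print(name, len(data.tobytes()), "->", len(packed), "bytes")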

So it seems to me that Canon (and Nikon) are doing quite a good job of compressing their RAW files. If you want to see bad compression, check out the new Sony R1 - 20MB RAW files from a 10Mp sensor... yikes!

http://luminous-landscape.com/reviews/cameras/sony-r1.shtml

Cheers,
-Rob

--
http://www.pinciuc.com/photos/
You see, the thing is, I'm an absolutist. Well, kind of... in a way...
 
He was talking about finding identical pixels and using this to compress the file. This is one type of lossless compression. If two pixels next to each other are really identical, then you can for sure compress the file without loss.
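
Purely as an illustration (not what Canon actually does in-camera), a toy run-length scheme over identical neighbouring pixel values might look like this:

    def rle_encode(pixels):
        """Collapse runs of identical values into [value, count] pairs."""
        runs = []
        for p in pixels:
            if runs and runs[-1][0] == p:
                runs[-1][1] += 1
            else:
                runs.append([p, 1])
        return runs

    def rle_decode(runs):
        """Expand [value, count] pairs back into the original pixel list."""
        out = []
        for value, count in runs:
            out.extend([value] * count)
        return out

    row = [128, 128, 128, 130, 131, 131]
    assert rle_decode(rle_encode(row)) == row   # lossless round trip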

--
Luciano Oliveira
 
From Information Theory and several other related fields, we know that it's impossible to get the perfect compression routine. That's why you find different implementations of lossless compression. Each one is better adapted to a particular type of file, but none is really perfect (each only approaches perfection to a certain degree).

Also, any compression routine will always create larger files for some types of input. That's what happens when you try to zip the RAW files. :))

However, contrary to what you say, there are times when you can compress something that was already compressed (especially if the first compression was done with a less efficient routine).
Second, bzip2 (which is a fine file compressor) has almost no
effect on their file sizes.

There is nothing to compress that isn't already compressed.

Get a clue.

-joseph
--
Luciano Oliveira
 
If it's 20% smaller.... are they really unnecessary??? Have you
tried "pushing" the exposures of the original RAW file and DNG
file to the extremes? I would think that if you lose 20% of the
data... you might find that there will be more severe clipping or
something of that sort when you really play around with a RAW file.
Purely speculation of course.. but I would be worried.
DNG uses lossless JPEG compression. It is truly reversible (for example, you can run it back through the DNG Converter with the "compress" option unset).

Test the following routes, to ensure that the DNG Converter itself is not screwing things up. (I test by using blending mode "difference" on the two versions, and verifying that all pixels are zero).
PEF > ACR > PSD (layer 0)
PEF > DNG > ACR > PSD (layer 1)

In all cases that I have tried, it works.
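
If you don't have Photoshop to hand, the same difference check can be scripted. A sketch assuming both versions of the shot have been exported to TIFF (the filenames are placeholders; Pillow and numpy are used just for the comparison):

    import numpy as np
    from PIL import Image   # Pillow

    a = np.asarray(Image.open("from_raw.tif"), dtype=np.int32)   # placeholder filenames
    b = np.asarray(Image.open("from_dng.tif"), dtype=np.int32)

    diff = np.abs(a - b)
    print("maximum per-pixel difference:", diff.max())   # 0 means identical renderings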

Canon appears to have better compression with some cameras than some other manufacturers. I've found that converting 350D raws to DNG might save only 15%, whereas with some cameras from other manufacturers the DNG is less than half the size.
 
Presumably when you said 8.5Mb you meant 8.5Mpixel, which is 12.75Mbytes, as each pixel is 12 bits of data in a RAW image. So Canon's compression takes 12.75Mbytes down to 7.8Mbytes, which is much more respectable for lossless compression. I'm sure they could do better with more processing power/more memory/more time for the processing, but I suspect that what we have is a pretty good balance. It makes sense that on a computer after the event, with loads of all the above to work with, the files can be reduced even further.
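
Spelling out that arithmetic (just a sketch; the 7.8Mbyte figure is the in-camera file size being discussed):

    megapixels = 8.5
    bits_per_pixel = 12

    uncompressed_mb = megapixels * 1_000_000 * bits_per_pixel / 8 / 1_000_000
    print(uncompressed_mb)                  # 12.75 Mbytes of raw sensor data

    in_camera_mb = 7.8                      # the file the camera actually writes
    print(round(in_camera_mb / uncompressed_mb, 2))   # about 0.61, i.e. ~39% saved losslessly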

Mark
 
but why can it not be compressed further without loss of
information? Is this ratio of pixels to bytes fixed for a lossless
compression?
This theory was developed by Claude Shannon in the late 1940s and explains
things such as compression (and lots of other really important
stuff).
Compression is one of my favorite topics. I love to design compression algorithms and play with them. Lossy and lossless. The "holy grail" of compression algorithms is the one which compresses your data into the minimum representation.

But there is a fundamental problem with any compression algorithm, and a professor once explained it to me very convincingly using the pigeon hole principle. Let me try and summarize the argument when talking about lossless compression.

You have X amount of photo data, say X is 10MB. You can always represent X amount of data without compressing it. So at worst, you have 1:1 compression (the compression algorithm and supporting information is metadata, and although it contributes to your file size, it is fixed in length and not significant).

Now, imagine that you want to store your X amount of photo data in Y space, where Y < X. There are fewer possible files of size Y than there are possible files of size X, so by the pigeon hole principle at least two different inputs would have to map to the same compressed output, and the decompressor could not tell which one you started with. So no lossless algorithm can shrink every possible input; for any compressor, there are some files it cannot make smaller.
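
The counting half of that argument is easy to check for yourself (just a sketch):

    # For n-bit files there are 2**n possible inputs, but only 2**n - 1
    # outputs that are strictly shorter (all lengths 0 .. n-1 combined),
    # so a compressor that shrank every input would have to map two
    # different inputs to the same output.
    for n in range(1, 9):
        inputs = 2 ** n
        shorter_outputs = sum(2 ** k for k in range(n))
        print(n, "bit inputs:", inputs, "| strictly shorter outputs:", shorter_outputs)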

In a JPEG, the program which compresses the image discards pieces of information that it chooses to ignore in order to make the image significantly smaller. The information lost produces artifacts in the resulting image. The raw image formats do not discard this information, and that is why they take up, in some cases, significantly more space when recording the same scene.

You can look at it as the raw images storing more information, or you can look at it as the JPEG images throwing away so much information. Both formats are compressed quite well. For every megapixel of sensor values from a 12-bit sensor, you have 1.5MB of unprocessed data, which becomes 3MB after demosaicing (at 8 bits per channel). So for a 6MP DSLR, no compression would mean 9MB raw files and 18MB JPEGs. In actuality, the 9MB gets compressed down to about 4 or 5 depending on the content, and the 18MB gets compressed down to 3 or 4, but you lose information.

-Mike
http://demosaic.blogspot.com
 
Kind of odd, isn't it, that so many people object to the use of DNG in these threads, and so few seem keen to try it.

I just can't fault it in practice now. The files are smaller, they work fine, and once converted, I can send the file to anyone who uses CS or CS2 of any version, and they get exactly the same result as I do.

Personally, I am sold.
 
Kind of odd, isn't it, that so many people object to the use of DNG
in these threads, and so few seem keen to try it.
[snip]

Chuckle! Perhaps part of the problem is that people don't realise that the situation evolves month by month, and they are basing their decisions on old information. But, even so, an experiment would surely be worth it.

(I started to use DNG about 14 months ago. At first, I kept the original raws. For the last 6 months, I haven't been doing that. But I check each upgrade before committing to it).
 
He was talking about finding identical pixels and using this to compress
the file. This is one type of lossless compression. If two pixels next
to each other are really identical, then you can for sure compress
the file without loss.
"The camera "compresses" files by comparing pixels on a bevvy of criteria, and tossing ones that are similar or exact to others around them, thereby eliminating redundancy."

Note: similar.

See?

--
--------------------------------------------
Ante Vukorepa
 
This article was written by the guy who runs "Luminous Landscape" ... talk to him about what he was saying.. I was simply paraphrasing for those who don't subscribe to the magazine ;)

And no, I wasn't talking about ZIP files... everyone knows you can ZIP or even RAR files... but well can we do that in our cameras? Ya anyway...
--
http://www.meucciphotographic.com
 
But this works by removing unnecessary data such as white space characters. You can't compress text with a lossy system without turning it into gibberish.

You can losslessly compress image data simply by saying something like "the next 2000 pixels are all the same as this one" and then you don't have to store the redundant data for those 2000 pixels.

RAW files are sometimes bigger than they need to be because not even simple packing is done - so every 12-bit chunk of data ends up in a 16-bit space with 4 useless bits.
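
Simple packing is cheap to do, too. A sketch of squeezing two 12-bit samples into three bytes instead of four (an illustration only, not any particular camera's file format):

    def pack12(a, b):
        """Pack two 12-bit samples (0..4095) into 3 bytes instead of 2 x 16 bits."""
        return bytes([a >> 4, ((a & 0x0F) << 4) | (b >> 8), b & 0xFF])

    def unpack12(three):
        """Recover the two original 12-bit samples."""
        b0, b1, b2 = three
        return (b0 << 4) | (b1 >> 4), ((b1 & 0x0F) << 8) | b2

    assert unpack12(pack12(4095, 123)) == (4095, 123)   # lossless, and 25% smaller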
--
Galleries and website: http://www.whisperingcat.co.uk/mainindex.htm
 
And no, I wasn't talking about ZIP files... everyone knows you can
ZIP or even RAR files... but well can we do that in our cameras?
That's exactly (well, simplified, anyway) what Canon does with their RAW files. So, yes, we CAN do that and DO that in our cameras already.

--
--------------------------------------------
Ante Vukorepa
 
