Why are RAW files so big?

... not lossless compression such as ZIP.
Well if
there's a way to do that without "losing" info, I'd love to see it.
Have you ever used a zip file? Doesn't it compress your text files?
Are any errors produced in the process?

Yes, you really can compress any real data and retrieve it without
any error! Only perfectly random data cannot be compressed.

Check this:

http://en.wikipedia.org/wiki/Information_theory

http://en.wikipedia.org/wiki/Data_compression
I guess RAWs are so large because they don't toss any redundant
info. It may be redundant by the individual parameters set by
Canon, but in the finished product... well, we don't need yet another
RAW vs. (insert your file format here) argument...
I know it has to be a lossless compression
Why? Although that is desirable, there is no law that says it has to
be a lossless compression.

--
Ray Chen
http://www.arrayphoto.com
--
http://www.meucciphotographic.com
--
Luciano Oliveira
--
--------------------------------------------
Ante Vukorepa
 
First, Canon raw files are compressed in the camera.

Second, bzip2 (which is a fine file compressor) has almost no effect on their file sizes.

There is nothing to compress that isn't already compressed.
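
If you want to check that for yourself, here is a minimal sketch using Python's built-in bz2 module; the .CR2 filename below is just a placeholder for one of your own raw files.

    import bz2

    path = "IMG_0001.CR2"   # placeholder - point this at one of your own raw files

    with open(path, "rb") as f:
        raw = f.read()

    squeezed = bz2.compress(raw, compresslevel=9)

    print("original:", len(raw), "bytes")
    print("bzip2'd: ", len(squeezed), "bytes")
    print("ratio:   ", round(len(squeezed) / len(raw), 3))
    # On camera raws that are already compressed, the ratio stays close to 1.0.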

Get a clue.

-joseph
 
... and thermal noise is random. Thermal noise (what you find in sensor images) doesn't come from a weak generator.

-joseph
There's not much that you can do to get these files any smaller.
You need a certain file size to store a certain amount of
information.
That is only true for random data. Any data that is not random
(truly random, not that pseudo cr@p from a weak generator) can be
compressed somewhat, assuming there is a smart guy to find a function
to describe such data. ;)
--
Canon.
 
If it's 20% smaller.... are they really unnecessary??? Have you tried "pushing" the exposures of the original RAW file and DNG file to the extremes? I would think that if you lose 20% of the data... you might find that there will be more severe clipping or something of that sort when you really play around with a RAW file. Purely speculation of course.. but I would be worried.

Kiran
 
And Nikon's RAW files are even bigger.. aren't they??? And making compressed RAW files takes lots of computing power.. no? I might be referring to an older model, but I could have sworn I read that somewhere.

RAW files are a godsend. I have plenty of files too.. and the trick is to figure out what you will never need and delete those.. as opposed to trying to compromise the quality of the good ones.
 
Personally, I would not. Adobe are clear that the format remains lossless. There are varying protocols of lossless compression. I have not seen any evidence at all to suggest that the DNG format results in any permanent loss of picture quality compared with the huge and very varied range of RAW compression formats.

Read their background information at

http://www.adobe.com/products/dng/main.html
 
Assuming DNG preserves all the data from the original RAW file, the only conceivable explanations are that it is a more efficient file format (i.e., it stores the exact same data using fewer bits), or that its compression algorithm is better.

Again, assuming all data is preserved, it will not alter/clip your highlights in the least. You will have absolutely the same amount of data to work with.

-Rob

--
http://www.pinciuc.com/photos/
You see, the thing is, I'm an absolutist. Well, kind of... in a way...
 
In addition to what all the others have said...

A pool of data (such as a RAW file) with a lot of repetitive data can be compressed more than one full of randomness. Lossless compression algorithms find ways to identify redundancy and patterns and store them only once. So a solid-colour image could be highly compressed, while a photographic image with enormous amounts of subtle variation is more of a challenge. This is also why you get fewer shots on your CF card with high-ISO images (they are compressed less): the noise makes the data more random and harder to find patterns and repetition in, making the compression less efficient.
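
You can see the same effect without a camera at all. Here is a rough sketch using Python's zlib as a stand-in for an in-camera lossless codec (which it is not): a solid-colour frame collapses to almost nothing, while a frame of random 12-bit values barely shrinks.

    import zlib
    import numpy as np

    n = 1_000_000  # one million 16-bit "pixels"

    flat  = np.full(n, 2048, dtype=np.uint16)                # solid colour
    noisy = np.random.randint(0, 4096, n, dtype=np.uint16)   # high-ISO-style randomness

    for name, data in (("flat", flat), ("noisy", noisy)):
        packed = zlib.compress(data.tobytes(), 9)
        print(name, len(data.tobytes()), "->", len(packed), "bytes")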

So it seems to me that Canon (and Nikon) are doing quite a good job of compressing their RAW files. If you want to see bad compression, check out the new Sony R1 - 20MB RAW files from a 10Mp sensor... yikes!

http://luminous-landscape.com/reviews/cameras/sony-r1.shtml

Cheers,
-Rob

--
http://www.pinciuc.com/photos/
You see, the thing is, I'm an absolutist. Well, kind of... in a way...
 
He was talking about finding identical pixels and using this to compress the file. This is one type of lossless compression. If two pixels next to each other are really identical, then you can for sure compress the file without loss.
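
Purely as an illustration (not what Canon actually does in-camera), a toy run-length scheme over identical neighbouring pixel values might look like this:

    def rle_encode(pixels):
        """Collapse runs of identical values into [value, count] pairs."""
        runs = []
        for p in pixels:
            if runs and runs[-1][0] == p:
                runs[-1][1] += 1
            else:
                runs.append([p, 1])
        return runs

    def rle_decode(runs):
        """Expand [value, count] pairs back into the original pixel list."""
        out = []
        for value, count in runs:
            out.extend([value] * count)
        return out

    row = [128, 128, 128, 130, 131, 131]
    assert rle_decode(rle_encode(row)) == row   # lossless round trip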

--
Luciano Oliveira
 
From Information Theory and several other related fields, we know that it's impossible to get the perfect compression routine. That's why you find different implementations of lossless compression. Each one is better adapted to a particular type of file, but none is really perfect (each only approaches perfection to a certain degree).

Also, any compression routine will always create larger files for some types of input. That's what happens when you try to zip the RAW files. :))

However, contrary to what you say, there are times when you can compress something that was already compressed (especially if the first compression was done with a less efficient routine).
Second, bzip2 (which is a fine file compressor) has almost no
effect on their file sizes.

There is nothing to compress that isn't already compressed.

Get a clue.

-joseph
--
Luciano Oliveira
 
If it's 20% smaller.... are they really unnecessary??? Have you
tried "pushing" the exposures of the original RAW file and DNG
file to the extremes? I would think that if you lose 20% of the
data... you might find that there will be more severe clipping or
something of that sort when you really play around with a RAW file.
Purely speculation of course.. but I would be worried.
DNG uses lossless JPEG compression. It is truly reversible (for example, you can run it back through the DNG Converter with the "compress" option unset).

Test the following routes, to ensure that the DNG Converter itself is not screwing things up. (I test by using blending mode "difference" on the two versions, and verifying that all pixels are zero).
PEF > ACR > PSD (layer 0)
PEF > DNG > ACR > PSD (layer 1)

In all cases that I have tried, it works.
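
If you don't have Photoshop to hand, the same difference check can be scripted. A sketch assuming both versions of the shot have been exported to TIFF (the filenames are placeholders; Pillow and numpy are used just for the comparison):

    import numpy as np
    from PIL import Image   # Pillow

    a = np.asarray(Image.open("from_raw.tif"), dtype=np.int32)   # placeholder filenames
    b = np.asarray(Image.open("from_dng.tif"), dtype=np.int32)

    diff = np.abs(a - b)
    print("maximum per-pixel difference:", diff.max())   # 0 means identical renderings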

Canon appears to have better compression with some cameras than some other manufacturers. I've found that converting 350D raws to DNG might save only 15%, whereas with some cameras from other manufacturers the DNG is less than half the size.
 
Presumably when you said 8.5Mb you meant 8.5Mpixel, which is 12.75Mbytes, as each pixel is 12 bits of data in a RAW image. So Canon's compression takes 12.75Mbytes down to 7.8Mbytes, which is much more respectable for lossless compression. I'm sure they could do better with more processing power/more memory/more time for the processing, but I suspect that what we have is a pretty good balance. It makes sense that on a computer after the event, with loads of all the above to work with, the files can be reduced even further.
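
Spelling out that arithmetic (just a sketch; the 7.8Mbyte figure is the in-camera file size being discussed):

    megapixels = 8.5
    bits_per_pixel = 12

    uncompressed_mb = megapixels * 1_000_000 * bits_per_pixel / 8 / 1_000_000
    print(uncompressed_mb)                  # 12.75 Mbytes of raw sensor data

    in_camera_mb = 7.8                      # the file the camera actually writes
    print(round(in_camera_mb / uncompressed_mb, 2))   # about 0.61, i.e. ~39% saved losslessly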

Mark
 
but why can it not be compressed further without loss of
information? Is this ratio of pixels to bytes fixed for a lossless
compression?
This theory was developed by Claude Shannon in the late 1940s and explains
things such as compression (and lots of other really important
stuff).
Compression is one of my favorite topics. I love to design compression algorithms and play with them. Lossy and lossless. The "holy grail" of compression algorithms is the one which compresses your data into the minimum representation.

But there is a fundamental problem with any compression algorithm, and a professor once explained it to me very convincingly using the pigeon hole principle. Let me try and summarize the argument when talking about lossless compression.

You have X amount of photo data, say X is 10MB. You can always represent X amount of data without compressing it. So at worst, you have 1:1 compression (the compression algorithm and supporting information is metadata, and although it contributes to your file size, it is fixed in length and not significant).

Now, imagine that you want to store your X amount of photo data in Y space, where Y < X. There are fewer possible files of size Y than there are possible files of size X, so by the pigeon hole principle at least two different inputs would have to map to the same compressed output, and the decompressor could not tell which one you started with. So no lossless algorithm can shrink every possible input; for any compressor, there are some files it cannot make smaller.
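
The counting half of that argument is easy to check for yourself (just a sketch):

    # For n-bit files there are 2**n possible inputs, but only 2**n - 1
    # outputs that are strictly shorter (all lengths 0 .. n-1 combined),
    # so a compressor that shrank every input would have to map two
    # different inputs to the same output.
    for n in range(1, 9):
        inputs = 2 ** n
        shorter_outputs = sum(2 ** k for k in range(n))
        print(n, "bit inputs:", inputs, "| strictly shorter outputs:", shorter_outputs)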

In a JPEG, the program which compresses the image discards pieces of information that it chooses to ignore in order to make the image significantly smaller. The information lost produces artifacts in the resulting image. The raw image formats do not discard this information, and that is why they take up, in some cases, significantly more space when recording the same scene.

You can look at it as the raw images storing more information, or you can look at it as the JPEG images throwing away so much information. Both formats are compressed quite well. For every megapixel of sensor values from a 12-bit sensor, you have 1.5MB of unprocessed data, which becomes 3MB after demosaicing (at 8 bits per channel). So for a 6MP DSLR, no compression would mean 9MB raw files and 18MB JPEGs. In actuality, the 9MB gets compressed down to about 4 or 5 depending on the content, and the 18MB gets compressed down to 3 or 4, but you lose information.

-Mike
http://demosaic.blogspot.com
 
Kind of odd, isn't it, that so many people object to the use of DNG in these threads, and so few seem keen to try it.

I just can't fault it in practice now. The files are smaller, they work fine, and once converted, I can send the file to anyone who uses CS or CS2 of any version, and they get exactly the same result as I do.

Personally, I am sold.
 
Kind of odd, isn't it, that so many people object to the use of DNG
in these threads, and so few seem keen to try it.
[snip]

Chuckle! Perhaps part of the problem is that people don't realise that the situation evolves month by month, and they are basing their decisions on old information. But, even so, an experiment would surely be worth it.

(I started to use DNG about 14 months ago. At first, I kept the original raws. For the last 6 months, I haven't been doing that. But I check each upgrade before committing to it).
 
He was talking about finding identical pixels and using this to compress
the file. This is one type of lossless compression. If two pixels next
to each other are really identical, then you can for sure compress
the file without loss.
"The camera "compresses" files by comparing pixels on a bevvy of criteria, and tossing ones that are similar or exact to others around them, thereby eliminating redundancy."

Note: similar.

See?

--
--------------------------------------------
Ante Vukorepa
 
This article was written by the guy who runs "Luminous Landscape" ... talk to him about what he was saying.. I was simply paraphrasing for those who don't subscribe to the magazine ;)

And no, I wasn't talking about ZIP files... everyone knows you can ZIP or even RAR files... but well can we do that in our cameras? Ya anyway...
--
http://www.meucciphotographic.com
 
But this works by removing unnecessary data such as white space characters. You can't compress text with a lossy system without turning it into gibberish.

You can losslessly compress image data simply by saying something like "the next 2000 pixels are all the same as this one" and then you don't have to store the redundant data for those 2000 pixels.

RAW files are sometimes bigger than they need to be because not even simple packing is done - so every 12-bit chunk of data ends up in a 16-bit space with 4 useless bits.
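
Simple packing is cheap to do, too. A sketch of squeezing two 12-bit samples into three bytes instead of four (an illustration only, not any particular camera's file format):

    def pack12(a, b):
        """Pack two 12-bit samples (0..4095) into 3 bytes instead of 2 x 16 bits."""
        return bytes([a >> 4, ((a & 0x0F) << 4) | (b >> 8), b & 0xFF])

    def unpack12(three):
        """Recover the two original 12-bit samples."""
        b0, b1, b2 = three
        return (b0 << 4) | (b1 >> 4), ((b1 & 0x0F) << 8) | b2

    assert unpack12(pack12(4095, 123)) == (4095, 123)   # lossless, and 25% smaller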
--
Galleries and website: http://www.whisperingcat.co.uk/mainindex.htm
 
And no, I wasn't talking about ZIP files... everyone knows you can
ZIP or even RAR files... but well can we do that in our cameras?
That's exactly (well, simplified, anyway) what Canon does with their RAW files. So, yes, we CAN do that and DO that in our cameras already.

--
--------------------------------------------
Ante Vukorepa
 
