Why are RAW files so big?

ioannis · Veteran Member · Baltimore, US
I know it has to be a lossless compression, but I would like to hear more about the limitations that RAW compression dictates, and whether you know if anyone (and who) is trying to create new algorithms that would solve this issue.

Thanks,
Yiannis
--

If you don't eat yer meat, you can't have any pudding. How can you have any pudding if you don't eat yer meat?
 
... by definition it carries all the information of the sensor. So, lots of pixels to be defined at 12 bits gives lots and lots of bytes...

--
Luciano Oliveira
 
... by definition it carries all the information of the sensor. So,
lots of pixels to be defined at 12 bits gives lots and lots of
bytes...
But why can it not be compressed further without loss of information? Is this pixel-to-byte ratio fixed for lossless compression?
--
Yiannis
 
In the latest issue of American Photo magazine they have a RAW vs. JPEG article that is extremely informative. I'm by no means an expert, but as I understand it, the term compression is a bit of a misnomer. The camera "compresses" files by comparing pixels on a bevy of criteria and tossing ones that are similar or identical to the ones around them, thereby eliminating redundancy. Well, if there's a way to do that without "losing" info, I'd love to see it. I guess RAWs are so large because they don't toss any redundant info. It may be redundant by the individual parameters set by Canon, but in the finished product... well, we don't need yet another RAW vs. (insert your file format here) argument...
I know it has to be a lossless compression
Why? Although it is desirable, there is no law that says it has to
be a lossless compression.

--
Ray Chen
http://www.arrayphoto.com
--
http://www.meucciphotographic.com
 
I know it has to be a lossless compression but I would like to hear
more about the limitations that RAW compression dictates and if you
know if anyone (and who) is trying to create new algorithms that
would solve this issue.
Actually, they are quite small.

Example: Canon 1D Mark II.

If you take a look at an uncompressed image generated from a RAW file, you will see: a frame has a total of 3504×2336 pixels. Every pixel has a bit depth of 16 bits (= 2 bytes) for each of the red, green and blue channels. So, to store that uncompressed image you need 3504×2336×3×2 bytes: a total of about 47 MByte! The RAW file is only about 7.5 MByte.

My naive example does not consider that the real sensor does not have three light-sensitive spots (one each for the R, G and B channels) per pixel, but instead calculates every pixel's R/G/B value from several neighbouring spots. Also, the sensor's sensitivity does not provide 16 bits but only 12. Therefore, you need less space to store a RAW file than an uncompressed image.
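The arithmetic above is easy to check. A small sketch (the sensor dimensions are the figures quoted in this thread for the 1D Mark II, not verified against Canon's specs):

```python
# Back-of-the-envelope file sizes for a 3504 x 2336 frame,
# using the numbers quoted in the post above.
width, height = 3504, 2336
pixels = width * height

# Uncompressed 16-bit RGB: 3 channels x 2 bytes per pixel.
rgb16_bytes = pixels * 3 * 2

# A Bayer sensor records one 12-bit sample per pixel, not three channels.
bayer12_bytes = pixels * 12 // 8

print(f"16-bit RGB image: {rgb16_bytes / 2**20:.1f} MiB")   # ~46.8 MiB
print(f"12-bit raw data : {bayer12_bytes / 2**20:.1f} MiB")  # ~11.7 MiB, before any compression
```

So even before any compression, the raw sensor data is roughly a quarter of the uncompressed RGB size; lossless compression then brings the ~11.7 MiB down toward the ~7.5 MByte file seen on the card.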

There's not much that you can do to get these files any smaller. You need a certain file size to store a certain amount of information.
 
Compared to TIFF: how about the 24 MB TIFFs produced by the 8-megapixel Olympus 8080?
--
Misha
 
There's not much that you can do to get these files any smaller.
You need a certain file size to store a certain amount of
information.
That is only true for random data. Any data that is not random (truly random, not that pseudo cr@p from a weak generator) can be compressed somewhat, assuming there is a smart guy to find a function to describe such data. ;)

--
Ray Chen
http://www.arrayphoto.com
 
... by definition it carries all the information of the sensor. So,
lots of pixels to be defined at 12 bits gives lots and lots of
bytes...
but why can it not be compressed further without loss of
information ? Is this ratio of pixel to byte fixed for a lossless
compression ?
The theory behind this is information theory, developed by Claude Shannon in the late 1940s; it explains compression (and lots of other really important stuff). You can find a lot of information in these links from Wikipedia:

http://en.wikipedia.org/wiki/Information_theory

http://en.wikipedia.org/wiki/Information_entropy

http://en.wikipedia.org/wiki/Data_compression

http://en.wikipedia.org/wiki/Lossless_data_compression
--
Luciano Oliveira
 
That is only true for random data. Any data that is not random
(true random, not that pseudo cr@p from a weak generator) can be
compressed somewhat, assume there is a smart guy to find a function
to describe such data. ;)
You're right. I wouldn't consider a pixel image to be random data... However, I'd like Canon to focus on improving things like dynamic range instead of shrinking 8.5 MByte files down to 7.8 MByte files.
 
Well if
there's a way to do that without "losing" info, I'd love to see it.
Have you ever used a zip file? Doesn't it compress your text files? Are any errors produced in the process?

Yes, you really can compress any real data and retrieve it without any error! Only perfectly random data cannot be compressed.

Check this:

http://en.wikipedia.org/wiki/Information_theory

http://en.wikipedia.org/wiki/Data_compression
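The zip analogy is easy to demonstrate: redundant data shrinks dramatically and comes back bit-for-bit identical, while (pseudo)random bytes barely compress at all. A minimal sketch using Python's zlib (the same DEFLATE algorithm family used by zip):

```python
import os
import zlib

# Highly redundant data, like a textureless wall: compresses to a tiny fraction.
wall = b"\x00" * 1_000_000
packed = zlib.compress(wall)
assert zlib.decompress(packed) == wall  # lossless: every byte recovered exactly
print(f"redundant: 1,000,000 -> {len(packed)} bytes")

# Random bytes carry maximal entropy: almost no compression is possible.
noise = os.urandom(1_000_000)
print(f"random   : 1,000,000 -> {len(zlib.compress(noise))} bytes")
```

The redundant megabyte collapses to around a kilobyte; the random one actually grows slightly, because the compressed format has its own overhead.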
--
Luciano Oliveira
 
I guess RAWs are so large because they don't toss any redundant
info. It may be redundant by the individual paramaters set by
Canon, but in the finished product, well we don't need yet another
RAW vs. (insert your file format here) argument...
Actually, you can consider a RAW file to be some form of lossy compression. There are even different algorithms to reproduce the original image: one is called Canon Digital Photo Professional, another Adobe Camera Raw, ...
 
Because a particular amount of information needs a particular
amount of storage (if it is supposed to be lossless).
The information only needs a fixed amount of storage if it is totally random.

Let me give you the simplest case: a 128 MP camera that can only take two types of pictures, a textureless white wall or a black wall. Do you think I need a lot of data to describe one of the two images?

--
Ray Chen
http://www.arrayphoto.com
 
Using the 5D, even with a half-terabyte drive, you chew up space pretty quickly if you shoot RAW.

I now convert all my RAW files to DNG. The files are 20% smaller and I see no current problems with them.

Give it a try.
 
It is quite conceivable that a lossy, JPEG-type compression could be applied to RAW files, optionally shrinking their size down to next to nothing. With lossless compression, the major issue is simply the amount of actual information in the picture plus the noise. I would not hold my breath for much improvement there, but on the lossy side, things could get interesting.
I know it has to be a lossless compression but I would like to hear
more about the limitations that RAW compression dictates and if you
know if anyone (and who) is trying to create new algorithms that
would solve this issue.

Thanks,
Yiannis
 
The information only needs a fixed amount of storage if it is
totally random.

Let me give you the simplest case: a 128 MP camera that can only
take two types of pictures, a textureless white wall or a black wall.
Do you think I need a lot of data to describe one of the two images?
I don't read this as a contradiction of my statement. The particular amount of information in your example is whether it is a white wall or a black wall, and you need a particular amount of storage for this information (that is, one bit). If you stored less (that is, 1-1 = 0 bits ;-), you wouldn't be able to record which type of wall it was without loss of information. OK, I admit this example is kind of contrived, but I guess you get the point*. Information theory is a bit hard to explain without getting technical, especially to people with a non-mathematical background.

*) It is left to the reader to imagine that the imaginary sensor distinguishes four distinct types of wall, to have a more realistic example.
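The wall example maps directly onto Shannon entropy, H = -Σ p·log2(p): two equally likely walls need exactly one bit per image, and the footnote's four-wall variant needs two, no matter how many megapixels the sensor has. A short sketch:

```python
import math

def entropy_bits(probabilities):
    """Shannon entropy in bits: H = -sum(p * log2(p))."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# Two equally likely wall colours -> 1 bit per image, regardless of megapixels.
print(entropy_bits([0.5, 0.5]))    # 1.0

# Four equally likely wall types (the footnote's variant) -> 2 bits.
print(entropy_bits([0.25] * 4))    # 2.0
```

This is why a 128 MP shot of a blank wall compresses to almost nothing: the file size a lossless scheme needs is bounded by the entropy of the source, not by the pixel count.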
 
I now convert all my RAW files to DNG. The files are 20% smaller
and I see no current problems with the files
Do the RAW converters open .dng files in the same way as RAW?
 
 
