Confusion about Floating Point in Color Space Transforms

Mandem

Member
Messages
31
Reaction score
2
My question is more from a video/color standpoint, but since video is based on the principles of photography I figured perhaps someone could help (I've asked on the video forums as well).

To explain my confusion I'm going to use an example with a hypothetical sensor which clips at a max 1000% scene linear reflectance and crushes blacks at 0% light (I know there is no such thing as 0% light but it just makes the numbers easier to deal with).

I've attached below a graph of a hypothetical 10-bit log transform and a 16-bit linear RAW transform, showing linear light input versus code value output (the so-called OETF). The output values are both in code values (normalized from 0 to 1).



[attached: graph of the hypothetical 10-bit log and 16-bit linear RAW OETFs]

Now, when the 10bit log encoded image goes through a color space transform, the first thing that happens to it is that it gets "linearized" to 32-bit floating point (this is DaVinci Resolve) before anything else happens to it. Now, I'm not a software developer or video engineer or anything of that sort, but having read about floating point (which is much more confusing than integer values), it's been said that floating point allows for values greater than 1 and less than 0. The problem I'm having is understanding how this relates to scene linear reflectance getting matched to the output in linear floating point. For instance, does 100% light reflectance give an output of 1 (the highest possible value in integers)? Does 500% reflectance give an output of 5? What about 1000% reflectance, would that be an output of 10, and would the graph end there? Or am I completely way off, and 1000% reflectance would still be an output of 1, just that between 0 and 1 there'd be approximately 4 billion values' worth of data (and then what's the point of having values greater than 1 and less than 0)?

The thing that's tripping me up is that the entire dynamic range of the camera gets squeezed into a 0-1 range in the integer-encoded camera files, but I can't understand what's happening with this linearized 32-bit floating point. How is this 32-bit floating point any different from the linear RAW (other than it being 32-bit rather than 16-bit)? What's the significance of having values greater than 1 and less than 0? Just very lost on this. Thanks.
 
In floating-point representation, leading zeros in the significand are not allowed, so the precision of the significand is not reduced when the signal drops. In a 16-bit unsigned integer representation, you lose relative precision as the number gets smaller: ignoring the leading zeros, you have 15 bits of precision when the signal drops below half scale, 14 bits when it falls below quarter scale, etc. With floating point, that doesn't happen.
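
To see the difference numerically, here is a rough Python sketch (the signal levels are made up, and the 16-bit encoding is just a straight scale to 0-65535):

```python
import numpy as np

# Compare the relative quantization error of a 16-bit unsigned integer
# encoding with 32-bit float, for signals near full scale and far below it.
def relative_error_uint16(x):
    # x is a linear value in [0, 1]; encode to a 16-bit code and decode again
    code = round(x * 65535)
    decoded = code / 65535
    return abs(decoded - x) / x

def relative_error_float32(x):
    decoded = float(np.float32(x))   # round to single precision and back
    return abs(decoded - x) / x

for x in (0.9, 0.9 / 256, 0.9 / 65536):
    print(f"signal {x:.8f}: uint16 rel. err ~{relative_error_uint16(x):.2e}, "
          f"float32 rel. err ~{relative_error_float32(x):.2e}")
```

The integer encoding's relative error grows as the signal drops, while the float error stays roughly constant.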

 
Thanks for taking the time to answer. I am not too familiar with how floating point operations actually work, but I understood the gist of it: they allow an almost unlimited range of values. I still don't understand, though, how the scene linear light gets allocated an output in a log-to-linear 32-bit float transform. For my example above, will a 1000% reflectance still equal an output of 1? Or will it be 10 now (since values greater than 1 are allowed), and if so, does the graph end there?
 
That is a matter of the implementation. I usually set image full scale equal to one when using floating point, but allow values over one and negative values to persist in intermediate calculations. But not everybody does.
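
For illustration only, here is a generic sketch of that convention (not how any particular tool does it): full scale maps to 1.0, intermediate results are free to leave the 0-1 range, and clipping happens only when encoding the output.

```python
import numpy as np

# Sensor full scale maps to 1.0 in float, but intermediate results may go
# above 1.0 or below 0.0 and are only clipped at the final quantization.
raw = np.array([0, 12000, 52000, 65535], dtype=np.uint16)

linear = raw.astype(np.float32) / 65535.0      # full scale -> 1.0
graded = linear * 1.8 - 0.05                   # some grade: values may leave [0, 1]
print(graded)                                  # e.g. [-0.05, 0.28, 1.38, 1.75]

out10 = np.clip(graded, 0.0, 1.0) * 1023.0     # clip only at the 10-bit encode
print(np.round(out10).astype(np.uint16))
```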
 
32-bit floating point has roughly the precision of a 24-bit binary integer. So if you are going from a 32-bit binary integer with 4+ billion possible states to a 24-bit representation with about 16+ million possible states, you are losing data; it would be lossy. As a practical matter the loss is limited to about one part in 16 million, which is pretty small. But if your goal is a lossless workflow, single precision would not be good enough.
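
A quick way to see that loss, if you have Python and NumPy handy (the specific value is arbitrary):

```python
import numpy as np

# A 32-bit integer converted to single precision and back does not always
# survive, because float32 carries only a 24-bit significand.
x = np.uint32(4_000_000_001)
roundtrip = np.uint32(np.float32(x))
print(x, roundtrip, x == roundtrip)   # 4000000001 4000000000 False (value is rounded)
```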


-- Bob
http://bob-o-rama.smugmug.com -- Photos
http://www.vimeo.com/boborama/videos -- Videos
 
... I'm going to use an example with a hypothetical sensor which clips at a max 1000% scene linear reflectance and crushes blacks at 0% light ...
Full scale in ideal photography is the highest intensity one wishes to retain detail in (achieved by an Exposure strategy usually referred to as ETTR), independently of object reflectance.

As for reflectance, there are different types: diffuse, specular, and mixed diffuse/specular. When photographers talk about reflectance they are typically referring to diffuse reflectance, with 100% being the most intense amount of diffuse light possible in a natural scene under the given illuminant (e.g. from soda or Teflon), an L* of 100 (non-linearly, in Lab notation). Anything higher than that is due to mixed or specular reflections. BTW, on that scale middle gray is considered to be L* 50 (or 18% linearly, 2.5 stops down from 100%).

However, in photography there are many occasions where we also want to capture mixed reflectance, which lives above 100%, for instance detail in clouds or snowy fields. That's why Ansel Adams put middle gray in Zone V, with clipping a full 4 stops above that, and why most in-camera automatic metering places middle gray at around 8-12% of clipping in the raw data, about 3.5 stops below full scale: to leave room for mixed reflections that are presumably present and desired.

Once you have captured all the detail you wished for, linearly, in the raw file, you next process it for rendering and compression. It is best to perform as much of the processing as possible in the linear domain and compress it last. A value of one will represent your chosen clipping point, zero no light. Assuming you ETTR'd, you achieve that by dividing your data by the full-scale value, in floating-point notation. If you didn't ETTR, consider full scale to be the value of the brightest object you would like to retain detail in.

You may then want to apply curves to squeeze the captured DR into a smaller contrast ratio for compression and/or pleasing display. That will typically brighten middle gray at the expense of detail in the highlights. You may also want to change the black level accordingly. The result will still be data from zero to one, those extremes frozen by your earlier choices of clipping point and black level.

Finally you apply gamma (or log compression) to the 0->1 data and then convert it to the final bit depth of choice by multiplying it by the desired full-scale value (for instance 10 bits, where a value of 1 would correspond to 1023 DN).
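
For what it's worth, here is how those steps might look in a few lines of Python; all the numbers (the raw values, the 1.1 "curve", the 2.2 gamma) are placeholders, not a real camera or transfer function:

```python
import numpy as np

# Normalize raw data so the chosen clipping point is 1.0, do linear-domain
# work, compress with a gamma at the end, and only then quantize to 10 bits.
raw = np.array([0, 800, 6400, 51200], dtype=np.float64)   # hypothetical linear raw DN
full_scale = 51200.0                                        # chosen clipping point (ETTR'd)

linear = raw / full_scale                 # 1.0 = clipping point, 0.0 = no light
toned = np.clip(linear * 1.1, 0.0, 1.0)   # stand-in for a tone curve applied in linear
encoded = toned ** (1.0 / 2.2)            # gamma (or log) compression on 0..1 data
code_10bit = np.round(encoded * 1023).astype(np.uint16)    # 1.0 -> 1023 DN
print(code_10bit)
```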

Jack
 
BTW, how do you apply a curve? On each RGB linear channel? If so, this brings the following question: If we perceive (R,G,B) with some specific values as a certain color, would we perceive (f(R), f(G), f(B)) as the same color for a ‘reasonable’ transformation f, but with a different lightness? Or maybe we have to switch to another space first, like XYZ or so, and apply the curve to the luminosity channel only?
 
There are ways to do that in a computationally efficient manner:

https://patents.google.com/patent/US5774112A/en?inventor=james+kasson&oq=james+kasson
 
Nice to see that patent of yours. I glanced over it, and it looks like the guess I took in my last sentence is the right one. You say there that the right approach would be to convert to something like CIELAB, apply the curve to the luminance channel only, then convert back. Then you say that this was expensive. Well, this is 25 years later - is it still expensive?

As a side note, I was more interested in S-curves, etc. As for the gamma curve: it gets undone at some step, I believe, before it becomes the light intensity of each pixel on our monitors. If that is done channel by channel, then shouldn't the gamma encoding be done the same way?
 
BTW, how do you apply a curve? On each RGB linear channel? ...
Many ways to skin that cat, though chromaticity shifts are pretty well guaranteed any time a non-linear curve is applied to linear data. Some Tone Reproduction Operators are better than others. Aside from Jim's method, one of the better approaches available to the rest of us is used by Anders Torger in his DcamProf camera profiler.

Jack
 
Nice to see that patent of yours.
It was kind of a cheap shot; I just turned math into hardware. At the time, IBM was patenting a lot of stuff having to do with image processing. So I played along.
I glanced over it, and it looks like the guess I took in my last sentence is the right one. You say there that the right approach would be to convert to something like CIELAB, apply the curve there to the luminance channel only; then back. Then you say that this was expensive. Well, this is 25 years later - is it still expensive?
The relative computational expense hasn't changed much. The absolute expense sure has.
As a side note - I was more interested in S curves, etc. About the gamma one - it gets undone at some step, I believe before it gets to light intensity of each pixel on our monitors. If this is done channel by channel, then the gamma encoding should be done in the same way?
I don't think so. In that model, each channel is gamma corrected individually, and un-corrected individually. In an old-fashioned CRT, the un-correction was done in the tube itself, and the conversion to RGB took place in the phosphors after that.

--
https://blog.kasson.com
 
... But if your goal is a lossless workflow, single precision would not be good enough.
Lazy fellow that I am, I use double precision floating point for all image processing unless there are performance reasons to do otherwise.
 
Now, when the 10bit log encoded image goes through a color space transform, the first thing that happens to it is that it gets "linearized" to 32-bit floating point (this is DaVinci Resolve) ... The thing that's tripping me up is that the entire dynamic range of the camera gets squeezed into a 0-1 range in the integer-encoded camera files, but I can't understand what's happening with this linearized 32-bit floating point. ...
I'm not the expert you have in other responders, and I don't know squat about DaVinci Resolve, but in my rather myopic world (ICC) color space transforms go through a so-called 'Profile Connection Space', or PCS, which is usually linear XYZ. This is so the input matrix doesn't have to know about the output matrix. I speculate that's what the 'linearized' part of your question is about.

The floating point part has been well covered in previous posts. I will say that it's not a 'squeezing', as you called it, it's more a 'scaling'. There are plenty of distinct values between 0.0 and 1.0 to encode the integers we usually get out of the camera, even with a simple 32-bit float representation. I don't know this for a fact, but I think using 0.0 - 1.0 avoids discontinuities in the resolution of elemental values such as would happen when moving from one exponent to the next. For a user interface, 0.0 - 1.0 floating point actually provides an intuitive representation of values in "percentage" terms in the progression from black to white.
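
As a toy illustration of the PCS idea (not Resolve's internals): the widely published linear sRGB (D65) to XYZ matrix takes source RGB into the connection space, and the destination profile's matrix takes it back out. Here the "destination" is sRGB again, so the round trip should return the input:

```python
import numpy as np

# Source RGB -> XYZ (the PCS) -> destination RGB, done with 3x3 matrices.
M_src_to_xyz = np.array([[0.4124, 0.3576, 0.1805],
                         [0.2126, 0.7152, 0.0722],
                         [0.0193, 0.1192, 0.9505]])   # linear sRGB (D65) -> XYZ
M_xyz_to_dst = np.linalg.inv(M_src_to_xyz)            # destination profile's matrix

rgb_linear = np.array([0.18, 0.18, 0.18])             # middle gray, scene-linear
xyz = M_src_to_xyz @ rgb_linear                       # into the PCS
rgb_out = M_xyz_to_dst @ xyz                          # out to the destination space
print(xyz, rgb_out)                                   # rgb_out ~ [0.18, 0.18, 0.18]
```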

Jim, Jack, tear me apart... :D
 
BTW, how do you apply a curve? On each RGB linear channel? ... Or maybe we have to switch to another space first, like XYZ or so, and apply the curve to the luminosity channel only?
The gamma in the HLG OOTF is applied to luminance, indeed to avoid hue and saturation shifts. See BT.2390-9 page 26:
Simply applying a gamma curve to each component separately as is done for SDR television distorts the colour; in particular, it distorts saturation but also to a lesser extent the hue.

[…]

Instead of the current SDR practice of applying a gamma curve independently to each colour component, for HDR it should be applied to the luminance alone.
With HLG, the ratio of Y^gamma / Y, i.e. Y^(gamma − 1), is thus applied to the linear R, G, and B values. Since the same coefficient is applied, their relative ratios are preserved, and therefore so is the chromaticity.

Page 31:
The HLG OOTF (system gamma applied on luminance) uses scene-referred camera signals that result in a display that closely preserves the chromaticity of the scene as imaged by the camera. This differs from the traditional colour reproduction provided by the HDTV and UHDTV OOTFs, which produce more saturated colours which viewers of existing SDR content have become familiar with. Should such a traditional colour reproduction be desired, a gamma of 1.2 could be applied on the RGB components of a camera signal to produce more saturated colours.
The OETF is applied per-channel, but that is not considered to be a problem given that it is undone before applying the OOTF (as is nicely explained in this recent video by a developer of HLG, around 7:50).
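
A quick numeric check of that statement, using the BT.2100 luminance coefficients and a made-up pixel: scaling RGB by Y^(gamma − 1) preserves the channel ratios, while per-channel gamma does not.

```python
import numpy as np

gamma = 1.2

def luminance(rgb):
    # BT.2100 luminance coefficients for linear RGB
    return 0.2627 * rgb[0] + 0.6780 * rgb[1] + 0.0593 * rgb[2]

rgb = np.array([0.10, 0.40, 0.20])      # hypothetical scene-linear pixel

y = luminance(rgb)
luma_gamma = rgb * y ** (gamma - 1.0)   # HLG-style: gamma applied via luminance
per_channel = rgb ** gamma              # SDR-style: gamma applied per channel

print(rgb / rgb.sum())                  # original R:G:B ratios
print(luma_gamma / luma_gamma.sum())    # same ratios -> chromaticity preserved
print(per_channel / per_channel.sum())  # different ratios -> saturation/hue shift
```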
 
... in my rather myopic world (ICC) color space transforms go through a so-called 'Profile Connection Space', or PCS, which is usually linear XYZ. This is so the input matrix doesn't have to know about the output matrix. ...
Not sure what is actually done by what color engines, but populating a 3D LUT with inputs in the source space and outputs in the destination space for the actual image transform is more efficient than doing two transforms of the image. The source to reference transform and the reference to destination transform can be used to populate the LUT the actual image goes through.
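
Something like this, sketched in Python with placeholder transforms standing in for the real source->reference and reference->destination conversions:

```python
import numpy as np

def src_to_ref(rgb):          # placeholder source -> reference transform
    return rgb ** 2.2         # e.g. linearize a gamma-encoded source

def ref_to_dst(rgb):          # placeholder reference -> destination transform
    return np.clip(rgb * 1.1, 0.0, 1.0) ** (1.0 / 2.4)

N = 17                        # LUT resolution per axis (17^3 nodes)
grid = np.linspace(0.0, 1.0, N)
r, g, b = np.meshgrid(grid, grid, grid, indexing="ij")
nodes = np.stack([r, g, b], axis=-1)          # shape (N, N, N, 3), source values
lut = ref_to_dst(src_to_ref(nodes))           # destination values at each node

# Applying the LUT to an image then reduces to interpolating `lut`;
# a nearest-node lookup is shown here for brevity.
pixel = np.array([0.25, 0.5, 0.75])
idx = tuple(np.round(pixel * (N - 1)).astype(int))
print(lut[idx])
```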
 
... populating a 3D LUT with inputs in the source space and outputs in the destination space for the actual image transform is more efficient than doing two transforms of the image. The source to reference transform and the reference to destination transform can be used to populate the LUT the actual image goes through.
It is definitely more computationally efficient to do "VFR-direct" input-output transforms. But there's more flexibility in using a PCS to connect profiles that don't have to know about all the others; without a PCS, each time a new colorspace is supported, the developer needs to generate direct transforms to all the previously available input/output spaces.

Addendum: In my home-cooked raw processor, I do a display transform every time a tool is changed. Using the LittleCMS color management library and a float -> PCS -> integer transform, that used to be dead slow; I usually would add my pre-export resize in the toolchain to mitigate it. LittleCMS recently implemented so-called "fast-float" transform operations, and now they're speedy-quick. I love LittleCMS... :D
 
It is definitely more computationally efficient to do "VFR-direct" input-output transforms. But there's more flexibility in using a PCS to connect profiles that don't have to know about all the others; without a PCS, each time a new colorspace is supported, the developer needs to generate direct transforms to all the previously available input/output spaces.
That is not what I’m suggesting. Please reread the last sentence of my post.
 
Ah, I get it now. My apologies for cluttering the thread just to straighten out my mis-read.
 
