Re: [PATCH V8 32/43] drm/colorop: Add 1D Curve Custom LUT type
On 2025-06-03 06:51, Pekka Paalanen wrote:
> On Tue, 3 Jun 2025 08:30:23 +
> "Shankar, Uma" wrote:
>
>>> -----Original Message-----
>>> From: Pekka Paalanen
>>> Sent: Friday, May 30, 2025 7:28 PM
>>> To: Shankar, Uma
>>> Cc: Simon Ser; Harry Wentland; Alex Hung;
>>> dri-de...@lists.freedesktop.org; amd-...@lists.freedesktop.org;
>>> intel-g...@lists.freedesktop.org; wayland-devel@lists.freedesktop.org;
>>> leo@amd.com; ville.syrj...@linux.intel.com;
>>> pekka.paala...@collabora.com;
>>> m...@igalia.com; jad...@redhat.com; sebastian.w...@redhat.com;
>>> shashank.sha...@amd.com; ago...@nvidia.com; jos...@froggi.es;
>>> mdaen...@redhat.com; aleix...@kde.org; xaver.h...@gmail.com;
>>> victo...@system76.com; dan...@ffwll.ch; quic_nas...@quicinc.com;
>>> quic_cbr...@quicinc.com; quic_abhin...@quicinc.com; mar...@marcan.st;
>>> liviu.du...@arm.com; sashamcint...@google.com; Borah, Chaitanya Kumar;
>>> louis.chau...@bootlin.com
>>> Subject: Re: [PATCH V8 32/43] drm/colorop: Add 1D Curve Custom LUT type
>>>
>>> On Thu, 22 May 2025 11:33:00 +
>>> "Shankar, Uma" wrote:
>>>
>>>> One request though: Can we enhance the lut samples from existing
>>>> 16bits to 32bits as lut precision is going to be more than 16 in
>>>> certain hardware.
>>>> While adding the new UAPI, lets extend this to 32 to make it future
>>>> proof.
>>>> Reference:
>>>> https://patchwork.freedesktop.org/patch/642592/?series=129811&rev=4
>>>>
>>>> +/**
>>>> + * struct drm_color_lut_32 - Represents high precision lut values
>>>> + *
>>>> + * Creating 32 bit palette entries for better data
>>>> + * precision. This will be required for HDR and
>>>> + * similar color processing usecases.
>>>> + */
>>>> +struct drm_color_lut_32 {
>>>> +	/*
>>>> +	 * Data for high precision LUTs
>>>> +	 */
>>>> +	__u32 red;
>>>> +	__u32 green;
>>>> +	__u32 blue;
>>>> +	__u32 reserved;
>>>> +};
>>>
>>> Hi,
>>>
>>> I suppose you need this much precision for optical data? If so,
>>> floating-point would be much more appropriate and we could probably
>>> keep 16-bit storage.
>>>
>>> What does the "more than 16-bit" hardware actually use? ISTR at least
>>> AMD having some sort of float'ish point internal pipeline?
>>>
>>> This sounds the same thing as non-uniformly distributed taps in a LUT.
>>> That mimics floating-point input while this feels like floating-point
>>> output of a LUT.
>>>
>>> I've recently decided for myself (and Weston) that I will never store
>>> optical data in an integer format, because it is far too wasteful.
>>> That's why the electrical encodings like power-2.2 are so useful, not
>>> just for emulating a CRT.
>>
>> Hi Pekka,
>> Internal pipeline in hardware can operate at higher precision than the
>> input framebuffer to plane engines. So, in case we have optical data of
>> 16bits or 10bits precision, hardware can scale this up to higher
>> precision in internal pipeline in hardware to take care of rounding and
>> overflow issues. Even FP16 optical data will be normalized and
>> converted internally for further processing.
>
> Is it integer or floating-point?
>

For AMD the internal format is floating point with slightly higher
precision than FP16.

> If we take the full range of PQ as optical and put it into 16-bit
> integer format, the luminance step from code 1 to code 2 is 0.15 cd/m².
> That seems like a huge step in the dark end. Such a step would
> probably need to be divided over several taps in a LUT, which wouldn't
> be possible.
>

Right, and with 32-bpc we'll get a luminance step size of ~0.023 cd/m^2,
which seems plenty fine-grained.

> In that sense, if a LUT is used for the PQ EOTF, I totally agree that
> 16-bit integer won't be even nearly enough precision.
>
> This actually points out the caveat that increasing the number of taps
> in a LUT can cause the LUT to become non-monotonic when the sample
> precision runs out. That is, consecutive taps don't always increase in
> value.
>
>> Input to LUT hardware can be 16bits or even higher, so the look up
>> table we program can be of higher precision than 16 (certain cases 24
>> in Intel pipeline). This is later truncated to bpc supported in output
>> formats from sync (10, 12 or 16), mostly for electrical value to be
>> sent to sink.
>>
>> Hence requesting to increase the container from current u16 to u32, to
>> get advantage of higher precision luts.
>
> My argument though is to use a floating-point format for the LUT samples
> instead of adding more and more integer bits. That naturally puts more
> precision where it is needed: near zero.
>
> A driver can easily convert that to any format the hardware needs.
>
> However, it might make best sense for a driver to expose a LUT with a
> format that best matches the hardware precision, especially
> floating-point vs. integer.
>
> I guess we may eventually need both 32 bpc integer and 16 (or 32) bpc
> floating-point.
>

While I like floating point better for represe
RE: [PATCH V8 32/43] drm/colorop: Add 1D Curve Custom LUT type
> -----Original Message-----
> From: Pekka Paalanen
> Sent: Friday, May 30, 2025 7:28 PM
> To: Shankar, Uma
> Cc: Simon Ser; Harry Wentland; Alex Hung;
> dri-de...@lists.freedesktop.org; amd-...@lists.freedesktop.org;
> intel-g...@lists.freedesktop.org; wayland-devel@lists.freedesktop.org;
> leo@amd.com; ville.syrj...@linux.intel.com; pekka.paala...@collabora.com;
> m...@igalia.com; jad...@redhat.com; sebastian.w...@redhat.com;
> shashank.sha...@amd.com; ago...@nvidia.com; jos...@froggi.es;
> mdaen...@redhat.com; aleix...@kde.org; xaver.h...@gmail.com;
> victo...@system76.com; dan...@ffwll.ch; quic_nas...@quicinc.com;
> quic_cbr...@quicinc.com; quic_abhin...@quicinc.com; mar...@marcan.st;
> liviu.du...@arm.com; sashamcint...@google.com; Borah, Chaitanya Kumar;
> louis.chau...@bootlin.com
> Subject: Re: [PATCH V8 32/43] drm/colorop: Add 1D Curve Custom LUT type
>
> On Thu, 22 May 2025 11:33:00 +
> "Shankar, Uma" wrote:
>
> > One request though: Can we enhance the lut samples from existing
> > 16bits to 32bits as lut precision is going to be more than 16 in
> > certain hardware.
> > While adding the new UAPI, lets extend this to 32 to make it future
> > proof.
> > Reference:
> > https://patchwork.freedesktop.org/patch/642592/?series=129811&rev=4
> >
> > +/**
> > + * struct drm_color_lut_32 - Represents high precision lut values
> > + *
> > + * Creating 32 bit palette entries for better data
> > + * precision. This will be required for HDR and
> > + * similar color processing usecases.
> > + */
> > +struct drm_color_lut_32 {
> > +	/*
> > +	 * Data for high precision LUTs
> > +	 */
> > +	__u32 red;
> > +	__u32 green;
> > +	__u32 blue;
> > +	__u32 reserved;
> > +};
>
> Hi,
>
> I suppose you need this much precision for optical data? If so,
> floating-point would be much more appropriate and we could probably
> keep 16-bit storage.
>
> What does the "more than 16-bit" hardware actually use? ISTR at least
> AMD having some sort of float'ish point internal pipeline?
>
> This sounds the same thing as non-uniformly distributed taps in a LUT.
> That mimics floating-point input while this feels like floating-point
> output of a LUT.
>
> I've recently decided for myself (and Weston) that I will never store
> optical data in an integer format, because it is far too wasteful.
> That's why the electrical encodings like power-2.2 are so useful, not
> just for emulating a CRT.

Hi Pekka,
Internal pipeline in hardware can operate at higher precision than the
input framebuffer to plane engines. So, in case we have optical data of
16bits or 10bits precision, hardware can scale this up to higher
precision in internal pipeline in hardware to take care of rounding and
overflow issues. Even FP16 optical data will be normalized and converted
internally for further processing.

Input to LUT hardware can be 16bits or even higher, so the look up table
we program can be of higher precision than 16 (certain cases 24 in Intel
pipeline). This is later truncated to bpc supported in output formats
from sync (10, 12 or 16), mostly for electrical value to be sent to sink.

Hence requesting to increase the container from current u16 to u32, to
get advantage of higher precision luts.

Thanks & Regards,
Uma Shankar

>
> Thanks,
> pq
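
The container-width request above can be illustrated with a small sketch. This is an assumption-laden illustration, not driver code: struct color_lut_32 is a local stand-in for the proposed struct drm_color_lut_32 (uint32_t instead of __u32), and rescale_sample() and entry_to_hw() are hypothetical helpers, not DRM functions. The sketch rescales a full-range unsigned sample from one bit depth to another with round-to-nearest, which is roughly the conversion needed when the uAPI container and the hardware LUT precision differ.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for the proposed struct drm_color_lut_32 (illustrative). */
struct color_lut_32 {
	uint32_t red;
	uint32_t green;
	uint32_t blue;
	uint32_t reserved;
};

/*
 * Hypothetical helper: map a sample that spans the full range of
 * src_bits onto the full range of hw_bits, rounding to nearest.
 * Both widths must be in 1..32 so the product fits in 64 bits.
 */
static uint32_t rescale_sample(uint32_t sample, unsigned src_bits,
			       unsigned hw_bits)
{
	assert(src_bits >= 1 && src_bits <= 32);
	assert(hw_bits >= 1 && hw_bits <= 32);

	uint64_t src_max = (UINT64_C(1) << src_bits) - 1;
	uint64_t hw_max = (UINT64_C(1) << hw_bits) - 1;

	assert(sample <= src_max);
	return (uint32_t)(((uint64_t)sample * hw_max + src_max / 2) / src_max);
}

/* Hypothetical helper: convert one 32-bit entry to hw_bits per channel. */
static void entry_to_hw(struct color_lut_32 *e, unsigned hw_bits)
{
	e->red = rescale_sample(e->red, 32, hw_bits);
	e->green = rescale_sample(e->green, 32, hw_bits);
	e->blue = rescale_sample(e->blue, 32, hw_bits);
}
```

Under these assumptions, the adjacent 16-bit codes 1 and 2 land on the 24-bit codes 256 and 512, so a 16-bit container can address only about every 256th code of a 24-bit LUT, whereas a 32-bit container can reach every one of them. The same helper with a smaller hw_bits models the later reduction to the output bpc.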
Re: [PATCH V8 32/43] drm/colorop: Add 1D Curve Custom LUT type
On Tue, 3 Jun 2025 08:30:23 +
"Shankar, Uma" wrote:

> > -----Original Message-----
> > From: Pekka Paalanen
> > Sent: Friday, May 30, 2025 7:28 PM
> > To: Shankar, Uma
> > Cc: Simon Ser; Harry Wentland; Alex Hung;
> > dri-de...@lists.freedesktop.org; amd-...@lists.freedesktop.org;
> > intel-g...@lists.freedesktop.org; wayland-devel@lists.freedesktop.org;
> > leo@amd.com; ville.syrj...@linux.intel.com;
> > pekka.paala...@collabora.com;
> > m...@igalia.com; jad...@redhat.com; sebastian.w...@redhat.com;
> > shashank.sha...@amd.com; ago...@nvidia.com; jos...@froggi.es;
> > mdaen...@redhat.com; aleix...@kde.org; xaver.h...@gmail.com;
> > victo...@system76.com; dan...@ffwll.ch; quic_nas...@quicinc.com;
> > quic_cbr...@quicinc.com; quic_abhin...@quicinc.com; mar...@marcan.st;
> > liviu.du...@arm.com; sashamcint...@google.com; Borah, Chaitanya Kumar;
> > louis.chau...@bootlin.com
> > Subject: Re: [PATCH V8 32/43] drm/colorop: Add 1D Curve Custom LUT type
> >
> > On Thu, 22 May 2025 11:33:00 +
> > "Shankar, Uma" wrote:
> >
> > > One request though: Can we enhance the lut samples from existing
> > > 16bits to 32bits as lut precision is going to be more than 16 in
> > > certain hardware.
> > > While adding the new UAPI, lets extend this to 32 to make it future
> > > proof.
> > > Reference:
> > > https://patchwork.freedesktop.org/patch/642592/?series=129811&rev=4
> > >
> > > +/**
> > > + * struct drm_color_lut_32 - Represents high precision lut values
> > > + *
> > > + * Creating 32 bit palette entries for better data
> > > + * precision. This will be required for HDR and
> > > + * similar color processing usecases.
> > > + */
> > > +struct drm_color_lut_32 {
> > > +	/*
> > > +	 * Data for high precision LUTs
> > > +	 */
> > > +	__u32 red;
> > > +	__u32 green;
> > > +	__u32 blue;
> > > +	__u32 reserved;
> > > +};
> >
> > Hi,
> >
> > I suppose you need this much precision for optical data? If so,
> > floating-point would be much more appropriate and we could probably
> > keep 16-bit storage.
> >
> > What does the "more than 16-bit" hardware actually use? ISTR at least
> > AMD having some sort of float'ish point internal pipeline?
> >
> > This sounds the same thing as non-uniformly distributed taps in a LUT.
> > That mimics floating-point input while this feels like floating-point
> > output of a LUT.
> >
> > I've recently decided for myself (and Weston) that I will never store
> > optical data in an integer format, because it is far too wasteful.
> > That's why the electrical encodings like power-2.2 are so useful, not
> > just for emulating a CRT.
>
> Hi Pekka,
> Internal pipeline in hardware can operate at higher precision than the
> input framebuffer to plane engines. So, in case we have optical data of
> 16bits or 10bits precision, hardware can scale this up to higher
> precision in internal pipeline in hardware to take care of rounding and
> overflow issues. Even FP16 optical data will be normalized and
> converted internally for further processing.

Is it integer or floating-point?

If we take the full range of PQ as optical and put it into 16-bit
integer format, the luminance step from code 1 to code 2 is 0.15 cd/m².
That seems like a huge step in the dark end. Such a step would
probably need to be divided over several taps in a LUT, which wouldn't
be possible.

In that sense, if a LUT is used for the PQ EOTF, I totally agree that
16-bit integer won't be even nearly enough precision.

This actually points out the caveat that increasing the number of taps
in a LUT can cause the LUT to become non-monotonic when the sample
precision runs out. That is, consecutive taps don't always increase in
value.

> Input to LUT hardware can be 16bits or even higher, so the look up
> table we program can be of higher precision than 16 (certain cases 24
> in Intel pipeline). This is later truncated to bpc supported in output
> formats from sync (10, 12 or 16), mostly for electrical value to be
> sent to sink.
>
> Hence requesting to increase the container from current u16 to u32, to
> get advantage of higher precision luts.

My argument though is to use a floating-point format for the LUT samples
instead of adding more and more integer bits. That naturally puts more
precision where it is needed: near zero.

A driver can easily convert that to any format the hardware needs.

However, it might make best sense for a driver to expose a LUT with a
format that best matches the hardware precision, especially
floating-point vs. integer.

I guess we may eventually need both 32 bpc integer and 16 (or 32) bpc
floating-point.


Thanks,
pq