Re: [RFC] Plane color pipeline KMS uAPI

2023-06-12 Thread Christopher Braga




On 6/9/2023 12:30 PM, Simon Ser wrote:

Hi Christopher,

On Friday, June 9th, 2023 at 17:52, Christopher Braga  
wrote:


The new COLOROP objects also expose a number of KMS properties. Each has a
type, a reference to the next COLOROP object in the linked list, and other
type-specific properties. Here is an example for a 1D LUT operation:

  Color operation 42
  ├─ "type": enum {Bypass, 1D curve} = 1D curve
  ├─ "1d_curve_type": enum {LUT, sRGB, PQ, BT.709, HLG, …} = LUT

The options sRGB / PQ / BT.709 / HLG would select hard-coded 1D
curves? Will different hardware be allowed to expose a subset of these
enum values?


Yes. Only hardcoded LUTs supported by the HW are exposed as enum entries.


  ├─ "lut_size": immutable range = 4096
  ├─ "lut_data": blob
  └─ "next": immutable color operation ID = 43


Some hardware has per channel 1D LUT values, while others use the same
LUT for all channels.  We will definitely need to expose this in the
UAPI in some form.


Hm, I was assuming per-channel 1D LUTs here, just like the existing GAMMA_LUT/
DEGAMMA_LUT properties work. If some hardware can't support that, it'll need
to get exposed as another color operation block.


To configure this hardware block, user-space can fill a KMS blob with
4096 u32
entries, then set "lut_data" to the blob ID. Other color operation types
might
have different properties.


The bit-depth of the LUT is an important piece of information we should
include by default. Are we assuming that the DRM driver will always
reduce the input values to the resolution supported by the pipeline?
This could result in differences between the hardware behavior
and the shader behavior.

Additionally, some pipelines are floating point while others are fixed.
How would user space know if it needs to pack 32 bit integer values vs
32 bit float values?


Again, I'm deferring to the existing GAMMA_LUT/DEGAMMA_LUT. These use a common
definition of LUT blob (u16 elements) and it's up to the driver to convert.

Using a very precise format for the uAPI has the nice property of making the
uAPI much simpler to use. User-space sends high precision data and it's up to
drivers to map that to whatever the hardware accepts.

Conversion from a larger uint type to a smaller type sounds low effort, 
however if a block works in a floating point space things are going to 
get messy really quickly. If the block operates in FP16 space and the 
interface is 16 bits we are good, but going from 32 bits to FP16 (such 
as in the matrix case or 3DLUT) is less than ideal.



Exposing the actual hardware precision is something we've talked about during
the hackfest. It'll probably be useful to some extent, but will require some
discussion to figure out how to design the uAPI. Maybe a simple property is
enough, maybe not (e.g. fully describing the precision of segmented LUTs would
probably be trickier).

I'd rather keep things simple for the first pass, we can always add more
properties for bit depth etc later on.

Indicating if a block operates on / with fixed vs float values is 
significant enough that I think we should account for this in initial 
design. It will have a affect on both the user space value packing + 
expected value ranges in the hardware.



Here is another example with a 3D LUT:

  Color operation 42
  ├─ "type": enum {Bypass, 3D LUT} = 3D LUT
  ├─ "lut_size": immutable range = 33
  ├─ "lut_data": blob
  └─ "next": immutable color operation ID = 43


We are going to need to expose the packing order here to avoid any
programming uncertainty. I don't think we can safely assume all hardware
is equivalent.


The driver can easily change the layout of the matrix and do any conversion
necessary when programming the hardware. We do need to document what layout is
used in the uAPI for sure.


And one last example with a matrix:

  Color operation 42
  ├─ "type": enum {Bypass, Matrix} = Matrix
  ├─ "matrix_data": blob
  └─ "next": immutable color operation ID = 43


It is unclear to me what the default sizing of this matrix is. Any
objections to exposing these details with an additional property?


The existing CTM property uses 9 uint64 (S31.32) values. Is there a case where
that wouldn't be enough?


Larger cases do exist, but as you mention this can be resolved with a 
different type then. I don't have any issues with the default 'Matrix' 
type being 9 entries.





Dithering logic exists in some pipelines. I think we need a plan to
expose that here as well.


Hm, I'm not too familiar with dithering. Do you think it would make sense to
expose as an additional colorop block? Do you think it would have more
consequences on the design?

I want to re-iterate that we don't need to ship all features from day 1. We
just need to come up with a uAPI design on which new features can be built on.



Agreed. I don't think this will affect the proposed design so this can 
be figured out once we have a DRM driver impl that 

Re: [RFC] Plane color pipeline KMS uAPI

2023-06-12 Thread Christopher Braga

Hi all,

The goal of this RFC is to expose a generic KMS uAPI to configure the color
pipeline before blending, ie. after a pixel is tapped from a plane's
framebuffer and before it's blended with other planes. With this new 
uAPI we

aim to reduce the battery life impact of color management and HDR on mobile
devices, to improve performance and to decrease latency by skipping
composition on the 3D engine. This proposal is the result of discussions at
the Red Hat HDR hackfest [1] which took place a few days ago. Engineers
familiar with the AMD, Intel and NVIDIA hardware have participated in the
discussion.

This proposal takes a prescriptive approach instead of a descriptive 
approach.

Drivers describe the available hardware blocks in terms of low-level
mathematical operations, then user-space configures each block. We decided
against a descriptive approach where user-space would provide a high-level
description of the colorspace and other parameters: we want to give more
control and flexibility to user-space, e.g. to be able to replicate 
exactly the

color pipeline with shaders and switch between shaders and KMS pipelines
seamlessly, and to avoid forcing user-space into a particular color 
management

policy.


Thanks for posting this Simon! This overview does a great job of
breaking down the proposal. A few questions inline below.


We've decided against mirroring the existing CRTC properties
DEGAMMA_LUT/CTM/GAMMA_LUT onto KMS planes. Indeed, the color management
pipeline can significantly differ between vendors and this approach cannot
accurately abstract all hardware. In particular, the availability, 
ordering and
capabilities of hardware blocks is different on each display engine. So, 
we've

decided to go for a highly detailed hardware capability discovery.

This new uAPI should not be in conflict with existing standard KMS 
properties,

since there are none which control the pre-blending color pipeline at the
moment. It does conflict with any vendor-specific properties like
NV_INPUT_COLORSPACE or the patches on the mailing list adding AMD-specific
properties. Drivers will need to either reject atomic commits 
configuring both
uAPIs, or alternatively we could add a DRM client cap which hides the 
vendor

properties and shows the new generic properties when enabled.

To use this uAPI, first user-space needs to discover hardware 
capabilities via
KMS objects and properties, then user-space can configure the hardware 
via an

atomic commit. This works similarly to the existing KMS uAPI, e.g. planes.

Our proposal introduces a new "color_pipeline" plane property, and a new 
KMS
object type, "COLOROP" (short for color operation). The "color_pipeline" 
plane
property is an enum, each enum entry represents a color pipeline 
supported by

the hardware. The special zero entry indicates that the pipeline is in
"bypass"/"no-op" mode. For instance, the following plane properties 
describe a

primary plane with 2 supported pipelines but currently configured in bypass
mode:

     Plane 10
     ├─ "type": immutable enum {Overlay, Primary, Cursor} = Primary
     ├─ …
     └─ "color_pipeline": enum {0, 42, 52} = 0

The non-zero entries describe color pipelines as a linked list of 
COLOROP KMS

objects. The entry value is an object ID pointing to the head of the linked
list (the first operation in the color pipeline).

The new COLOROP objects also expose a number of KMS properties. Each has a
type, a reference to the next COLOROP object in the linked list, and other
type-specific properties. Here is an example for a 1D LUT operation:

     Color operation 42
     ├─ "type": enum {Bypass, 1D curve} = 1D curve
     ├─ "1d_curve_type": enum {LUT, sRGB, PQ, BT.709, HLG, …} = LUT

The options sRGB / PQ / BT.709 / HLG would select hard-coded 1D
curves? Will different hardware be allowed to expose a subset of these 
enum values?



     ├─ "lut_size": immutable range = 4096
     ├─ "lut_data": blob
     └─ "next": immutable color operation ID = 43

Some hardware has per channel 1D LUT values, while others use the same 
LUT for all channels.  We will definitely need to expose this in the 
UAPI in some form.


To configure this hardware block, user-space can fill a KMS blob with 
4096 u32
entries, then set "lut_data" to the blob ID. Other color operation types 
might

have different properties.


The bit-depth of the LUT is an important piece of information we should
include by default. Are we assuming that the DRM driver will always
reduce the input values to the resolution supported by the pipeline? 
This could result in differences between the hardware behavior

and the shader behavior.

Additionally, some pipelines are floating point while others are fixed. 
How would user space know if it needs to pack 32 bit integer values vs

32 bit float values?


Here is another example with a 3D LUT:

     Color operation 42
     ├─ "type": enum {Bypass, 3D LUT} = 3D LUT
     ├─ "lut_size": immutable range = 33
     ├─ "lut_data": bl

Re: [PATCH 0/6] drm: Add mouse cursor hotspot support to atomic KMS

2023-06-12 Thread Albert Esteve



On 6/10/22 10:59, Daniel Vetter wrote:

On Fri, Jun 10, 2022 at 10:41:05AM +0200, Daniel Vetter wrote:

Hi all,

Kinda top post because the thread is sprawling and I think we need a
summary/restart. I think there's at least 3 issues here:

- lack of hotspot property support, which means compositors can't really
   support hotspot with atomic. Which isn't entirely true, because you
   totally can use atomic for the primary planes/crtcs and the legacy
   cursor ioctls, but I understand that people might find that a bit silly :-)

   Anyway this problme is solved by the patch set here, and I think results
   in some nice cleanups to boot.

- the fact that cursors for virtual drivers are not planes, but really
   special things. Which just breaks the universal plane kms uapi. That
   part isn't solved, and I do agree with Simon and Pekka that we really
   should solve this before we unleash even more compositors onto the
   atomic paths of virtual drivers.

   I think the simplest solution for this is:
   1. add a new DRM_PLANE_TYPE_VIRTUAL_CURSOR, and set that for these
   special cursor planes on all virtual drivers
   2. add the new "I understand virtual cursors planes" setparam, filter
   virtual cursor planes for userspace which doesn't set this (like we do
   right now if userspace doesn't set the universal plane mode)
   3. backport the above patches to all stable kernels
   4. make sure the hotspot property is only set on VIRTUAL_CURSOR planes
   and nothing else in the rebased patch series

Simon also mentioned on irc that these special planes must not expose the
CRTC_X/Y property, since that doesn't really do much at all. Or is our
understanding of how this all works for commandeered cursors wrong?


- third issue: These special virtual display properties arent documented.
   Aside from hotspot there's also suggested X/Y and maybe other stuff. I
   have no idea what suggested X/Y does and what userspace should do with
   it. I think we need a new section for virtualized drivers which:
   - documents all the properties involved
   - documents the new cap for enabling virtual cursor planes
   - documents some of the key flows that compositors should implement for
 best experience
   - documents how exactly the user experience will degrade if compositors
 pretend it's just a normal kms driver (maybe put that into each of the
 special flows that a compositor ideally supports)
   - whatever other comments and gaps I've missed, I'm sure
 Simon/Pekka/others will chime in once the patch exists.

Great bonus would be an igt which demonstrates these flows. With the
interactive debug breakpoints to wait for resizing or whatever this should
be all neatly possible.
-Daniel


Hi all,

We have been testing the v2 of this patch and it works correctly for us.

First we tested with a patched Mutter, the mouse clicks were correct, 
and behavior was as expected.


But I've also added an IGT test as suggested above: 
https://gitlab.freedesktop.org/aesteve/igt-gpu-tools/-/compare/master...modeset-cursor-hotspot-test?from_project_id=1274


I could post the IGT patch upstream once Zack's patches land.

Hope that helps!

-Albert




There's a bit of fixing oopsies (virtualized drivers really shouldn't have
enabled universal planes for their cursors) and debt (all these properties
predate the push to document stuff so we need to fix that), but I don't
think it's too much. And I think, from reading the threads, that this
should cover everything?

Anything I've missed? Or got completely wrong?

Cheers, Daniel

On Fri, Jun 03, 2022 at 02:14:59PM +, Simon Ser wrote:

Hi,

Please, read this thread:
https://lists.freedesktop.org/archives/dri-devel/2020-March/thread.html#259615

It has a lot of information about the pitfalls of cursor hotspot and
other things done by VM software.

In particular: since the driver will ignore the KMS cursor plane
position set by user-space, I don't think it's okay to just expose
without opt-in from user-space (e.g. with a DRM_CLIENT_CAP).

cc wayland-devel and Pekka for user-space feedback.

On Thursday, June 2nd, 2022 at 17:42, Zack Rusin  wrote:


- all userspace code needs to hardcore a list of drivers which require
hotspots because there's no way to query from drm "does this driver
require hotspot"

Can you elaborate? I'm not sure I understand what you mean here.

Thanks,

Simon

--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch




Re: [RFC] Plane color pipeline KMS uAPI

2023-06-12 Thread Pekka Paalanen
On Fri, 9 Jun 2023 19:11:25 -0400
Christopher Braga  wrote:

> On 6/9/2023 12:30 PM, Simon Ser wrote:
> > Hi Christopher,
> > 
> > On Friday, June 9th, 2023 at 17:52, Christopher Braga 
> >  wrote:
> >   
> >>> The new COLOROP objects also expose a number of KMS properties. Each has a
> >>> type, a reference to the next COLOROP object in the linked list, and other
> >>> type-specific properties. Here is an example for a 1D LUT operation:
> >>>
> >>>   Color operation 42
> >>>   ├─ "type": enum {Bypass, 1D curve} = 1D curve
> >>>   ├─ "1d_curve_type": enum {LUT, sRGB, PQ, BT.709, HLG, …} = LUT  
> >> The options sRGB / PQ / BT.709 / HLG would select hard-coded 1D
> >> curves? Will different hardware be allowed to expose a subset of these
> >> enum values?  
> > 
> > Yes. Only hardcoded LUTs supported by the HW are exposed as enum entries.
> >   
> >>>   ├─ "lut_size": immutable range = 4096
> >>>   ├─ "lut_data": blob
> >>>   └─ "next": immutable color operation ID = 43
> >>>  
> >> Some hardware has per channel 1D LUT values, while others use the same
> >> LUT for all channels.  We will definitely need to expose this in the
> >> UAPI in some form.  
> > 
> > Hm, I was assuming per-channel 1D LUTs here, just like the existing 
> > GAMMA_LUT/
> > DEGAMMA_LUT properties work. If some hardware can't support that, it'll need
> > to get exposed as another color operation block.
> >   
> >>> To configure this hardware block, user-space can fill a KMS blob with
> >>> 4096 u32
> >>> entries, then set "lut_data" to the blob ID. Other color operation types
> >>> might
> >>> have different properties.
> >>>  
> >> The bit-depth of the LUT is an important piece of information we should
> >> include by default. Are we assuming that the DRM driver will always
> >> reduce the input values to the resolution supported by the pipeline?
> >> This could result in differences between the hardware behavior
> >> and the shader behavior.
> >>
> >> Additionally, some pipelines are floating point while others are fixed.
> >> How would user space know if it needs to pack 32 bit integer values vs
> >> 32 bit float values?  
> > 
> > Again, I'm deferring to the existing GAMMA_LUT/DEGAMMA_LUT. These use a 
> > common
> > definition of LUT blob (u16 elements) and it's up to the driver to convert.
> > 
> > Using a very precise format for the uAPI has the nice property of making the
> > uAPI much simpler to use. User-space sends high precision data and it's up 
> > to
> > drivers to map that to whatever the hardware accepts.
> >  
> Conversion from a larger uint type to a smaller type sounds low effort, 
> however if a block works in a floating point space things are going to 
> get messy really quickly. If the block operates in FP16 space and the 
> interface is 16 bits we are good, but going from 32 bits to FP16 (such 
> as in the matrix case or 3DLUT) is less than ideal.

Hi Christopher,

are you thinking of precision loss, or the overhead of conversion?

Conversion from N-bit fixed point to N-bit floating-point is generally
lossy, too, and the other direction as well.

What exactly would be messy?

> 
> > Exposing the actual hardware precision is something we've talked about 
> > during
> > the hackfest. It'll probably be useful to some extent, but will require some
> > discussion to figure out how to design the uAPI. Maybe a simple property is
> > enough, maybe not (e.g. fully describing the precision of segmented LUTs 
> > would
> > probably be trickier).
> > 
> > I'd rather keep things simple for the first pass, we can always add more
> > properties for bit depth etc later on.
> >   
> Indicating if a block operates on / with fixed vs float values is 
> significant enough that I think we should account for this in initial 
> design. It will have a affect on both the user space value packing + 
> expected value ranges in the hardware.

What do you mean by "value packing"? Memory layout of the bits forming
a value? Or possible exact values of a specific type?

I don't think fixed vs. float is the most important thing. Even fixed
point formats can have different numbers of bits for whole numbers,
which changes the usable value range and not only precision. Userspace
at the very least needs to know the usable value range for the block's
inputs, outputs, and parameters.

When defining the precision for inputs, outputs and parameters, then
fixed- vs. floating-point becomes meaningful in explaining what "N bits
of precision" means.

Then there is the question of variable precision that depends on the
actual block input and parameter values, how to represent that. Worst
case precision might be too pessimistic alone.

> >>> Here is another example with a 3D LUT:
> >>>
> >>>   Color operation 42
> >>>   ├─ "type": enum {Bypass, 3D LUT} = 3D LUT
> >>>   ├─ "lut_size": immutable range = 33
> >>>   ├─ "lut_data": blob
> >>>   └─ "next": immutable color operation ID = 43
> >>>  
> >> We are going to nee

Re: [RFC] Plane color pipeline KMS uAPI

2023-06-12 Thread Christopher Braga




On 6/12/2023 5:21 AM, Pekka Paalanen wrote:

On Fri, 9 Jun 2023 19:11:25 -0400
Christopher Braga  wrote:


On 6/9/2023 12:30 PM, Simon Ser wrote:

Hi Christopher,

On Friday, June 9th, 2023 at 17:52, Christopher Braga  
wrote:
   

The new COLOROP objects also expose a number of KMS properties. Each has a
type, a reference to the next COLOROP object in the linked list, and other
type-specific properties. Here is an example for a 1D LUT operation:

   Color operation 42
   ├─ "type": enum {Bypass, 1D curve} = 1D curve
   ├─ "1d_curve_type": enum {LUT, sRGB, PQ, BT.709, HLG, …} = LUT

The options sRGB / PQ / BT.709 / HLG would select hard-coded 1D
curves? Will different hardware be allowed to expose a subset of these
enum values?


Yes. Only hardcoded LUTs supported by the HW are exposed as enum entries.
   

   ├─ "lut_size": immutable range = 4096
   ├─ "lut_data": blob
   └─ "next": immutable color operation ID = 43
  

Some hardware has per channel 1D LUT values, while others use the same
LUT for all channels.  We will definitely need to expose this in the
UAPI in some form.


Hm, I was assuming per-channel 1D LUTs here, just like the existing GAMMA_LUT/
DEGAMMA_LUT properties work. If some hardware can't support that, it'll need
to get exposed as another color operation block.
   

To configure this hardware block, user-space can fill a KMS blob with
4096 u32
entries, then set "lut_data" to the blob ID. Other color operation types
might
have different properties.
  

The bit-depth of the LUT is an important piece of information we should
include by default. Are we assuming that the DRM driver will always
reduce the input values to the resolution supported by the pipeline?
This could result in differences between the hardware behavior
and the shader behavior.

Additionally, some pipelines are floating point while others are fixed.
How would user space know if it needs to pack 32 bit integer values vs
32 bit float values?


Again, I'm deferring to the existing GAMMA_LUT/DEGAMMA_LUT. These use a common
definition of LUT blob (u16 elements) and it's up to the driver to convert.

Using a very precise format for the uAPI has the nice property of making the
uAPI much simpler to use. User-space sends high precision data and it's up to
drivers to map that to whatever the hardware accepts.
  

Conversion from a larger uint type to a smaller type sounds low effort,
however if a block works in a floating point space things are going to
get messy really quickly. If the block operates in FP16 space and the
interface is 16 bits we are good, but going from 32 bits to FP16 (such
as in the matrix case or 3DLUT) is less than ideal.


Hi Christopher,

are you thinking of precision loss, or the overhead of conversion?

Conversion from N-bit fixed point to N-bit floating-point is generally
lossy, too, and the other direction as well.

What exactly would be messy?

Overheard of conversion is the primary concern here. Having to extract 
and / or calculate the significand + exponent components in the kernel 
is burdensome and imo a task better suited for user space. This also has 
to be done every blob set, meaning that if user space is re-using 
pre-calculated blobs we would be repeating the same conversion 
operations in kernel space unnecessarily.


I agree normalization of the value causing precision loss and rounding 
we can't avoid.


We should also consider the fact that float pipelines have been known to 
use the scrgb definition for floating point values 
(https://registry.khronos.org/EGL/extensions/EXT/EGL_EXT_gl_colorspace_scrgb_linear.txt). 
In cases like this where there may be a expected value range in the 
pipeline, how to normalize a larger input becomes a little confusing. Ex 
- Does U32 MAX become FP16 MAX or value MAX (i.e 127).





Exposing the actual hardware precision is something we've talked about during
the hackfest. It'll probably be useful to some extent, but will require some
discussion to figure out how to design the uAPI. Maybe a simple property is
enough, maybe not (e.g. fully describing the precision of segmented LUTs would
probably be trickier).

I'd rather keep things simple for the first pass, we can always add more
properties for bit depth etc later on.
   

Indicating if a block operates on / with fixed vs float values is
significant enough that I think we should account for this in initial
design. It will have a affect on both the user space value packing +
expected value ranges in the hardware.


What do you mean by "value packing"? Memory layout of the bits forming
a value? Or possible exact values of a specific type? >
Both really. If the kernel is provided a U32 value, we need to know if 
this is a U32 value, or a float packed into a U32 container. Likewise as 
mentioned with the scRGB above, float could even adjust the value range 
expectations.



I don't think fixed vs. float is the most important thing. Even fixed
point formats can have different numbers