Re: [RFC] Plane color pipeline KMS uAPI
On 6/9/2023 12:30 PM, Simon Ser wrote: Hi Christopher, On Friday, June 9th, 2023 at 17:52, Christopher Braga wrote: The new COLOROP objects also expose a number of KMS properties. Each has a type, a reference to the next COLOROP object in the linked list, and other type-specific properties. Here is an example for a 1D LUT operation: Color operation 42 ├─ "type": enum {Bypass, 1D curve} = 1D curve ├─ "1d_curve_type": enum {LUT, sRGB, PQ, BT.709, HLG, …} = LUT The options sRGB / PQ / BT.709 / HLG would select hard-coded 1D curves? Will different hardware be allowed to expose a subset of these enum values? Yes. Only hardcoded LUTs supported by the HW are exposed as enum entries. ├─ "lut_size": immutable range = 4096 ├─ "lut_data": blob └─ "next": immutable color operation ID = 43 Some hardware has per channel 1D LUT values, while others use the same LUT for all channels. We will definitely need to expose this in the UAPI in some form. Hm, I was assuming per-channel 1D LUTs here, just like the existing GAMMA_LUT/ DEGAMMA_LUT properties work. If some hardware can't support that, it'll need to get exposed as another color operation block. To configure this hardware block, user-space can fill a KMS blob with 4096 u32 entries, then set "lut_data" to the blob ID. Other color operation types might have different properties. The bit-depth of the LUT is an important piece of information we should include by default. Are we assuming that the DRM driver will always reduce the input values to the resolution supported by the pipeline? This could result in differences between the hardware behavior and the shader behavior. Additionally, some pipelines are floating point while others are fixed. How would user space know if it needs to pack 32 bit integer values vs 32 bit float values? Again, I'm deferring to the existing GAMMA_LUT/DEGAMMA_LUT. These use a common definition of LUT blob (u16 elements) and it's up to the driver to convert. Using a very precise format for the uAPI has the nice property of making the uAPI much simpler to use. User-space sends high precision data and it's up to drivers to map that to whatever the hardware accepts. Conversion from a larger uint type to a smaller type sounds low effort, however if a block works in a floating point space things are going to get messy really quickly. If the block operates in FP16 space and the interface is 16 bits we are good, but going from 32 bits to FP16 (such as in the matrix case or 3DLUT) is less than ideal. Exposing the actual hardware precision is something we've talked about during the hackfest. It'll probably be useful to some extent, but will require some discussion to figure out how to design the uAPI. Maybe a simple property is enough, maybe not (e.g. fully describing the precision of segmented LUTs would probably be trickier). I'd rather keep things simple for the first pass, we can always add more properties for bit depth etc later on. Indicating if a block operates on / with fixed vs float values is significant enough that I think we should account for this in initial design. It will have a affect on both the user space value packing + expected value ranges in the hardware. Here is another example with a 3D LUT: Color operation 42 ├─ "type": enum {Bypass, 3D LUT} = 3D LUT ├─ "lut_size": immutable range = 33 ├─ "lut_data": blob └─ "next": immutable color operation ID = 43 We are going to need to expose the packing order here to avoid any programming uncertainty. I don't think we can safely assume all hardware is equivalent. The driver can easily change the layout of the matrix and do any conversion necessary when programming the hardware. We do need to document what layout is used in the uAPI for sure. And one last example with a matrix: Color operation 42 ├─ "type": enum {Bypass, Matrix} = Matrix ├─ "matrix_data": blob └─ "next": immutable color operation ID = 43 It is unclear to me what the default sizing of this matrix is. Any objections to exposing these details with an additional property? The existing CTM property uses 9 uint64 (S31.32) values. Is there a case where that wouldn't be enough? Larger cases do exist, but as you mention this can be resolved with a different type then. I don't have any issues with the default 'Matrix' type being 9 entries. Dithering logic exists in some pipelines. I think we need a plan to expose that here as well. Hm, I'm not too familiar with dithering. Do you think it would make sense to expose as an additional colorop block? Do you think it would have more consequences on the design? I want to re-iterate that we don't need to ship all features from day 1. We just need to come up with a uAPI design on which new features can be built on. Agreed. I don't think this will affect the proposed design so this can be figured out once we have a DRM driver impl that
Re: [RFC] Plane color pipeline KMS uAPI
Hi all, The goal of this RFC is to expose a generic KMS uAPI to configure the color pipeline before blending, ie. after a pixel is tapped from a plane's framebuffer and before it's blended with other planes. With this new uAPI we aim to reduce the battery life impact of color management and HDR on mobile devices, to improve performance and to decrease latency by skipping composition on the 3D engine. This proposal is the result of discussions at the Red Hat HDR hackfest [1] which took place a few days ago. Engineers familiar with the AMD, Intel and NVIDIA hardware have participated in the discussion. This proposal takes a prescriptive approach instead of a descriptive approach. Drivers describe the available hardware blocks in terms of low-level mathematical operations, then user-space configures each block. We decided against a descriptive approach where user-space would provide a high-level description of the colorspace and other parameters: we want to give more control and flexibility to user-space, e.g. to be able to replicate exactly the color pipeline with shaders and switch between shaders and KMS pipelines seamlessly, and to avoid forcing user-space into a particular color management policy. Thanks for posting this Simon! This overview does a great job of breaking down the proposal. A few questions inline below. We've decided against mirroring the existing CRTC properties DEGAMMA_LUT/CTM/GAMMA_LUT onto KMS planes. Indeed, the color management pipeline can significantly differ between vendors and this approach cannot accurately abstract all hardware. In particular, the availability, ordering and capabilities of hardware blocks is different on each display engine. So, we've decided to go for a highly detailed hardware capability discovery. This new uAPI should not be in conflict with existing standard KMS properties, since there are none which control the pre-blending color pipeline at the moment. It does conflict with any vendor-specific properties like NV_INPUT_COLORSPACE or the patches on the mailing list adding AMD-specific properties. Drivers will need to either reject atomic commits configuring both uAPIs, or alternatively we could add a DRM client cap which hides the vendor properties and shows the new generic properties when enabled. To use this uAPI, first user-space needs to discover hardware capabilities via KMS objects and properties, then user-space can configure the hardware via an atomic commit. This works similarly to the existing KMS uAPI, e.g. planes. Our proposal introduces a new "color_pipeline" plane property, and a new KMS object type, "COLOROP" (short for color operation). The "color_pipeline" plane property is an enum, each enum entry represents a color pipeline supported by the hardware. The special zero entry indicates that the pipeline is in "bypass"/"no-op" mode. For instance, the following plane properties describe a primary plane with 2 supported pipelines but currently configured in bypass mode: Plane 10 ├─ "type": immutable enum {Overlay, Primary, Cursor} = Primary ├─ … └─ "color_pipeline": enum {0, 42, 52} = 0 The non-zero entries describe color pipelines as a linked list of COLOROP KMS objects. The entry value is an object ID pointing to the head of the linked list (the first operation in the color pipeline). The new COLOROP objects also expose a number of KMS properties. Each has a type, a reference to the next COLOROP object in the linked list, and other type-specific properties. Here is an example for a 1D LUT operation: Color operation 42 ├─ "type": enum {Bypass, 1D curve} = 1D curve ├─ "1d_curve_type": enum {LUT, sRGB, PQ, BT.709, HLG, …} = LUT The options sRGB / PQ / BT.709 / HLG would select hard-coded 1D curves? Will different hardware be allowed to expose a subset of these enum values? ├─ "lut_size": immutable range = 4096 ├─ "lut_data": blob └─ "next": immutable color operation ID = 43 Some hardware has per channel 1D LUT values, while others use the same LUT for all channels. We will definitely need to expose this in the UAPI in some form. To configure this hardware block, user-space can fill a KMS blob with 4096 u32 entries, then set "lut_data" to the blob ID. Other color operation types might have different properties. The bit-depth of the LUT is an important piece of information we should include by default. Are we assuming that the DRM driver will always reduce the input values to the resolution supported by the pipeline? This could result in differences between the hardware behavior and the shader behavior. Additionally, some pipelines are floating point while others are fixed. How would user space know if it needs to pack 32 bit integer values vs 32 bit float values? Here is another example with a 3D LUT: Color operation 42 ├─ "type": enum {Bypass, 3D LUT} = 3D LUT ├─ "lut_size": immutable range = 33 ├─ "lut_data": bl
Re: [PATCH 0/6] drm: Add mouse cursor hotspot support to atomic KMS
On 6/10/22 10:59, Daniel Vetter wrote: On Fri, Jun 10, 2022 at 10:41:05AM +0200, Daniel Vetter wrote: Hi all, Kinda top post because the thread is sprawling and I think we need a summary/restart. I think there's at least 3 issues here: - lack of hotspot property support, which means compositors can't really support hotspot with atomic. Which isn't entirely true, because you totally can use atomic for the primary planes/crtcs and the legacy cursor ioctls, but I understand that people might find that a bit silly :-) Anyway this problme is solved by the patch set here, and I think results in some nice cleanups to boot. - the fact that cursors for virtual drivers are not planes, but really special things. Which just breaks the universal plane kms uapi. That part isn't solved, and I do agree with Simon and Pekka that we really should solve this before we unleash even more compositors onto the atomic paths of virtual drivers. I think the simplest solution for this is: 1. add a new DRM_PLANE_TYPE_VIRTUAL_CURSOR, and set that for these special cursor planes on all virtual drivers 2. add the new "I understand virtual cursors planes" setparam, filter virtual cursor planes for userspace which doesn't set this (like we do right now if userspace doesn't set the universal plane mode) 3. backport the above patches to all stable kernels 4. make sure the hotspot property is only set on VIRTUAL_CURSOR planes and nothing else in the rebased patch series Simon also mentioned on irc that these special planes must not expose the CRTC_X/Y property, since that doesn't really do much at all. Or is our understanding of how this all works for commandeered cursors wrong? - third issue: These special virtual display properties arent documented. Aside from hotspot there's also suggested X/Y and maybe other stuff. I have no idea what suggested X/Y does and what userspace should do with it. I think we need a new section for virtualized drivers which: - documents all the properties involved - documents the new cap for enabling virtual cursor planes - documents some of the key flows that compositors should implement for best experience - documents how exactly the user experience will degrade if compositors pretend it's just a normal kms driver (maybe put that into each of the special flows that a compositor ideally supports) - whatever other comments and gaps I've missed, I'm sure Simon/Pekka/others will chime in once the patch exists. Great bonus would be an igt which demonstrates these flows. With the interactive debug breakpoints to wait for resizing or whatever this should be all neatly possible. -Daniel Hi all, We have been testing the v2 of this patch and it works correctly for us. First we tested with a patched Mutter, the mouse clicks were correct, and behavior was as expected. But I've also added an IGT test as suggested above: https://gitlab.freedesktop.org/aesteve/igt-gpu-tools/-/compare/master...modeset-cursor-hotspot-test?from_project_id=1274 I could post the IGT patch upstream once Zack's patches land. Hope that helps! -Albert There's a bit of fixing oopsies (virtualized drivers really shouldn't have enabled universal planes for their cursors) and debt (all these properties predate the push to document stuff so we need to fix that), but I don't think it's too much. And I think, from reading the threads, that this should cover everything? Anything I've missed? Or got completely wrong? Cheers, Daniel On Fri, Jun 03, 2022 at 02:14:59PM +, Simon Ser wrote: Hi, Please, read this thread: https://lists.freedesktop.org/archives/dri-devel/2020-March/thread.html#259615 It has a lot of information about the pitfalls of cursor hotspot and other things done by VM software. In particular: since the driver will ignore the KMS cursor plane position set by user-space, I don't think it's okay to just expose without opt-in from user-space (e.g. with a DRM_CLIENT_CAP). cc wayland-devel and Pekka for user-space feedback. On Thursday, June 2nd, 2022 at 17:42, Zack Rusin wrote: - all userspace code needs to hardcore a list of drivers which require hotspots because there's no way to query from drm "does this driver require hotspot" Can you elaborate? I'm not sure I understand what you mean here. Thanks, Simon -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
Re: [RFC] Plane color pipeline KMS uAPI
On Fri, 9 Jun 2023 19:11:25 -0400 Christopher Braga wrote: > On 6/9/2023 12:30 PM, Simon Ser wrote: > > Hi Christopher, > > > > On Friday, June 9th, 2023 at 17:52, Christopher Braga > > wrote: > > > >>> The new COLOROP objects also expose a number of KMS properties. Each has a > >>> type, a reference to the next COLOROP object in the linked list, and other > >>> type-specific properties. Here is an example for a 1D LUT operation: > >>> > >>> Color operation 42 > >>> ├─ "type": enum {Bypass, 1D curve} = 1D curve > >>> ├─ "1d_curve_type": enum {LUT, sRGB, PQ, BT.709, HLG, …} = LUT > >> The options sRGB / PQ / BT.709 / HLG would select hard-coded 1D > >> curves? Will different hardware be allowed to expose a subset of these > >> enum values? > > > > Yes. Only hardcoded LUTs supported by the HW are exposed as enum entries. > > > >>> ├─ "lut_size": immutable range = 4096 > >>> ├─ "lut_data": blob > >>> └─ "next": immutable color operation ID = 43 > >>> > >> Some hardware has per channel 1D LUT values, while others use the same > >> LUT for all channels. We will definitely need to expose this in the > >> UAPI in some form. > > > > Hm, I was assuming per-channel 1D LUTs here, just like the existing > > GAMMA_LUT/ > > DEGAMMA_LUT properties work. If some hardware can't support that, it'll need > > to get exposed as another color operation block. > > > >>> To configure this hardware block, user-space can fill a KMS blob with > >>> 4096 u32 > >>> entries, then set "lut_data" to the blob ID. Other color operation types > >>> might > >>> have different properties. > >>> > >> The bit-depth of the LUT is an important piece of information we should > >> include by default. Are we assuming that the DRM driver will always > >> reduce the input values to the resolution supported by the pipeline? > >> This could result in differences between the hardware behavior > >> and the shader behavior. > >> > >> Additionally, some pipelines are floating point while others are fixed. > >> How would user space know if it needs to pack 32 bit integer values vs > >> 32 bit float values? > > > > Again, I'm deferring to the existing GAMMA_LUT/DEGAMMA_LUT. These use a > > common > > definition of LUT blob (u16 elements) and it's up to the driver to convert. > > > > Using a very precise format for the uAPI has the nice property of making the > > uAPI much simpler to use. User-space sends high precision data and it's up > > to > > drivers to map that to whatever the hardware accepts. > > > Conversion from a larger uint type to a smaller type sounds low effort, > however if a block works in a floating point space things are going to > get messy really quickly. If the block operates in FP16 space and the > interface is 16 bits we are good, but going from 32 bits to FP16 (such > as in the matrix case or 3DLUT) is less than ideal. Hi Christopher, are you thinking of precision loss, or the overhead of conversion? Conversion from N-bit fixed point to N-bit floating-point is generally lossy, too, and the other direction as well. What exactly would be messy? > > > Exposing the actual hardware precision is something we've talked about > > during > > the hackfest. It'll probably be useful to some extent, but will require some > > discussion to figure out how to design the uAPI. Maybe a simple property is > > enough, maybe not (e.g. fully describing the precision of segmented LUTs > > would > > probably be trickier). > > > > I'd rather keep things simple for the first pass, we can always add more > > properties for bit depth etc later on. > > > Indicating if a block operates on / with fixed vs float values is > significant enough that I think we should account for this in initial > design. It will have a affect on both the user space value packing + > expected value ranges in the hardware. What do you mean by "value packing"? Memory layout of the bits forming a value? Or possible exact values of a specific type? I don't think fixed vs. float is the most important thing. Even fixed point formats can have different numbers of bits for whole numbers, which changes the usable value range and not only precision. Userspace at the very least needs to know the usable value range for the block's inputs, outputs, and parameters. When defining the precision for inputs, outputs and parameters, then fixed- vs. floating-point becomes meaningful in explaining what "N bits of precision" means. Then there is the question of variable precision that depends on the actual block input and parameter values, how to represent that. Worst case precision might be too pessimistic alone. > >>> Here is another example with a 3D LUT: > >>> > >>> Color operation 42 > >>> ├─ "type": enum {Bypass, 3D LUT} = 3D LUT > >>> ├─ "lut_size": immutable range = 33 > >>> ├─ "lut_data": blob > >>> └─ "next": immutable color operation ID = 43 > >>> > >> We are going to nee
Re: [RFC] Plane color pipeline KMS uAPI
On 6/12/2023 5:21 AM, Pekka Paalanen wrote: On Fri, 9 Jun 2023 19:11:25 -0400 Christopher Braga wrote: On 6/9/2023 12:30 PM, Simon Ser wrote: Hi Christopher, On Friday, June 9th, 2023 at 17:52, Christopher Braga wrote: The new COLOROP objects also expose a number of KMS properties. Each has a type, a reference to the next COLOROP object in the linked list, and other type-specific properties. Here is an example for a 1D LUT operation: Color operation 42 ├─ "type": enum {Bypass, 1D curve} = 1D curve ├─ "1d_curve_type": enum {LUT, sRGB, PQ, BT.709, HLG, …} = LUT The options sRGB / PQ / BT.709 / HLG would select hard-coded 1D curves? Will different hardware be allowed to expose a subset of these enum values? Yes. Only hardcoded LUTs supported by the HW are exposed as enum entries. ├─ "lut_size": immutable range = 4096 ├─ "lut_data": blob └─ "next": immutable color operation ID = 43 Some hardware has per channel 1D LUT values, while others use the same LUT for all channels. We will definitely need to expose this in the UAPI in some form. Hm, I was assuming per-channel 1D LUTs here, just like the existing GAMMA_LUT/ DEGAMMA_LUT properties work. If some hardware can't support that, it'll need to get exposed as another color operation block. To configure this hardware block, user-space can fill a KMS blob with 4096 u32 entries, then set "lut_data" to the blob ID. Other color operation types might have different properties. The bit-depth of the LUT is an important piece of information we should include by default. Are we assuming that the DRM driver will always reduce the input values to the resolution supported by the pipeline? This could result in differences between the hardware behavior and the shader behavior. Additionally, some pipelines are floating point while others are fixed. How would user space know if it needs to pack 32 bit integer values vs 32 bit float values? Again, I'm deferring to the existing GAMMA_LUT/DEGAMMA_LUT. These use a common definition of LUT blob (u16 elements) and it's up to the driver to convert. Using a very precise format for the uAPI has the nice property of making the uAPI much simpler to use. User-space sends high precision data and it's up to drivers to map that to whatever the hardware accepts. Conversion from a larger uint type to a smaller type sounds low effort, however if a block works in a floating point space things are going to get messy really quickly. If the block operates in FP16 space and the interface is 16 bits we are good, but going from 32 bits to FP16 (such as in the matrix case or 3DLUT) is less than ideal. Hi Christopher, are you thinking of precision loss, or the overhead of conversion? Conversion from N-bit fixed point to N-bit floating-point is generally lossy, too, and the other direction as well. What exactly would be messy? Overheard of conversion is the primary concern here. Having to extract and / or calculate the significand + exponent components in the kernel is burdensome and imo a task better suited for user space. This also has to be done every blob set, meaning that if user space is re-using pre-calculated blobs we would be repeating the same conversion operations in kernel space unnecessarily. I agree normalization of the value causing precision loss and rounding we can't avoid. We should also consider the fact that float pipelines have been known to use the scrgb definition for floating point values (https://registry.khronos.org/EGL/extensions/EXT/EGL_EXT_gl_colorspace_scrgb_linear.txt). In cases like this where there may be a expected value range in the pipeline, how to normalize a larger input becomes a little confusing. Ex - Does U32 MAX become FP16 MAX or value MAX (i.e 127). Exposing the actual hardware precision is something we've talked about during the hackfest. It'll probably be useful to some extent, but will require some discussion to figure out how to design the uAPI. Maybe a simple property is enough, maybe not (e.g. fully describing the precision of segmented LUTs would probably be trickier). I'd rather keep things simple for the first pass, we can always add more properties for bit depth etc later on. Indicating if a block operates on / with fixed vs float values is significant enough that I think we should account for this in initial design. It will have a affect on both the user space value packing + expected value ranges in the hardware. What do you mean by "value packing"? Memory layout of the bits forming a value? Or possible exact values of a specific type? > Both really. If the kernel is provided a U32 value, we need to know if this is a U32 value, or a float packed into a U32 container. Likewise as mentioned with the scRGB above, float could even adjust the value range expectations. I don't think fixed vs. float is the most important thing. Even fixed point formats can have different numbers