Re: [RFC] Plane color pipeline KMS uAPI

2023-06-14 Thread Pekka Paalanen
On Tue, 13 Jun 2023 12:29:55 -0400
Christopher Braga  wrote:

> On 6/13/2023 4:23 AM, Pekka Paalanen wrote:
> > On Mon, 12 Jun 2023 12:56:57 -0400
> > Christopher Braga  wrote:
> >   
> >> On 6/12/2023 5:21 AM, Pekka Paalanen wrote:  
> >>> On Fri, 9 Jun 2023 19:11:25 -0400
> >>> Christopher Braga  wrote:
> >>>  
> >>>> On 6/9/2023 12:30 PM, Simon Ser wrote:
> > Hi Christopher,
> >
> > On Friday, June 9th, 2023 at 17:52, Christopher Braga wrote:
> > 
> >>> The new COLOROP objects also expose a number of KMS properties. Each has a
> >>> type, a reference to the next COLOROP object in the linked list, and other
> >>> type-specific properties. Here is an example for a 1D LUT operation:
> >>>
> >>> Color operation 42
> >>> ├─ "type": enum {Bypass, 1D curve} = 1D curve
> >>> ├─ "1d_curve_type": enum {LUT, sRGB, PQ, BT.709, HLG, …} = LUT
> >> The options sRGB / PQ / BT.709 / HLG would select hard-coded 1D
> >> curves? Will different hardware be allowed to expose a subset of these
> >> enum values?  
> >
> > Yes. Only hardcoded LUTs supported by the HW are exposed as enum entries.
> > 
> >>> ├─ "lut_size": immutable range = 4096
> >>> ├─ "lut_data": blob
> >>> └─ "next": immutable color operation ID = 43
> >>>
> >> Some hardware has per channel 1D LUT values, while others use the same
> >> LUT for all channels.  We will definitely need to expose this in the
> >> UAPI in some form.  
> >
> > Hm, I was assuming per-channel 1D LUTs here, just like the existing
> > GAMMA_LUT/DEGAMMA_LUT properties work. If some hardware can't support that,
> > it'll need to get exposed as another color operation block.
> > 
> >>> To configure this hardware block, user-space can fill a KMS blob with
> >>> 4096 u32 entries, then set "lut_data" to the blob ID. Other color
> >>> operation types might have different properties.
> >>>
> >> The bit-depth of the LUT is an important piece of information we should
> >> include by default. Are we assuming that the DRM driver will always
> >> reduce the input values to the resolution supported by the pipeline?
> >> This could result in differences between the hardware behavior
> >> and the shader behavior.
> >>
> >> Additionally, some pipelines are floating point while others are fixed.
> >> How would user space know if it needs to pack 32 bit integer values vs
> >> 32 bit float values?  
> >
> > Again, I'm deferring to the existing GAMMA_LUT/DEGAMMA_LUT. These use a
> > common definition of LUT blob (u16 elements) and it's up to the driver to
> > convert.
> >
> > Using a very precise format for the uAPI has the nice property of making
> > the uAPI much simpler to use. User-space sends high precision data and it's
> > up to drivers to map that to whatever the hardware accepts.
> >
> >>>> Conversion from a larger uint type to a smaller type sounds low effort,
> >>>> however if a block works in a floating point space things are going to
> >>>> get messy really quickly. If the block operates in FP16 space and the
> >>>> interface is 16 bits we are good, but going from 32 bits to FP16 (such
> >>>> as in the matrix case or 3DLUT) is less than ideal.
> >>>
> >>> Hi Christopher,
> >>>
> >>> are you thinking of precision loss, or the overhead of conversion?
> >>>
> >>> Conversion from N-bit fixed point to N-bit floating-point is generally
> >>> lossy, too, and the other direction as well.
> >>>
> >>> What exactly would be messy?
> >>>  
> >> Overhead of conversion is the primary concern here. Having to extract
> >> and / or calculate the significand + exponent components in the kernel
> >> is burdensome and imo a task better suited for user space. This also has
> >> to be done every blob set, meaning that if user space is re-using
> >> pre-calculated blobs we would be repeating the same conversion
> >> operations in kernel space unnecessarily.  
> > 
> > What is burdensome in that calculation? I don't think you would need to
> > use any actual floating-point instructions. Logarithm for finding the
> > exponent is about finding the highest bit set in an integer and
> > everything is conveniently expressed in base-2. Finding significand is
> > just masking the integer based on the exponent.
> >   
> Oh it definitely can be done, but I think this is just a difference of 
> opinion at this point. At the end of the day we will do it if we have 
> to, but it is just more optimal if a more agreeable common type is used.
> 
> > Can you not cache the converted data, keyed by the DRM blob unique
> > identity vs. the KMS property it is attached to?  
> If the userspace composit
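
A minimal sketch of the integer-only conversion described earlier in this thread: the exponent comes from the position of the highest set bit, and the significand is a shift and mask. It is written in Python for brevity; the function names, the reading of a u32 entry as a 0.32 fixed-point fraction (x / 2**32), and truncation instead of round-to-nearest are all assumptions made for illustration, not anything the RFC specifies. The small cache keyed by a hypothetical blob ID illustrates the "convert once per blob" idea and relies on blob contents not changing after creation.

```python
import struct

def u32_fraction_to_fp16_bits(x: int) -> int:
    """Convert a 0.32 fixed-point fraction (x / 2**32) to IEEE 754
    binary16 bits, truncating toward zero, using integer ops only."""
    if not 0 <= x <= 0xFFFFFFFF:
        raise ValueError("x must fit in an unsigned 32-bit integer")
    if x == 0:
        return 0
    # Index of the highest set bit, i.e. floor(log2(x)).
    p = x.bit_length() - 1
    if p < 18:
        # Value is below 2**-14: encode as a subnormal, whose
        # significand counts units of 2**-24.
        return x >> 8
    biased_exponent = p - 17          # (p - 32) + bias of 15
    significand = (x >> (p - 10)) & 0x3FF  # 10 bits below the leading 1
    return (biased_exponent << 10) | significand

def fp16_bits_to_float(bits: int) -> float:
    """Decode binary16 bits, only used here to check the result."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]

# Hypothetical cache: converted LUTs keyed by blob ID (assumption).
_converted = {}

def convert_lut_cached(blob_id: int, entries):
    """Convert a LUT once per blob ID and reuse the result afterwards."""
    if blob_id not in _converted:
        _converted[blob_id] = [u32_fraction_to_fp16_bits(v) for v in entries]
    return _converted[blob_id]
```

With these assumptions, 0x80000000 (one half) maps to the binary16 bit pattern 0x3800. A real driver would also have to decide on rounding and on how the largest input maps to 1.0, which this sketch does not address.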

Re: Refresh rates with multiple monitors

2023-06-14 Thread Joe M
Thanks Daniel. Do you know if wl_output instances are decoupled from each
other, when it comes to display refresh?

The wl_output geometry info hints that each output can be thought of as a
region in a larger compositor canvas, given the logical x/y fields in the
geometry. Is the compositor able to handle the repaint scheduling in a
refresh-aware way?

I'm trying to get a better understanding of how these pieces interact to
maximize draw time but still hit the glass every frame. The various blog posts
and documentation out there are pretty clear when it comes to drawing to a
single physical display, but less so when multiple displays are in use.

On Tuesday, June 13, 2023 at 03:42:41 AM PDT, Daniel Stone wrote:
 
 Hi,
On Tue, 13 Jun 2023 at 10:20, Pekka Paalanen  wrote:

On Tue, 13 Jun 2023 01:11:44 +0000 (UTC)
Joe M  wrote:
> As I understand, there is one global wl_display. Is there always one
> wl_compositor too?

That is inconsequential.


Yeah, I think the really consequential thing is that a wl_display really just 
represents a connection to a Wayland server (aka compositor).
Display targets (e.g. 'the HDMI connector on the left', 'the DSI panel') are 
represented by wl_output objects. There is one of those for each output.
Cheers,
Daniel

Re: Refresh rates with multiple monitors

2023-06-14 Thread Daniel Stone
Hi Joe,

On Wed, 14 Jun 2023 at 21:33, Joe M  wrote:

> Thanks Daniel. Do you know if wl_output instances are decoupled from each
> other, when it comes to display refresh?
>

Yep, absolutely.


> The wl_output geometry info hints that each output can be thought of as a
> region in a larger compositor canvas, given the logical x/y fields in the
> geometry. Is the compositor able to handle the repaint scheduling in a
> refresh-aware way?
>

Yes.


> I'm trying to get a better understanding of how these pieces interact to
> maximize draw time but still hit the glass every frame. The various blog
> posts and documentation out there are pretty clear when it comes to drawing
> to a single physical display, but less so when multiple displays are in use.
>

Per-output repaint cycles are taken as a given. You can assume that every
compositor does this, and any compositor which doesn't do this is so
hopelessly broken as to not be worth considering.
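
As a rough illustration of what a per-output repaint cycle means, the sketch below computes, for each output independently, the next vblank to target and the latest time to start repainting for it. This is only a model under stated assumptions, not any compositor's actual code: the Output class, next_repaint(), and the repaint_window_ns parameter are hypothetical names, and the refresh rate is given in mHz as in the wl_output mode event.

```python
from dataclasses import dataclass

@dataclass
class Output:
    name: str
    refresh_mhz: int     # e.g. 60000 for 60 Hz
    last_vblank_ns: int  # timestamp of the most recent vblank

    @property
    def period_ns(self) -> int:
        # 1e9 ns per second, refresh in millihertz.
        return 1_000_000_000_000 // self.refresh_mhz

def next_repaint(output: Output, now_ns: int, repaint_window_ns: int):
    """Return (repaint_start_ns, target_vblank_ns) for a single output.

    repaint_window_ns is the time reserved before vblank for the
    compositor to composite and submit the frame (assumed parameter).
    """
    period = output.period_ns
    elapsed = now_ns - output.last_vblank_ns
    # First predicted vblank strictly after now.
    target = output.last_vblank_ns + (elapsed // period + 1) * period
    start = target - repaint_window_ns
    # If it is already too late for that vblank, aim for a later one.
    while start < now_ns:
        target += period
        start += period
    return start, target

# Each output is scheduled from its own clock; nothing is shared.
outputs = [Output("HDMI-A-1", 60000, 0), Output("DP-1", 144000, 0)]
schedule = {o.name: next_repaint(o, 1_000_000, 4_000_000) for o in outputs}
```

Starting the repaint as late as the window allows gives clients the most time to draw while still hitting the targeted vblank, and a 144 Hz output simply goes around its loop more often than a 60 Hz one.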

Cheers,
Daniel