Re: [RFC] Plane color pipeline KMS uAPI

2023-06-16 Thread Pekka Paalanen
On Thu, 15 Jun 2023 17:44:33 -0400
Christopher Braga  wrote:

> On 6/14/2023 5:00 AM, Pekka Paalanen wrote:
> > On Tue, 13 Jun 2023 12:29:55 -0400
> > Christopher Braga  wrote:
> >   
> >> On 6/13/2023 4:23 AM, Pekka Paalanen wrote:  
> >>> On Mon, 12 Jun 2023 12:56:57 -0400
> >>> Christopher Braga  wrote:
> >>>  
>  On 6/12/2023 5:21 AM, Pekka Paalanen wrote:  
> > On Fri, 9 Jun 2023 19:11:25 -0400
> > Christopher Braga  wrote:
> > 
> >> On 6/9/2023 12:30 PM, Simon Ser wrote:  
> >>> Hi Christopher,
> >>>
> >>> On Friday, June 9th, 2023 at 17:52, Christopher Braga wrote:
> >>>
> > The new COLOROP objects also expose a number of KMS properties. Each
> > has a type, a reference to the next COLOROP object in the linked list,
> > and other type-specific properties. Here is an example for a 1D LUT
> > operation:
> >
> >  Color operation 42
> >  ├─ "type": enum {Bypass, 1D curve} = 1D curve
> >  ├─ "1d_curve_type": enum {LUT, sRGB, PQ, BT.709, HLG, …} = 
> > LUT  
>  The options sRGB / PQ / BT.709 / HLG would select hard-coded 1D
>  curves? Will different hardware be allowed to expose a subset of these
>  enum values?
> >>>
> >>> Yes. Only hardcoded LUTs supported by the HW are exposed as enum entries.
> >>>
> >  ├─ "lut_size": immutable range = 4096
> >  ├─ "lut_data": blob
> >  └─ "next": immutable color operation ID = 43
> >   
>  Some hardware has per-channel 1D LUT values, while others use the same
>  LUT for all channels. We will definitely need to expose this in the
>  UAPI in some form.
> >>>
> >>> Hm, I was assuming per-channel 1D LUTs here, just like the existing
> >>> GAMMA_LUT/DEGAMMA_LUT properties work. If some hardware can't support
> >>> that, it'll need to get exposed as another color operation block.
> >>>
> > To configure this hardware block, user-space can fill a KMS blob with
> > 4096 u32 entries, then set "lut_data" to the blob ID. Other color
> > operation types might have different properties.
> >   
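As a rough user-space sketch of that flow (property names as in this
RFC; the property IDs would normally be discovered with
drmModeObjectGetProperties(), and since the COLOROP uAPI is only a
proposal, treat this as illustrative pseudocode against libdrm):

    #include <stdint.h>
    #include <xf86drm.h>
    #include <xf86drmMode.h>

    #define LUT_SIZE 4096

    /* Fill an identity 1D LUT blob and attach it to a color operation. */
    static int set_1d_lut(int fd, drmModeAtomicReq *req, uint32_t colorop_id,
                          uint32_t type_prop_id, uint32_t lut_data_prop_id)
    {
        uint32_t lut[LUT_SIZE];
        uint32_t blob_id;

        /* Identity ramp over the full u32 range. */
        for (uint64_t i = 0; i < LUT_SIZE; i++)
            lut[i] = i * UINT32_MAX / (LUT_SIZE - 1);

        if (drmModeCreatePropertyBlob(fd, lut, sizeof(lut), &blob_id))
            return -1;

        /* "type" = 1D curve (enum value illustrative only). */
        drmModeAtomicAddProperty(req, colorop_id, type_prop_id, 1);
        /* "lut_data" = the blob we just created. */
        drmModeAtomicAddProperty(req, colorop_id, lut_data_prop_id, blob_id);
        return 0;
    }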
>  The bit-depth of the LUT is an important piece of information we should
>  include by default. Are we assuming that the DRM driver will always
>  reduce the input values to the resolution supported by the pipeline?
>  This could result in differences between the hardware behavior and the
>  shader behavior.
> 
>  Additionally, some pipelines are floating point while others are fixed
>  point. How would user space know if it needs to pack 32-bit integer
>  values vs 32-bit float values?
> >>>
> >>> Again, I'm deferring to the existing GAMMA_LUT/DEGAMMA_LUT. These use
> >>> a common definition of LUT blob (u16 elements) and it's up to the
> >>> driver to convert.
> >>>
> >>> Using a very precise format for the uAPI has the nice property of
> >>> making the uAPI much simpler to use. User-space sends high-precision
> >>> data and it's up to drivers to map that to whatever the hardware
> >>> accepts.
> >>>   
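For the fixed-point case the conversion really is low effort; here is a
sketch of the rounding rescale a driver can do (the kernel's existing
drm_color_lut_extract() helper serves this purpose for the u16
GAMMA_LUT/DEGAMMA_LUT entries):

    #include <stdint.h>

    /* Scale a 16-bit LUT entry to the hardware bit depth with
     * round-to-nearest. Assumes bit_precision <= 16. */
    static uint32_t lut_entry_to_hw(uint16_t user_input, int bit_precision)
    {
        uint32_t max_hw = (1u << bit_precision) - 1;

        return ((uint32_t)user_input * max_hw + 0x7fff) / 0xffff;
    }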
> >> Conversion from a larger uint type to a smaller type sounds low
> >> effort; however, if a block works in a floating-point space, things
> >> are going to get messy really quickly. If the block operates in FP16
> >> space and the interface is 16 bits we are good, but going from 32 bits
> >> to FP16 (such as in the matrix case or 3DLUT) is less than ideal.
> >
> > Hi Christopher,
> >
> > are you thinking of precision loss, or the overhead of conversion?
> >
> > Conversion from N-bit fixed point to N-bit floating-point is generally
> > lossy, too, and the other direction as well.
> >
> > What exactly would be messy?
> > 
>  Overhead of conversion is the primary concern here. Having to extract
>  and/or calculate the significand and exponent components in the kernel
>  is burdensome and, imo, a task better suited for user space. This also
>  has to be done on every blob set, meaning that if user space is re-using
>  pre-calculated blobs we would be repeating the same conversion
>  operations in kernel space unnecessarily.
> >>>
> >>> What is burdensome in that calculation? I don't think you would need to
> >>> use any actual floating-point instructions. Logarithm for finding the
> >>> exponent is about finding the highest bit set in an integer and
> >>> everything is conveniently expressed in base-2. Finding significand is
> >>> jus
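A minimal integer-only sketch of that extraction, assuming a u32
fixed-point input interpreted as v / 2^32 (i.e. in [0, 1)) and IEEE-754
binary16 output; it truncates instead of rounding and flushes would-be
denormals to zero, so it is illustrative only. In the kernel, fls()
would stand in for __builtin_clz():

    #include <stdint.h>

    static uint16_t u32_fixed_to_half(uint32_t v)
    {
        if (v == 0)
            return 0;

        /* The "logarithm": position of the highest set bit. */
        int h = 31 - __builtin_clz(v);

        /* v == 1.m * 2^(h - 32); the binary16 exponent bias is 15. */
        int biased_exp = (h - 32) + 15;
        if (biased_exp <= 0)
            return 0; /* would be denormal; flushed in this sketch */

        /* Significand: the 10 bits just below the leading one
         * (the early return above guarantees h >= 18 > 10). */
        uint16_t mant = (v >> (h - 10)) & 0x3ff;

        return (uint16_t)((biased_exp << 10) | mant);
    }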

Re: Refresh rates with multiple monitors

2023-06-16 Thread Pekka Paalanen
On Thu, 15 Jun 2023 16:58:12 -0500
Matt Hoosier  wrote:

> On Wed, Jun 14, 2023 at 7:13 PM Daniel Stone  wrote:
> 
> > Hi Joe,
> >
> > On Wed, 14 Jun 2023 at 21:33, Joe M  wrote:
> >  
> >> Thanks Daniel. Do you know if wl_output instances are decoupled from
> >> each other when it comes to display refresh?
> >>  
> >
> > Yep, absolutely.
> >
> >  
> >> The wl_output geometry info hints that each output can be thought of as a
> >> region in a larger compositor canvas, given the logical x/y fields in the
> >> geometry. Is the compositor able to handle the repaint scheduling in a
> >> refresh-aware way?
> >>  
> >
> > Yes.
> >
> >  
> >> I'm trying to get a better understanding of how these pieces interact to
> >> maximize draw time but still hit the glass every frame. The various blog
> >> posts and documentation out there are pretty clear when it comes to
> >> drawing to a single physical display, but less so when multiple
> >> displays are in use.
> >>  
> >
> > Per-output repaint cycles are taken as a given. You can assume that every
> > compositor does this, and any compositor which doesn't do this is so
> > hopelessly broken as to not be worth considering.
> >  
> 
> You can use the wp_presentation extension API to get real-time
> measurements of how much time elapses between the moment you submit an
> updated buffer and when it hits the glass. If you work backward from
> that number, you can figure out how long beforehand to start your
> drawing so that you get minimally stale rendered contents but don't
> drop any frames.

That's a bit inconvenient, though. In order to find the deadline inside
a frame cycle, every time you start your animation you have to
repeatedly miss the deadline in order to probe where it is. Then you
guess how much margin you need to make it reliable enough to not miss
the deadline accidentally. On top of that, there are compositors that
dynamically adjust their own updates, moving the deadline around.

These are grave enough problems that I believe the consensus is that
minimizing latency to light is not realistically possible with the
existing protocols.

What wp_presentation is good for is estimating when your update turns
into light, provided you submit the update early enough, so you can get
e.g. A/V sync right or predict game state. "Early enough" basically
means committing a ready buffer when the frame callback of the previous
update returns. But that is not minimizing the sub-frame latency;
depending on the compositor, it may be maximizing it.
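In code, that measurement loop looks roughly like the following sketch
against the presentation-time protocol. The generated header name
follows wayland-scanner defaults, and the presentation clock is assumed
to be CLOCK_MONOTONIC here; real code must check the clock advertised
by the wp_presentation.clock_id event:

    #include <time.h>
    #include <wayland-client.h>
    #include "presentation-time-client-protocol.h"

    struct frame_stats {
        struct timespec submitted; /* set just before wl_surface_commit() */
    };

    static void feedback_sync_output(void *data,
                                     struct wp_presentation_feedback *fb,
                                     struct wl_output *output)
    {
        /* Not needed for latency measurement. */
    }

    static void feedback_presented(void *data,
                                   struct wp_presentation_feedback *fb,
                                   uint32_t tv_sec_hi, uint32_t tv_sec_lo,
                                   uint32_t tv_nsec, uint32_t refresh,
                                   uint32_t seq_hi, uint32_t seq_lo,
                                   uint32_t flags)
    {
        struct frame_stats *s = data;
        uint64_t sec = ((uint64_t)tv_sec_hi << 32) | tv_sec_lo;

        /* Latency to light = presentation time - submission time. */
        int64_t latency_ns =
            (int64_t)(sec - s->submitted.tv_sec) * 1000000000LL +
            ((int64_t)tv_nsec - s->submitted.tv_nsec);
        (void)latency_ns; /* feed this into the app's scheduling model */

        wp_presentation_feedback_destroy(fb);
    }

    static void feedback_discarded(void *data,
                                   struct wp_presentation_feedback *fb)
    {
        wp_presentation_feedback_destroy(fb); /* never hit the glass */
    }

    static const struct wp_presentation_feedback_listener feedback_listener = {
        .sync_output = feedback_sync_output,
        .presented = feedback_presented,
        .discarded = feedback_discarded,
    };

    /* Per frame: request feedback, note the submission time, commit. */
    static void commit_with_feedback(struct wp_presentation *presentation,
                                     struct wl_surface *surface,
                                     struct frame_stats *stats)
    {
        struct wp_presentation_feedback *fb =
            wp_presentation_feedback(presentation, surface);

        wp_presentation_feedback_add_listener(fb, &feedback_listener, stats);
        clock_gettime(CLOCK_MONOTONIC, &stats->submitted);
        wl_surface_commit(surface);
    }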

People have been thinking about new protocol extensions for a while,
but I haven't followed how far they got.


Thanks,
pq

