Re: [RFC PATCH v4 00/42] Color Pipeline API w/ VKMS

2024-02-29 Thread Daniel Vetter
On Mon, Feb 26, 2024 at 04:10:14PM -0500, Harry Wentland wrote:
> This is an RFC set for a color pipeline API, along with a sample
> implementation in VKMS. All the key API bits are here. VKMS now
> supports two named transfer function colorops and two matrix
> colorops. We have IGT tests that check all four of these colorops
> with a pixel-by-pixel comparison that checks that these colorops
> do what we expect them to do with a +/- 1 8 bpc code point margin.

So vkms is definitely great to make sure the igts are generic enough and
somewhat useful, but ... does steam run on vkms too? I think that would be
a really good test to show that the api we have here is actually useful
for compositors in a cross-driver way, and not just a neat idea that
doesn't survive harsh reality.

And yes I realize that's probably going to be a bunch of work, but I feel
like the color pipeline discussion has dragged around enough in
hypotheticals and concerns that I think it would really help a lot.

Thoughts?
-Sima

> 
> The big new change with v4 is the addition of an amdgpu color
> pipeline, for all AMD GPUs with DCN 3 and newer. Amdgpu now support
> the following:
> 
> 1. 1D Curve EOTF
> 2. 3x4 CTM
> 3. Multiplier
> 4. 1D Curve Inverse EOTF
> 5. 1D LUT
> 6. 1D Curve EOTF
> 7. 1D LUT
> 
> The supported curves for the 1D Curve type are:
> - sRGB EOTF and its inverse
> - PQ EOTF, scaled to [0.0, 125.0] and its inverse
> - BT.2020/BT.709 OETF and its inverse
> 
> Note that the 1st and 5th colorops take the EOTF or Inverse
> OETF while the 3rd colorop takes the Inverse EOTF or OETF.
> 
> We are working on two more ops for amdgpu, the HDR multiplier
> and the 3DLUT, which will give us this:
> 
> 1. 1D Curve EOTF
> 2. 3x4 CTM
> 3. HDR Multiplier
> 4. 1D Curve Inverse EOTF
> 5. 1D LUT
> 6. 3D LUT
> 7. 1D Curve EOTF
> 8. 1D LUT
> 
> This, essentially mirrors the color pipeline used by gamescope
> and presented by Melissa Wen, with the exception of the DEGAM
> LUT, which is not currently used. See
> [1] 
> https://indico.freedesktop.org/event/4/contributions/186/attachments/138/218/xdc2023-TheRainbowTreasureMap-MelissaWen.pdf
> 
> After this we'd like to also add the following ops:
> - Scaler (Informational only)
> - Color Encoding, to replace drm_plane's COLOR_ENCODING
> - Color Range, to replace drm_plane's COLOR_RANGE
> 
> This patchset is grouped as follows:
>  - Patches 1-3: couple general patches/fixes
>  - Patches 4-7: introduce kunit to VKMS
>  - Patch 7: description of motivation and details behind the
> Color Pipeline API. If you're reading nothing else
> but are interested in the topic I highly recommend
> you take a look at this.
>  - Patches 7-27: DRM core and VKMS changes for color pipeline API
>  - Patches 28-40: DRM core and amdgpu changes for color pipeline API
> 
> VKMS patches could still be improved in a few ways, though the
> payoff might be limited and I would rather focus on other work
> at the moment. The most obvious thing to improve would be to
> eliminate the hard-coded LUTs for identity, and sRGB, and replace
> them with fixed-point math instead.
> 
> There are plenty of things that I would like to see here but
> haven't had a chance to look at. These will (hopefully) be
> addressed in future iterations, either in VKMS or amdgpu:
>  - Clear documentation for each drm_colorop_type
>  - Add custom LUT colorops to VKMS
>  - Add pre-blending 3DLUT
>  - How to support HW which can't bypass entire pipeline?
>  - Add ability to create colorops that don't have BYPASS
>  - Can we do a LOAD / COMMIT model for LUTs (and other properties)?
>  - read-only scaling colorop which defines scaling taps and position
>  - read-only color format colorop to define supported color formats
>for a pipeline
>  - named matrices, for things like converting YUV to RGB
> 
> IGT tests can be found at
> https://gitlab.freedesktop.org/hwentland/igt-gpu-tools/-/merge_requests/1
> 
> IGT patches are also being sent to the igt-dev mailing list.
> 
> If you prefer a gitlab MR for review you can find it at
> https://gitlab.freedesktop.org/hwentland/linux/-/merge_requests/5
> 
> v4:
>  - Add amdgpu color pipeline (WIP)
>  - Don't block setting of deprecated properties, instead pass client cap
>to atomic check so drivers can ignore these props
>  - Drop IOCTL definitions (Pekka)
>  - Use enum property for colorop TYPE (Pekka)
>  - A few cleanups to the docs (Pekka)
>  - Rework the TYPE enum to name relation to avoid code duplication (Pekka)
>  - Add missing function declarations (Chaitanya Kumar Borah)
>  - Allow setting of NEXT property to NULL in _set_ function (Chaitanya Kumar 
> Borah)
>  - Add helper for creation of pipeline drm_plane property (Pekka)
>  - Always create Bypass pipeline (Pekka)
>  - A bunch of changes to VKMS kunit tests (Pekka)
>  - Fix index in CTM doc (Pekka)
> 
> v3:
>  - Abandon IOCTLs and discover colorops as clients iterate the pipeline
>  - Remove need for libdrm
>  - Add co

Re: Wayland debugging with Qtwayland, gstreamer waylandsink, wayland-lib and Weston

2024-02-29 Thread Pekka Paalanen
On Wed, 28 Feb 2024 18:04:28 +
Terry Barnaby  wrote:

> Hi Pekka,
> 
> Some questions below:
> 
> Thanks
> 
> Terry
> On 26/02/2024 15:56, Pekka Paalanen wrote:
> > Ok. What Wayland API requests cause a surface to actually be mapped  
> >> (Sorry don't really know Wayland protocol) ?  
> > Hi Terry,
> >
> > the basic protocol object is wl_surface. The wl_surface needs to be
> > given a "role". This is a specific protocol term. xdg_surface and
> > xdg_toplevel can give a wl_surface the role of a top-level window,
> > which means it can get mapped when you play by the rules set in the
> > xdg_toplevel specification.
> >
> > Sub-surface is another role.
> >
> > So the rules are always role specific, but at some point they all
> > require content on the wl_surface, which is given with the attach
> > request followed by a commit request. Role rules specify when and how
> > that can be done.  
> 
> Yes, I have heard that. But what I don't knoe is from the client:
> 
>  1. How do I find out the surfaces role ?

It is what you (or Qt, or Gst) set it to. There is no way to query it
(or any other thing) back by Wayland.

If you look at a protocol dump (e.g. WAYLAND_DEBUG=client in
environment), you can could follow the protocol messages and trace back
what the role was set to.

>  2. How would I set the surface to have a role such that it would be
> mapped and thus visible ? Just wondering if I can work around what I
> think is a QtWayland bug/issue/feature to make sure by second
> Widgets surface is mapped/visible so that the waylandsink subsurface
> can work. With X11 there were API calls to change the Windows state
> and I was looking for something similar with Wayland.

There is no simple answer to this. You pick a role you need, and then
play by the protocol spec.

You do not have any surfaces without roles, though, so this would not
help you anyway. Roles cannot be changed, only set once per wl_surface
life time. Sub-surface is a role.

> 
> I need to find some way to actually display video, simply and 
> efficiently on an embedded platform, in a Qt application in the year 2024 :)
> 
> I have tried lots of work arounds but none have worked due to either Qt 
> issues, Wayland restrictions, Gstreamer restrictions, Weston 
> issues/restrictions, NXP hardware engine issues/restrictions etc. Any 
> ideas gratefully received!
> 

Did you try making the "middle" QWidget *not* have a wl_surface of its
own?

> >  
> >> The higher level surfaces are created/managed by QtWayland, but maybe I
> >> can override somehow.
> >>  
> > That does not feel like a good idea to me. But I also cannot really
> > help you, because this all seems to be pointing at Qt which I know
> > nothing about.  
> 
> Yes, thats probably true. But I need to get this to work somehow, even 
> if a kludge for now.

Hack on Qt, then? Sorry, but I don't understand this insistence that
what sounds like a Qt bug must be workaround-able via Wayland.


> >  
> >>> 
>  As mentioned before, If I use QPainter to draw into the video QWidget it
>  actually draws into the top QWidgets 16 surface using Wayland protocol.
>  I would have thought it would draw into its own QWidget surface 18 as it
>  has Qt::WA_NativeWindow set ?  
> > This question seems to be the essence. If Qt worked like you expected,
> > then I think the whole program would work.
> >
> > However, there is no need (from Wayland perspective) to have a
> > wl_surface as "surface 18" in the middle. What would be best is if you
> > could somehow have that "middle" QWidget but without it's own
> > wl_surface. Have the QWidget control the GStreamer wl_surface position
> > through wl_subsurface interface, while GStreamer plays the video
> > through wl_surface interface.
> >
> > Wayland does not relay sub-surface resizing or positioning between two
> > client-side components at all. There is not even a way query a
> > sub-surface position. So the positioning and sizing is all done in
> > Qt<->GStreamer somehow without Wayland in between. Only the end result
> > gets sent over Wayland to display: Qt sets up the position and
> > GStreamer sets up the size and content.  
> 
> I think this middle surface is needed so that Qt can manage the 
> "Windows" at this level, like raise, lower, resize et. and Wayland 

Hmm, that does not sound right to me, but then again, I don't know Qt.

Wayland certainly does not impose such demand.

> sink's subsurface that is below this is separate and can be de-synced 
> for the video display etc. I normally (on X11 and with Qt5/Wayland), 
> respond to QWidget resizes and use the Gstreamer API calls to 
> position/resize the waylandsink's sub-surface. This all works quite 
> nicely under X11 and worked (although not nicely) under Qt5/Wayland.
> 
> >  
>  I assume that Qtwayland is not "activating" the video QWidget's surface
>  or using it for some reason (send subsurface expose events ?) ?
>  
> >>> If that's true, th

Re: [RFC PATCH v4 00/42] Color Pipeline API w/ VKMS

2024-02-29 Thread Joshua Ashton




On 2/29/24 10:43, Daniel Vetter wrote:

On Mon, Feb 26, 2024 at 04:10:14PM -0500, Harry Wentland wrote:

This is an RFC set for a color pipeline API, along with a sample
implementation in VKMS. All the key API bits are here. VKMS now
supports two named transfer function colorops and two matrix
colorops. We have IGT tests that check all four of these colorops
with a pixel-by-pixel comparison that checks that these colorops
do what we expect them to do with a +/- 1 8 bpc code point margin.


So vkms is definitely great to make sure the igts are generic enough and
somewhat useful, but ... does steam run on vkms too? I think that would be
a really good test to show that the api we have here is actually useful
for compositors in a cross-driver way, and not just a neat idea that
doesn't survive harsh reality.

And yes I realize that's probably going to be a bunch of work, but I feel
like the color pipeline discussion has dragged around enough in
hypotheticals and concerns that I think it would really help a lot.


I don't think we have ever tested Steam/Gamescope on vkms.

The last time I tried stuff there, there was all the problems with the 
ttm page table tail thing for virtio stuff that made using Steam games + 
Gamescope unfeasable because Vulkan + bindless, but I have heard those 
are solved now?


I will have to try it again at the weekend and see where it's at.
I am willing to place my bets that some part of the stack will fall over 
relating to modifiers somehow... =P


But yes, testing there would be good too, as we have the full Steam Deck 
OLED HDR color pipeline implemented in shader-based composition 
validated as being 1:1 on a suite of HDR and SDR test images.
(That *will* definitely rely on 3D LUTs being tetrahedrally interpolated 
though)


I'll have a look at this at the weekend and also see about getting a 
Gamescope branch that uses the new wip colorop stuff.


- Joshie 🐸✨



Thoughts?
-Sima



The big new change with v4 is the addition of an amdgpu color
pipeline, for all AMD GPUs with DCN 3 and newer. Amdgpu now support
the following:

1. 1D Curve EOTF
2. 3x4 CTM
3. Multiplier
4. 1D Curve Inverse EOTF
5. 1D LUT
6. 1D Curve EOTF
7. 1D LUT

The supported curves for the 1D Curve type are:
- sRGB EOTF and its inverse
- PQ EOTF, scaled to [0.0, 125.0] and its inverse
- BT.2020/BT.709 OETF and its inverse

Note that the 1st and 5th colorops take the EOTF or Inverse
OETF while the 3rd colorop takes the Inverse EOTF or OETF.

We are working on two more ops for amdgpu, the HDR multiplier
and the 3DLUT, which will give us this:

1. 1D Curve EOTF
2. 3x4 CTM
3. HDR Multiplier
4. 1D Curve Inverse EOTF
5. 1D LUT
6. 3D LUT
7. 1D Curve EOTF
8. 1D LUT

This, essentially mirrors the color pipeline used by gamescope
and presented by Melissa Wen, with the exception of the DEGAM
LUT, which is not currently used. See
[1] 
https://indico.freedesktop.org/event/4/contributions/186/attachments/138/218/xdc2023-TheRainbowTreasureMap-MelissaWen.pdf

After this we'd like to also add the following ops:
- Scaler (Informational only)
- Color Encoding, to replace drm_plane's COLOR_ENCODING
- Color Range, to replace drm_plane's COLOR_RANGE

This patchset is grouped as follows:
  - Patches 1-3: couple general patches/fixes
  - Patches 4-7: introduce kunit to VKMS
  - Patch 7: description of motivation and details behind the
 Color Pipeline API. If you're reading nothing else
 but are interested in the topic I highly recommend
 you take a look at this.
  - Patches 7-27: DRM core and VKMS changes for color pipeline API
  - Patches 28-40: DRM core and amdgpu changes for color pipeline API

VKMS patches could still be improved in a few ways, though the
payoff might be limited and I would rather focus on other work
at the moment. The most obvious thing to improve would be to
eliminate the hard-coded LUTs for identity, and sRGB, and replace
them with fixed-point math instead.

There are plenty of things that I would like to see here but
haven't had a chance to look at. These will (hopefully) be
addressed in future iterations, either in VKMS or amdgpu:
  - Clear documentation for each drm_colorop_type
  - Add custom LUT colorops to VKMS
  - Add pre-blending 3DLUT
  - How to support HW which can't bypass entire pipeline?
  - Add ability to create colorops that don't have BYPASS
  - Can we do a LOAD / COMMIT model for LUTs (and other properties)?
  - read-only scaling colorop which defines scaling taps and position
  - read-only color format colorop to define supported color formats
for a pipeline
  - named matrices, for things like converting YUV to RGB

IGT tests can be found at
https://gitlab.freedesktop.org/hwentland/igt-gpu-tools/-/merge_requests/1

IGT patches are also being sent to the igt-dev mailing list.

If you prefer a gitlab MR for review you can find it at
https://gitlab.freedesktop.org/hwentland/linux/-/merge_requests/5

v4:
  - Add amdgpu color pipeline (WIP)
  - Don't block

Re: why not flow control in wl_connection_flush?

2024-02-29 Thread jleivent
On Tue, 27 Feb 2024 11:01:18 +0200
Pekka Paalanen  wrote:


> > But suppose I'm writing a client that has the possibility of
> > sending a rapid stream of msgs to the compositor that might be,
> > depending on what other clients are doing, too fast for the
> > compositor to handle, and I'd like to implement some flow control.
> > I don't want the connection to the compositor to sever or for the
> > condition to cause memory consumption without any ability for me to
> > find out about and control the situation.  Especially if I could
> > slow down that rapid stream of updates without too high a cost to
> > what my client is trying to accomplish.
> > 
> > Is there a way I could do that?  
> 
> Get the Wayland fd with wl_display_get_fd() and poll it for writable.
> If it's not writable, you're sending too fast.

That's what I was looking for!  I think... Maybe?

> 
> That's what any client should always do. Usually it would be prompted
> by wl_display_flush() erroring out with EAGAIN as your cue to start
> polling for writable. It's even documented.

But, calling wl_display_flush too often is bad for throughput, right?
Isn't it better to allow the ring buffer to determine itself when to
flush based on being full, and batch send many msgs?  Obviously
sometimes the client has nothing more to send (for a while), so
wl_display_flush then makes sense.  But in this case, it does have more
to send and wants to know if it should attempt to do so or hold back.

I could instead wait for the display fd to be writable before
attempting each msg send.  But the display fd may be writable merely
because the ring buffer hasn't tried flushing yet.  But the ring buffer
could have less than enough space for the msg I'm about to send.  And
the socket buffer could have very little space - just enough for it to
say its writable.

Which means that sometimes polling the display fd will return writable
when an attempt to send a msg is still going to result in ring buffer
growth or client disconnect.

So... back to calling wl_display_flush ... sometimes.

I guess I could call wl_display_flush after what I think is about 4K
worth of msg content.  Wl_display_flush returns the amount sent, so
that helps keep the extra state I need to maintain.

Is there currently a way I could get the size of contents in the output
ring buffer?