On Wed, 20 May 2026 at 10:08, Christian König
<[email protected]> wrote:
> Well I would say the other way around is a pretty common use case.
>
> In other words the compositor uses the internal GPU for composing and
> displaying the picture. And the client uses the external GPU for fast
> rendering.
Sure, but that's not what I'm talking about.

> > - the buffers from the client stay valid
>
> Buffers from the hot plugged GPU don't stay valid. Accessing CPU mappings 
> either result in a SIGBUS or are redirected to a dummy page.
Again, not what I wrote about. The buffers are on the integrated GPU.

> > - the syncobj stays valid on the client side
> > - the syncobj becomes invalid on the compositor side
>
> Nope that's not correct. The syncobj itself stays valid even if you 
> completely hot plug the device.
>
> It can just be that the fences inside the syncobj are terminated with an 
> error.
What about eventfd created for a point on the syncobj?

Another (future) problem with hotplugs arises if no sync file has
materialized for the timeline point yet when the device is hot-unplugged:
there can't be an error on the fence if there is no fence. Or could
userspace somehow set an 'artificial' fence with an error in that case?

> > "invalid" there means either
> > - the acquire point of the client is marked as signaled, before
> > rendering on the client side is completed
> > - the acquire point of the client is never signaled. Since the
> > compositor waits for the acquire point, the Wayland surface is stuck
> > forever
>
> Both of those would be a *massive* violation of documented kernel rules for 
> hot-plugging which could lead to random data corruption and/or deadlocks.
>
> If you see any HW driver showing behavior like that please open up a bug 
> report and ping the relevant maintainers immediately.
If syncobj can't report error codes yet, then to userspace the
latter behavior is exactly what we get, isn't it?

> When a hotplug happens all operations of the device should return an -ENODEV 
> error, even when exposed to other devices/application through syncobj or 
> syncfile.
Okay, that at least gives us a way to fail imports somewhat
gracefully. Normally, failing to import a syncobj is a fatal error in
the Wayland protocol.

> One problem is that only syncfile allows for querying such error codes at the 
> moment, we have patches pending to add that to syncobj as well but we lack a 
> compositor with support for that as userspace client.
As long as the error case can be detected with an eventfd,
implementing that in KWin shouldn't be a challenge.
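For comparison, this is roughly how the error can already be read from a
sync_file: `SYNC_IOC_FILE_INFO` from `<linux/sync_file.h>` reports `status`
as 1 once the fence has signaled, 0 while it is still active, and a negative
error code if it signaled with an error. A minimal sketch; the helper name
`sync_file_status` is hypothetical, and whether a given driver reports
exactly -ENODEV after a hot-unplug depends on it following the rule
described above.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/sync_file.h>

/* Hypothetical helper: query the fence status behind a sync_file fd.
 * On success returns 0 and stores in *status:
 *   1  -> signaled without error
 *   0  -> still active
 *  <0  -> signaled with an error (e.g. -ENODEV after a hot-unplug)
 * Returns -errno if the ioctl itself fails (e.g. -EBADF for a bad fd). */
static int sync_file_status(int sync_file_fd, int32_t *status)
{
    struct sync_file_info info;

    /* flags and pad must be zero; num_fences == 0 requests only the
     * summary status, without the per-fence sync_fence_info array. */
    memset(&info, 0, sizeof(info));
    if (ioctl(sync_file_fd, SYNC_IOC_FILE_INFO, &info) < 0)
        return -errno;

    *status = info.status;
    return 0;
}
```

A compositor would call this once the sync_file fd becomes readable; a
sync_file polls as readable when its fence signals, with or without an
error.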

> Well the question here is whether it's the device the compositor is using
> or the one the client is using that is gone.
>
> If the client device is hot removed the compositor should be perfectly
> capable of importing the syncobj.
>
> If the compositor device is gone then you don't have a device to display 
> anything any more, so generating the next frame doesn't seem to make sense 
> either.
>
> What could be is that you want the compositor to be kept alive even when the 
> display device is gone to switch over to vkms or whatever so that a VNC 
> session or other remote desktop still works.
There are two GPUs in the example I gave. The compositor can use both
for rendering (in cosmic-comp's case) or switch between them (what I'm
trying to do with KWin), or use one device for rendering, and another
for importing the syncobj.

> >>>>> 3. It removes the need to translate between syncobjs fds and handles.
> >>>>
> >>>> That's a pretty big no-go as well. The differentiation between FDs and 
> >>>> handles is completely intentional.
> >>> Could you expand on why it's needed? For compositors, the handle is
> >>> just an intermediary thing when translating between file descriptors.
> >>
> >> Well what we could do is to add an IOCTL to directly attach a syncobj
> >> file descriptor to an eventfd.
> > That would be nice.
>
> Take a look at drm_syncobj_file_fops and how drm_syncobj_add_eventfd() is 
> used. Adding that functionality shouldn't be more than a typing exercise.
Yeah, this patchset already adds that functionality (on the new device).
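For reference, the compositor-side path discussed here currently takes three
steps: translate the client's syncobj fd into a handle on a DRM device,
create an eventfd, and attach it to the timeline point with
`DRM_IOCTL_SYNCOBJ_EVENTFD` (added in Linux 6.6). A minimal sketch against
the raw kernel UAPI in `<drm/drm.h>`; the helper names are hypothetical, and
the proposed fd-based ioctl would only remove step 1.

```c
#include <errno.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/eventfd.h>
#include <sys/ioctl.h>
#include <drm/drm.h>

/* Hypothetical helper: release a syncobj handle on this DRM fd. */
static void syncobj_destroy(int drm_fd, uint32_t handle)
{
    struct drm_syncobj_destroy destroy = { .handle = handle };
    ioctl(drm_fd, DRM_IOCTL_SYNCOBJ_DESTROY, &destroy);
}

/* Hypothetical helper: import a client's timeline syncobj fd and return an
 * eventfd that becomes readable once `point` signals. On success the handle
 * is stored in *handle_out; the caller keeps it and destroys it later.
 * Returns the eventfd, or -errno on failure. */
static int syncobj_point_eventfd(int drm_fd, int syncobj_fd, uint64_t point,
                                 uint32_t *handle_out)
{
    /* Step 1: fd -> handle, which ties the syncobj to this DRM device. */
    struct drm_syncobj_handle import = { .fd = syncobj_fd };
    if (ioctl(drm_fd, DRM_IOCTL_SYNCOBJ_FD_TO_HANDLE, &import) < 0)
        return -errno;

#ifdef DRM_IOCTL_SYNCOBJ_EVENTFD
    /* Step 2: the eventfd the compositor's event loop will poll. */
    int ev_fd = eventfd(0, EFD_CLOEXEC | EFD_NONBLOCK);
    if (ev_fd < 0) {
        int err = -errno;
        syncobj_destroy(drm_fd, import.handle);
        return err;
    }

    /* Step 3: attach it to the timeline point; flags = 0 waits for the
     * point to signal, not just for a fence to materialize. */
    struct drm_syncobj_eventfd attach = {
        .handle = import.handle,
        .flags = 0,
        .point = point,
        .fd = ev_fd,
    };
    if (ioctl(drm_fd, DRM_IOCTL_SYNCOBJ_EVENTFD, &attach) < 0) {
        int err = -errno;
        close(ev_fd);
        syncobj_destroy(drm_fd, import.handle);
        return err;
    }

    *handle_out = import.handle;
    return ev_fd;
#else
    /* Kernel headers older than Linux 6.6 lack DRM_IOCTL_SYNCOBJ_EVENTFD. */
    (void)point;
    (void)handle_out;
    syncobj_destroy(drm_fd, import.handle);
    return -ENOSYS;
#endif
}
```

Passing `DRM_SYNCOBJ_WAIT_FLAGS_WAIT_AVAILABLE` instead of 0 in `flags` makes
the eventfd fire once a fence has materialized for the point rather than when
it signals, which relates to the case of points that have no fence yet.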

> Do I see it right that this would already solve most problems in the 
> compositor side?
Skipping the syncobj handle step would only reduce the number of
ioctls the compositor makes, but afaict it wouldn't solve any
compositor problems, at least not as long as it's still tied to a DRM
device.
For device hotplugs, the only new thing we need for correctly handling
syncobj is a way to receive errors on the eventfd.
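To illustrate the gap: with the eventfd path, all a compositor can do today
is poll the eventfd and read its 64-bit counter. A minimal sketch (the helper
name `wait_acquire_point` is hypothetical); nothing it reads can carry an
error code, so a point that signaled with an error and one that signaled
successfully look the same to userspace.

```c
#include <errno.h>
#include <poll.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical helper: wait up to timeout_ms for an eventfd registered via
 * DRM_IOCTL_SYNCOBJ_EVENTFD to fire, then consume its counter.
 * Returns 1 if it fired, 0 on timeout, -errno on failure.
 * The only payload is a 64-bit counter, so a point that signaled with an
 * error looks exactly like one that signaled successfully. */
static int wait_acquire_point(int ev_fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = ev_fd, .events = POLLIN };
    int ret = poll(&pfd, 1, timeout_ms);
    if (ret < 0)
        return -errno;
    if (ret == 0)
        return 0; /* not signaled (yet); after an unplug this may never change */

    uint64_t counter;
    ssize_t n = read(ev_fd, &counter, sizeof(counter));
    if (n < 0)
        return -errno;
    if (n != (ssize_t)sizeof(counter))
        return -EIO;
    return 1;
}
```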

A device-independent way to create and use syncobj would still be
useful to us though, both to simplify the compositor and to improve
the software rendering use cases.

- Xaver
