On Fri, Oct 10, 2025 at 4:15 AM Christian König <[email protected]> wrote:
>
> Ah, yes I've talked about that topic with Michel just last week on XDC.
>
> It would make sense to have a generic interface to query the errors so that
> the display manager/compositor can do something reasonable when an
> application messes up its rendering.
>
> E.g. display an error message instead of just a window full of random
> pixels.

Right, exactly. It'd be nice to have just one interface for all the drm
user-facing apis (DRM core and driver specific) to figure out that some
asynchronous task/job errored out.

> On 09.10.25 22:35, Dave Airlie wrote:
> > Just adding Christian and Faith, who might have some more comments.
> >
> > On Fri, 10 Oct 2025 at 06:04, Zack Rusin <[email protected]> wrote:
> >>
> >> Propagate the fence errors from drivers to userspace. Allows userspace to
> >> react to asynchronous errors coming from the drivers.
> >>
> >> One of the trickiest bits of drm syncobj interface is that, unexpectedly,
> >> the syncobj doesn't propagate the fence errors on wait. Whenever something
> >> goes wrong in an asynchronous task/job that uses drm syncobj to
> >> communicate with the userspace there's no way to convey that issue
> >> with userspace as drm syncobj wait function will only check whether
> >> a fence has been signaled but not whether it has been signaled without
> >> error.
> >>
> >> Instead of assuming that a signaled fence implies success grab the actual
> >> status of the fence and return the first fence error that has been
> >> spotted. Return the first error because all the subsequent errors are
> >> likely to be caused by the initial error in a chain of tasks.
> >>
> >> [RFC]: Some drivers (e.g. Xe) do accept drm syncobj's in the vm_bind
> >> and exec interface, they also call dma_fence_set_error when those
> >> operations asynchronously fail, currently those errors will just be
> >> silently ignored (because they don't propagate), I'm not sure how the
> >> userspace written for those drivers will react to actually receiving
> >> those errors, even if silently dropping those driver errors seems
> >> completely wrong to me.
>
> IIRC during the initial drm_syncobj or timeline bringup we had a brief
> discussion if we should do this on wait and then decided against it.
>
> The wait functionality in both sync_file as well as DMA-buf file descriptor
> doesn't bubble up the error on wait either.
>
> Instead the sync_file has a SYNC_IOC_FILE_INFO IOCTL to query the result of
> the operation separately after the wait is completed.
>
> Amdgpu, Nouveau and i915 have functions to do this in driver-specific ways.

What's the mechanism in amdgpu for getting the errors back from fences?
The thing I'm trying to reason about is "assuming a new driver with new
userspace, what would be the friendliest interface to communicate those
errors". In general I thought that accepting syncobj/syncobj_timeline
as arguments to the vm_bind and exec ioctls and signaling them would make
the most sense, but only if we can reasonably get asynchronous errors
from waits on those things.

> I think we should just add a DRM_IOCTL_SYNCOBJ_ERRNO IOCTL (feel free to
> come up with a better name) to query the potential error from a timeline sync
> point after waiting for it has completed.
>
> One problem could be that fences with errors are garbage collected on a
> timeline before we have a chance to return the error code to userspace, but
> in this case I think we can just propagate the error through the timeline.

So cache the first fence error that was spotted during the wait and
return that? Would it reset the error on the ioctl or would it reset on
the next wait? That could definitely work. So assuming that maybe
instead of DRM_IOCTL_SYNCOBJ_ERRNO we'd call it
DRM_IOCTL_SYNCOBJ_STATUS the procedure would be:

long timeout = drm_syncobj_wait(...);
if (timeout < 0)
    // wait error
int status = drm_syncobj_status(...);
if (status < 0)
    // at least one of the fences associated with the syncobj failed

Would that make sense?

z
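
To make the "cache the first fence error" semantics concrete, here is a
small userspace mock in plain C. It is only a sketch: every name in it
(mock_fence, mock_syncobj, mock_syncobj_wait, mock_syncobj_status) is
hypothetical and none of it is kernel code or an existing DRM interface.
It assumes the cached error is reset when it is queried, which is just one
of the two options raised above, not something that has been decided.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a fence: not the kernel's struct dma_fence. */
struct mock_fence {
    int signaled; /* non-zero once the job has finished */
    int error;    /* 0 on success, negative errno on failure */
};

/* Hypothetical stand-in for a syncobj that remembers the first error. */
struct mock_syncobj {
    int first_error; /* 0 if no error has been cached */
};

/*
 * Mock wait: returns -ETIMEDOUT if any fence is still unsignaled,
 * otherwise 0. A failed fence does not fail the wait itself; the first
 * error found (in submission order) is cached on the syncobj instead.
 */
static int mock_syncobj_wait(struct mock_syncobj *obj,
                             const struct mock_fence *fences, size_t count)
{
    assert(obj != NULL);

    /* First pass: the wait only succeeds once everything has signaled. */
    for (size_t i = 0; i < count; i++) {
        if (!fences[i].signaled)
            return -ETIMEDOUT;
    }

    /* Second pass: keep only the first error, later ones are ignored. */
    for (size_t i = 0; i < count; i++) {
        if (fences[i].error < 0 && obj->first_error == 0)
            obj->first_error = fences[i].error;
    }
    return 0;
}

/*
 * Mock status query: returns the cached error (or 0) and clears it.
 * Read-and-clear is an assumption made for this sketch.
 */
static int mock_syncobj_status(struct mock_syncobj *obj)
{
    assert(obj != NULL);

    int err = obj->first_error;
    obj->first_error = 0;
    return err;
}
```

With read-and-clear, a compositor would call the status query exactly once
after each successful wait. Resetting on the next wait instead would let
several readers see the same error, at the cost of the error going away
implicitly.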
