Just adding Christian and Faith, who might have some more comments.
On Fri, 10 Oct 2025 at 06:04, Zack Rusin <[email protected]> wrote:
>
> Propagate the fence errors from drivers to userspace. Allows userspace to
> react to asynchronous errors coming from the drivers.
>
> One of the trickiest bits of drm syncobj interface is that, unexpectedly,
> the syncobj doesn't propagate the fence errors on wait. Whenever something
> goes wrong in an asynchronous task/job that uses drm syncobj to
> communicate with the userspace there's no way to convey that issue
> with userspace as drm syncobj wait function will only check whether
> a fence has been signaled but not whether it has been signaled without
> error.
>
> Instead of assuming that a signaled fence implies success grab the actual
> status of the fence and return the first fence error that has been
> spotted. Return the first error because all the subsequent errors are
> likely to be caused by the initial error in a chain of tasks.
>
> [RFC]: Some drivers (e.g. Xe) do accept drm syncobj's in the vm_bind
> and exec interface, they also call dma_fence_set_error when those
> operations asynchronously fail, currently those errors will just be
> silently ignored (because they don't propagate), I'm not sure how the
> userspace written for those drivers will react to actually receiving
> those errors, even if silently dropping those driver errors seems
> completely wrong to me.
>
> Signed-off-by: Zack Rusin <[email protected]>
> Cc: [email protected]
> Cc: David Airlie <[email protected]>
> Cc: Simona Vetter <[email protected]>
> Cc: Maarten Lankhorst <[email protected]>
> Cc: Maxime Ripard <[email protected]>
> Cc: Thomas Zimmermann <[email protected]>
> ---
>  drivers/gpu/drm/drm_syncobj.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
>
> diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
> index e1b0fa4000cd..bcd8eff8b59a 100644
> --- a/drivers/gpu/drm/drm_syncobj.c
> +++ b/drivers/gpu/drm/drm_syncobj.c
> @@ -1067,6 +1067,7 @@ static signed long drm_syncobj_array_wait_timeout(struct drm_syncobj **syncobjs,
>  	struct dma_fence *fence;
>  	uint64_t *points;
>  	uint32_t signaled_count, i;
> +	int fence_status, first_fence_error = 0;
>
>  	if (flags & (DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT |
>  		     DRM_SYNCOBJ_WAIT_FLAGS_WAIT_AVAILABLE)) {
> @@ -1170,6 +1171,9 @@ static signed long drm_syncobj_array_wait_timeout(struct drm_syncobj **syncobjs,
>  			     dma_fence_add_callback(fence,
>  						    &entries[i].fence_cb,
>  						    syncobj_wait_fence_func))) {
> +				fence_status = dma_fence_get_status(fence);
> +				if (fence_status < 0 && !first_fence_error)
> +					first_fence_error = fence_status;
>  				/* The fence has been signaled */
>  				if (flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_ALL) {
>  					signaled_count++;
> @@ -1213,6 +1217,14 @@ static signed long drm_syncobj_array_wait_timeout(struct drm_syncobj **syncobjs,
>  err_free_points:
>  	kfree(points);
>
> +	/*
> +	 * Propagate the last fence error the code has seen, but
> +	 * give precedence to the overall wait error in case one
> +	 * was encountered.
> +	 */
> +	if (first_fence_error < 0 && timeout >= 0)
> +		timeout = first_fence_error;
> +
>  	return timeout;
>  }
>
> --
> 2.48.1
>
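For anyone skimming the thread, the selection rule in the quoted patch can be sketched outside the kernel as below. This is only an illustration under stated assumptions, not the patch itself: struct fake_fence, first_fence_error() and wait_result() are hypothetical names, and the status field only mimics the convention of dma_fence_get_status() for a signaled fence (1 on success, negative errno on failure).

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for a signaled fence. The status follows the
 * dma_fence_get_status() convention for signaled fences: 1 means signaled
 * without error, a negative errno means signaled with an error.
 */
struct fake_fence {
	int status;
};

/* Return the first negative status among the fences, or 0 if none failed. */
static int first_fence_error(const struct fake_fence *fences, size_t count)
{
	size_t i;

	for (i = 0; i < count; i++) {
		if (fences[i].status < 0)
			return fences[i].status;
	}
	return 0;
}

/*
 * Combine the wait result with the recorded fence error. A negative wait
 * result (for example an interrupted wait) takes precedence; otherwise a
 * recorded fence error replaces the remaining timeout.
 */
static long wait_result(long timeout, int fence_error)
{
	if (fence_error < 0 && timeout >= 0)
		return fence_error;
	return timeout;
}
```

With fences reporting 1, -EIO and -ENOMEM in that order, the sketch reports -EIO, on the patch's reasoning that later failures are likely consequences of the first one.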
