Just adding Christian and Faith, who might have some more comments.

On Fri, 10 Oct 2025 at 06:04, Zack Rusin <[email protected]> wrote:
>
> Propagate fence errors from drivers to userspace. This allows userspace
> to react to asynchronous errors coming from the drivers.
>
> One of the trickiest bits of the drm syncobj interface is that,
> unexpectedly, the syncobj doesn't propagate fence errors on wait. Whenever
> something goes wrong in an asynchronous task/job that uses a drm syncobj
> to communicate with userspace, there's no way to convey that issue to
> userspace, because the drm syncobj wait function only checks whether a
> fence has been signaled, not whether it has been signaled without error.
>
> Instead of assuming that a signaled fence implies success, grab the actual
> status of the fence and return the first fence error that has been
> spotted. Return the first error because all the subsequent errors are
> likely to be caused by the initial error in a chain of tasks.
>
> [RFC]: Some drivers (e.g. Xe) accept drm syncobjs in the vm_bind and
> exec interfaces, and they also call dma_fence_set_error when those
> operations fail asynchronously. Currently those errors are silently
> ignored because they don't propagate. I'm not sure how the userspace
> written for those drivers will react to actually receiving those errors,
> even though silently dropping driver errors seems completely wrong to me.
>
> Signed-off-by: Zack Rusin <[email protected]>
> Cc: [email protected]
> Cc: David Airlie <[email protected]>
> Cc: Simona Vetter <[email protected]>
> Cc: Maarten Lankhorst <[email protected]>
> Cc: Maxime Ripard <[email protected]>
> Cc: Thomas Zimmermann <[email protected]>
> ---
>  drivers/gpu/drm/drm_syncobj.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
>
> diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
> index e1b0fa4000cd..bcd8eff8b59a 100644
> --- a/drivers/gpu/drm/drm_syncobj.c
> +++ b/drivers/gpu/drm/drm_syncobj.c
> @@ -1067,6 +1067,7 @@ static signed long drm_syncobj_array_wait_timeout(struct drm_syncobj **syncobjs,
>         struct dma_fence *fence;
>         uint64_t *points;
>         uint32_t signaled_count, i;
> +       int fence_status, first_fence_error = 0;
>
>         if (flags & (DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT |
>                      DRM_SYNCOBJ_WAIT_FLAGS_WAIT_AVAILABLE)) {
> @@ -1170,6 +1171,9 @@ static signed long drm_syncobj_array_wait_timeout(struct drm_syncobj **syncobjs,
>                              dma_fence_add_callback(fence,
>                                                     &entries[i].fence_cb,
>                                                     syncobj_wait_fence_func))) {
> +                               fence_status = dma_fence_get_status(fence);
> +                               if (fence_status < 0 && !first_fence_error)
> +                                       first_fence_error = fence_status;
>                                 /* The fence has been signaled */
>                                 if (flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_ALL) {
>                                         signaled_count++;
> @@ -1213,6 +1217,14 @@ static signed long drm_syncobj_array_wait_timeout(struct drm_syncobj **syncobjs,
>  err_free_points:
>         kfree(points);
>
> +       /*
> +        * Propagate the first fence error the code has seen, but
> +        * give precedence to the overall wait error in case one
> +        * was encountered.
> +        */
> +       if (first_fence_error < 0 && timeout >= 0)
> +               timeout = first_fence_error;
> +
>         return timeout;
>  }
>
> --
> 2.48.1
>
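
For anyone following along, the error-selection rule described in the patch (keep the first fence error, but let an error from the wait itself take precedence) can be modeled outside the kernel. This is only a minimal userspace sketch under stated assumptions, not the kernel code: struct model_fence, model_fence_status() and model_wait_result() are hypothetical names made up for illustration. model_fence_status() imitates the return convention of dma_fence_get_status() (0 while unsignaled, 1 when signaled without error, a negative errno when signaled with an error).

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a dma_fence; not a kernel type. */
struct model_fence {
	int signaled;	/* non-zero once the fence has signaled */
	int error;	/* 0 on success, negative errno on failure */
};

/*
 * Imitates the dma_fence_get_status() convention:
 * 0 = not signaled yet, 1 = signaled OK, < 0 = signaled with an error.
 */
static int model_fence_status(const struct model_fence *fence)
{
	if (!fence->signaled)
		return 0;
	return fence->error < 0 ? fence->error : 1;
}

/*
 * Hypothetical model of the result selection in the patch: remember the
 * first fence error seen, and report it only if the wait itself did not
 * already fail (timeout < 0 means the wait returned its own error).
 */
static long model_wait_result(const struct model_fence *fences, size_t count,
			      long timeout)
{
	int first_fence_error = 0;
	size_t i;

	for (i = 0; i < count; i++) {
		int status = model_fence_status(&fences[i]);

		if (status < 0 && !first_fence_error)
			first_fence_error = status;
	}

	if (first_fence_error < 0 && timeout >= 0)
		timeout = first_fence_error;

	return timeout;
}
```

With two failed fences carrying -EIO and then -ENOMEM, this model reports -EIO when the wait itself succeeded, and keeps the wait's own error (for example -ETIME) when the wait failed.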
