Propagate fence errors from drivers to userspace, allowing userspace to
react to asynchronous errors coming from the drivers.

One of the trickiest bits of the drm syncobj interface is that,
unexpectedly, the syncobj doesn't propagate fence errors on wait. Whenever
something goes wrong in an asynchronous task/job that uses a drm syncobj
to communicate with userspace, there is no way to convey that issue to
userspace, because the drm syncobj wait function only checks whether a
fence has been signaled, not whether it has been signaled without error.

Instead of assuming that a signaled fence implies success, grab the actual
status of the fence and return the first fence error that has been
spotted. Return the first error because all the subsequent errors are
likely to be caused by the initial error in a chain of tasks. An error
from the wait itself still takes precedence over a fence error.

[RFC]: Some drivers (e.g. Xe) accept drm syncobjs in the vm_bind and exec
interfaces and call dma_fence_set_error() when those operations fail
asynchronously. Currently those errors are silently ignored because they
don't propagate. I'm not sure how userspace written for those drivers
will react to actually receiving those errors, even though silently
dropping driver errors seems completely wrong to me.

Signed-off-by: Zack Rusin <[email protected]>
Cc: [email protected]
Cc: David Airlie <[email protected]>
Cc: Simona Vetter <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: Maxime Ripard <[email protected]>
Cc: Thomas Zimmermann <[email protected]>
---
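For reviewers, the selection rule described in the commit message can be
modeled outside the kernel. This is only a minimal sketch under stated
assumptions: wait_result() is a hypothetical helper, not a kernel function,
and the statuses array stands in for what dma_fence_get_status() reports
(0 = not signaled, 1 = signaled without error, negative = error).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical userspace model of the rule in this patch.
 * statuses[i] mimics dma_fence_get_status():
 *   0  = not signaled, 1 = signaled without error, <0 = error.
 * timeout mimics the wait result: negative means the wait itself failed.
 */
static long wait_result(const int *statuses, size_t count, long timeout)
{
	int first_fence_error = 0;
	size_t i;

	assert(statuses != NULL || count == 0);

	/* Remember only the first error; later ones are likely fallout. */
	for (i = 0; i < count; i++) {
		if (statuses[i] < 0 && !first_fence_error)
			first_fence_error = statuses[i];
	}

	/* A failed wait takes precedence over any fence error. */
	if (first_fence_error < 0 && timeout >= 0)
		return first_fence_error;

	return timeout;
}
```

With statuses {1, -EIO, -ENOMEM} and a non-negative timeout the sketch
yields -EIO; with a negative timeout the wait error is returned unchanged.
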
 drivers/gpu/drm/drm_syncobj.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index e1b0fa4000cd..bcd8eff8b59a 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -1067,6 +1067,7 @@ static signed long drm_syncobj_array_wait_timeout(struct drm_syncobj **syncobjs,
        struct dma_fence *fence;
        uint64_t *points;
        uint32_t signaled_count, i;
+       int fence_status, first_fence_error = 0;
 
        if (flags & (DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT |
                     DRM_SYNCOBJ_WAIT_FLAGS_WAIT_AVAILABLE)) {
@@ -1170,6 +1171,9 @@ static signed long drm_syncobj_array_wait_timeout(struct drm_syncobj **syncobjs,
                             dma_fence_add_callback(fence,
                                                    &entries[i].fence_cb,
                                                    syncobj_wait_fence_func))) {
+                               fence_status = dma_fence_get_status(fence);
+                               if (fence_status < 0 && !first_fence_error)
+                                       first_fence_error = fence_status;
                                /* The fence has been signaled */
                                if (flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_ALL) {
                                        signaled_count++;
@@ -1213,6 +1217,14 @@ static signed long drm_syncobj_array_wait_timeout(struct drm_syncobj **syncobjs,
 err_free_points:
        kfree(points);
 
+       /*
+        * Propagate the first fence error the code has seen, but
+        * give precedence to the overall wait error in case one
+        * was encountered.
+        */
+       if (first_fence_error < 0 && timeout >= 0)
+               timeout = first_fence_error;
+
        return timeout;
 }
 
-- 
2.48.1
