Always wait on the start of the signaler request to reduce the problem
of dequeueing the bonded pair too early -- we want both payloads to
start at the same time, with no latency, and yet still allow others to
make full use of the slack in the system.

Remindme: add testcases for starting the bonded pair too early due to an
infinite spin before the signaler, and via a semaphore.

Testcase: XXX
Signed-off-by: Chris Wilson <[email protected]>
---
 drivers/gpu/drm/i915/i915_request.c | 33 ++++++++++++++++++++++++-----
 1 file changed, 28 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c 
b/drivers/gpu/drm/i915/i915_request.c
index 4bfe68edfc81..c0a0089111a1 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -1127,14 +1127,37 @@ __i915_request_await_execution(struct i915_request *to,
                                          &from->fence))
                return 0;
 
-       /* Ensure both start together [after all semaphores in signal] */
-       if (intel_engine_has_semaphores(to->engine))
-               err = __emit_semaphore_wait(to, from, from->fence.seqno - 1);
-       else
-               err = i915_request_await_start(to, from);
+       /*
+        * Wait until the start of this request.
+        *
+        * The execution cb fires when we submit the request to HW. But in
+        * many cases this may be long before the request itself is ready to
+        * run (consider that we submit 2 requests for the same context, where
+        * the request of interest is behind an indefinite spinner). So we hook
+        * up to both to keep our queues tidy and the execution lag minimised
+        * in the worst case, though we hope that the await_start is elided.
+        */
+       err = i915_request_await_start(to, from);
        if (err < 0)
                return err;
 
+       /*
+        * Ensure both start together [after all semaphores in signal]
+        *
+        * Now that we are queued to the HW at roughly the same time (thanks
+        * to the execute cb) and are ready to run at roughly the same time
+        * (thanks to the await start), our signaler may still be indefinitely
+        * delayed by waiting on a semaphore from a remote engine. If our
+        * signaler depends on a semaphore, so indirectly do we, and we do not
+        * want to start our payload until our signaler also starts theirs.
+        * So we wait.
+        */
+       if (intel_engine_has_semaphores(to->engine) && from->sched.semaphores) {
+               err = __emit_semaphore_wait(to, from, from->fence.seqno - 1);
+               if (err < 0)
+                       return err;
+       }
+
        /* Couple the dependency tree for PI on this exposed to->fence */
        if (to->engine->schedule) {
                err = i915_sched_node_add_dependency(&to->sched, &from->sched);
-- 
2.25.1
