On Wed, Mar 15, 2017 at 09:23:01AM +0000, Tvrtko Ursulin wrote:
> 
> On 14/03/2017 09:34, Chris Wilson wrote:
> >When we wedge the device, we override engine->submit_request with a nop
> >to ensure that all in-flight requests are marked in error. However, igt
> >would like to unwedge the device to test -EIO handling. This requires us
> >to flush those in-flight requests and restore the original
> >engine->submit_request.
> >
> >v2: Use a vfunc to unify enabling request submission to engines
> >v3: Split new vfunc to a separate patch.
> >
> >Fixes: 821ed7df6e2a ("drm/i915: Update reset path to fix incomplete requests")
> >Testcase: igt/gem_eio
> >Signed-off-by: Chris Wilson <[email protected]>
> >Cc: Tvrtko Ursulin <[email protected]>
> >Cc: Mika Kuoppala <[email protected]>
> >---
> > drivers/gpu/drm/i915/i915_drv.c |  2 +-
> > drivers/gpu/drm/i915/i915_drv.h |  1 +
> > drivers/gpu/drm/i915/i915_gem.c | 51 +++++++++++++++++++++++++++++++++++++++++
> > 3 files changed, 53 insertions(+), 1 deletion(-)
> >
> >diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> >index e312b61ba6bb..11d1066b673c 100644
> >--- a/drivers/gpu/drm/i915/i915_drv.c
> >+++ b/drivers/gpu/drm/i915/i915_drv.c
> >@@ -1822,7 +1822,7 @@ void i915_reset(struct drm_i915_private *dev_priv)
> >             return;
> >
> >     /* Clear any previous failed attempts at recovery. Time to try again. */
> >-    __clear_bit(I915_WEDGED, &error->flags);
> >+    i915_gem_unset_wedged(dev_priv);
> >     error->reset_count++;
> >
> >     pr_notice("drm/i915: Resetting chip after gpu hang\n");
> >diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> >index 48ff64812289..53a791d8d992 100644
> >--- a/drivers/gpu/drm/i915/i915_drv.h
> >+++ b/drivers/gpu/drm/i915/i915_drv.h
> >@@ -3406,6 +3406,7 @@ int i915_gem_reset_prepare(struct drm_i915_private *dev_priv);
> > void i915_gem_reset(struct drm_i915_private *dev_priv);
> > void i915_gem_reset_finish(struct drm_i915_private *dev_priv);
> > void i915_gem_set_wedged(struct drm_i915_private *dev_priv);
> >+void i915_gem_unset_wedged(struct drm_i915_private *dev_priv);
> >
> > void i915_gem_init_mmio(struct drm_i915_private *i915);
> > int __must_check i915_gem_init(struct drm_i915_private *dev_priv);
> >diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> >index 202bb850f260..e06830916a05 100644
> >--- a/drivers/gpu/drm/i915/i915_gem.c
> >+++ b/drivers/gpu/drm/i915/i915_gem.c
> >@@ -3000,6 +3000,57 @@ void i915_gem_set_wedged(struct drm_i915_private *dev_priv)
> >     mod_delayed_work(dev_priv->wq, &dev_priv->gt.idle_work, 0);
> > }
> >
> >+void i915_gem_unset_wedged(struct drm_i915_private *i915)
> >+{
> >+    struct i915_gem_timeline *tl;
> >+    int i;
> >+
> >+    lockdep_assert_held(&i915->drm.struct_mutex);
> >+    if (!test_bit(I915_WEDGED, &i915->gpu_error.flags))
> >+            return;
> >+
> >+    /* Before unwedging, make sure that all pending operations
> >+     * are flushed and errored out. No more can be submitted until
> >+     * we reset the wedged bit.
> >+     */
> >+    list_for_each_entry(tl, &i915->gt.timelines, link) {
> >+            for (i = 0; i < ARRAY_SIZE(tl->engine); i++) {
> >+                    struct drm_i915_gem_request *rq;
> >+
> >+                    rq = i915_gem_active_peek(&tl->engine[i].last_request,
> >+                                              &i915->drm.struct_mutex);
> >+                    if (!rq)
> >+                            continue;
> >+
> >+                    /* We can't use our normal waiter as we want to
> >+                     * avoid recursively trying to handle the current
> >+                     * reset. The basic dma_fence_default_wait() installs
> >+                     * a callback for dma_fence_signal(), which is
> >+                     * triggered by our nop handler (indirectly, the
> >+                     * callback enables the signaler thread which is
> >+                     * woken by the nop_submit_request() advancing the seqno
> >+                     * and when the seqno passes the fence, the signaler
> >+                     * then signals the fence waking us up).
> >+                     */
> >+                    dma_fence_default_wait(&rq->fence, false,
> >+                                           MAX_SCHEDULE_TIMEOUT);
> >+            }
> >+    }
> >+
> >+    /* Undo nop_submit_request. We prevent all new i915 requests from
> >+     * being queued (by disallowing execbuf whilst wedged) so having
> >+     * waited for all active requests above, we know the system is idle
> >+     * and do not have to worry about a thread being inside
> >+     * engine->submit_request() as we swap over. So unlike installing
> >+     * the nop_submit_request on reset, we can do this from normal
> >+     * context and do not require stop_machine().
> >+     */
> >+    intel_engines_enable_submission(i915);
> 
> So is the point of the dma_fence_default_wait above to ensure that
> all nop_submit_request callbacks have completed? I don't at the
> moment understand how there could be such callbacks, since the
> unwedge happens after the wedge. The wedge already installed the nop
> handler, so by the time we get to another reset attempt, isn't it
> guaranteed that all of those have exited?

There's no such guarantee. The nop_submit_request() is to ensure that all
third-party-driven requests are flushed and our requests are marked with
dma_fence_set_error(-EIO) and not submitted to hw.
-Chris
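
[Editor's note: the wedge/unwedge mechanism discussed above can be sketched
as a simplified userspace model in C. This is not the actual i915 code; the
structs, `run_demo()`, and the stubbed "hardware" completion are illustrative
assumptions. It only shows the core idea: wedging swaps the engine's
`submit_request` pointer for a nop that completes requests with -EIO, and
unwedging, after all in-flight requests have been waited upon, restores the
real handler.]

```c
/* Simplified userspace model of i915 wedging, not the real driver code. */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct request {
	int error;       /* 0 on success, -EIO when flushed while wedged */
	bool signalled;  /* stands in for the fence being signalled */
};

struct engine {
	void (*submit_request)(struct engine *e, struct request *rq);
	bool wedged;
};

/* Normal path: hand the request to hardware (stubbed: complete it). */
static void real_submit_request(struct engine *e, struct request *rq)
{
	(void)e;
	rq->error = 0;
	rq->signalled = true;
}

/* Wedged path: never touch hw; mark the request with -EIO and signal
 * it so any waiter (cf. dma_fence_default_wait) is woken. */
static void nop_submit_request(struct engine *e, struct request *rq)
{
	(void)e;
	rq->error = -EIO;
	rq->signalled = true;
}

static void set_wedged(struct engine *e)
{
	e->wedged = true;
	e->submit_request = nop_submit_request;
}

static void unset_wedged(struct engine *e)
{
	if (!e->wedged)
		return;
	/* In the real patch, every timeline's last_request is waited on
	 * here (dma_fence_default_wait) before the swap, so no thread can
	 * still be inside the old submit_request when we replace it. */
	e->submit_request = real_submit_request;
	e->wedged = false;
}

/* Exercise the swap: returns 0 if both paths behave as expected. */
int run_demo(void)
{
	struct engine e = { .submit_request = real_submit_request };
	struct request a = {0}, b = {0};

	set_wedged(&e);
	e.submit_request(&e, &a);	/* flushed with -EIO, never runs */
	assert(a.signalled && a.error == -EIO);

	unset_wedged(&e);
	e.submit_request(&e, &b);	/* back on the real path */
	assert(b.signalled && b.error == 0);
	return 0;
}
```

The design point this models is the one Chris makes above: because execbuf is
disallowed while wedged and all in-flight requests are flushed before the
pointer swap, the function pointer can be restored from normal context without
stop_machine().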

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
