Quoting Chris Wilson (2018-03-02 11:33:24)
> After staring hard at sequences like
>
> [ 28.199013] systemd-1 2..s. 26062228us :
> execlists_submission_tasklet: rcs0 cs-irq head=0 [0?], tail=1 [1?]
> [ 28.199095] systemd-1 2..s. 26062229us :
> execlists_submission_tasklet: rcs0 csb[1]: status=0x00000018:0x00000000,
> active=0x1
> [ 28.199177] systemd-1 2..s. 26062230us :
> execlists_submission_tasklet: rcs0 out[0]: ctx=0.1, seqno=3, prio=-1024
> [ 28.199258] systemd-1 2..s. 26062231us :
> execlists_submission_tasklet: rcs0 completed ctx=0
> [ 28.199340] gem_eio-829 1..s1 26066853us :
> execlists_submission_tasklet: rcs0 in[0]: ctx=1.1, seqno=1, prio=0
> [ 28.199421] <idle>-0 2..s. 26066863us :
> execlists_submission_tasklet: rcs0 cs-irq head=1 [1?], tail=2 [2?]
> [ 28.199503] <idle>-0 2..s. 26066865us :
> execlists_submission_tasklet: rcs0 csb[2]: status=0x00000001:0x00000000,
> active=0x1
> [ 28.199585] gem_eio-829 1..s1 26067077us :
> execlists_submission_tasklet: rcs0 in[1]: ctx=3.1, seqno=2, prio=0
> [ 28.199667] gem_eio-829 1..s1 26067078us :
> execlists_submission_tasklet: rcs0 in[0]: ctx=1.2, seqno=1, prio=0
> [ 28.199749] <idle>-0 2..s. 26067084us :
> execlists_submission_tasklet: rcs0 cs-irq head=2 [2?], tail=3 [3?]
> [ 28.199830] <idle>-0 2..s. 26067085us :
> execlists_submission_tasklet: rcs0 csb[3]: status=0x00008002:0x00000001,
> active=0x1
> [ 28.199912] <idle>-0 2..s. 26067086us :
> execlists_submission_tasklet: rcs0 out[0]: ctx=1.2, seqno=1, prio=0
> [ 28.199994] gem_eio-829 2..s. 28246084us :
> execlists_submission_tasklet: rcs0 cs-irq head=3 [3?], tail=4 [4?]
> [ 28.200096] gem_eio-829 2..s. 28246088us :
> execlists_submission_tasklet: rcs0 csb[4]: status=0x00000014:0x00000001,
> active=0x5
> [ 28.200178] gem_eio-829 2..s. 28246089us :
> execlists_submission_tasklet: rcs0 out[0]: ctx=0.0, seqno=0, prio=0
> [ 28.200260] gem_eio-829 2..s. 28246127us :
> execlists_submission_tasklet: execlists_submission_tasklet:886
> GEM_BUG_ON(buf[2 * head + 1] != port->context_id)
>
> the conclusion is that the only place where the ports are reset to zero
> is engine->cancel_requests, called during i915_gem_set_wedged().
>
> The race is horrible as it results from calling set-wedged on active HW
> (the GPU reset failed) and as such we need to be careful as the HW state
> changes beneath us. Fortunately, it's the same scary conditions as
> affect normal reset, so we can reuse the same machinery to disable state
> tracking as we clobber it.
>
> Fixes: af7a8ffad9c5 ("drm/i915: Use rcu instead of stop_machine in set_wedged")
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104945
> Signed-off-by: Chris Wilson <[email protected]>
> Cc: Mika Kuoppala <[email protected]>
> Cc: Michel Thierry <[email protected]>
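
For anyone following along, here is a rough toy model of the race and of the
guard described above. It is only a sketch under stated assumptions: every
name in it (toy_engine, toy_prepare, toy_finish, toy_process_csb,
toy_cancel_requests) is hypothetical and none of it is the i915 code. It
models a late context-switch event arriving after the ports have been zeroed,
and shows that disabling state tracking around the clobber avoids the
mismatch that the GEM_BUG_ON in the trace reports.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical toy model; these names are NOT the i915 API. */
struct toy_port {
	unsigned int context_id;
};

struct toy_engine {
	struct toy_port port[2];
	bool tracking_enabled; /* stands in for "the tasklet may run" */
};

/*
 * Models the consistency check seen in the trace. Returns false where
 * the real driver would have hit the GEM_BUG_ON, i.e. when the context
 * id reported by the hardware does not match what port[0] tracks.
 */
static bool toy_process_csb(const struct toy_engine *e,
			    unsigned int csb_context_id)
{
	if (!e->tracking_enabled)
		return true; /* event ignored while state is clobbered */

	return csb_context_id == e->port[0].context_id;
}

/* Models the clobber: the ports are reset to zero. */
static void toy_cancel_requests(struct toy_engine *e)
{
	memset(e->port, 0, sizeof(e->port));
}

/* Stop tracking hardware state before it is clobbered. */
static void toy_prepare(struct toy_engine *e)
{
	e->tracking_enabled = false;
}

/* Resume tracking once the clobber is complete. */
static void toy_finish(struct toy_engine *e)
{
	assert(!e->tracking_enabled);
	e->tracking_enabled = true;
}
```

Without the guard, calling toy_cancel_requests() and then feeding in a late
event for context 1 makes toy_process_csb() report a mismatch, much like
csb[4] against the zeroed out[0] in the trace. Wrapping the cancel in
toy_prepare()/toy_finish() makes the same late event harmless.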
-Chris