i915: hangcheck robustification

Chris Wilson Wed, 19 Oct 2011 04:32:51 -0700

On Tue, 11 Oct 2011 16:39:09 +0200, Daniel Vetter <[email protected]> 
wrote:
> From: Ben Widawsky <[email protected]>
> 
> This was pulled out of the per ring error handling patch series as it
> actually fixes two issues, and bikeshedding appears to be going on
> there.
> 
> First, remove setting hangcheck_count when we do notify ring. While it
> seems counterintuitive to be setting up a timer to catch hangcheck_count
> greater than 0 with hangcheck_count already greater than 0, actually
> when we go to check if the GPU is hung we clear that value if the gpu is
> still alive . Leaving this is actually harmful as submitting work could
> falsely clear the count while the hanghcheck code is checking the count.
> I can't think of case where this doesn't just delay the inevitable
> reset... but I didn't spend too much time thinking about it.
> 
> Second, for Gen5+ we have more information to be considered when
> determining if the GPU is stuck, primarily the media ring (and blitter
> ring in gen6). This patch will check all available rings, and also updates
> error state with the new information. It theoretically cant fix false
> positives, but I haven't actually come across such a case.
> 
> Signed-off-by: Ben Widawsky <[email protected]>
> [danvet: remove remnants of a unrelated cleanup patch]
> Signed-off-by: Daniel Vetter <[email protected]>


NAK: This failed to detect a hang, leaving my box frozen. I suspect that
the value of INSTDONE was fluctuating on the render ring even though we
had now requests pending and so could assume that it was idle.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] [PATCH 1/6] drm/i915: hangcheck robustification

Reply via email to