On Tue, 11 Oct 2011 16:39:09 +0200, Daniel Vetter <[email protected]> wrote:
> From: Ben Widawsky <[email protected]>
>
> This was pulled out of the per ring error handling patch series as it
> actually fixes two issues, and bikeshedding appears to be going on
> there.
>
> First, remove setting hangcheck_count when we do notify ring. While it
> seems counterintuitive to be setting up a timer to catch hangcheck_count
> greater than 0 with hangcheck_count already greater than 0, actually
> when we go to check if the GPU is hung we clear that value if the gpu is
> still alive. Leaving this is actually harmful as submitting work could
> falsely clear the count while the hangcheck code is checking the count.
> I can't think of a case where this doesn't just delay the inevitable
> reset... but I didn't spend too much time thinking about it.
>
> Second, for Gen5+ we have more information to be considered when
> determining if the GPU is stuck, primarily the media ring (and blitter
> ring in gen6). This patch will check all available rings, and also updates
> error state with the new information. It theoretically cant fix false
> positives, but I haven't actually come across such a case.
>
> Signed-off-by: Ben Widawsky <[email protected]>
> [danvet: remove remnants of an unrelated cleanup patch]
> Signed-off-by: Daniel Vetter <[email protected]>

NAK: This failed to detect a hang, leaving my box frozen. I suspect that
the value of INSTDONE was fluctuating on the render ring even though we
had no requests pending and so could assume that it was idle.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
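To illustrate the concern, here is a minimal sketch of a multi-ring hang check that ignores register values on rings with no outstanding requests. It is not the i915 code: the types ring_sample and hangcheck, the function hangcheck_tick, and the "two stalled ticks" threshold are all assumptions made for illustration. The point is only that a fluctuating INSTDONE on an idle ring should not count as progress.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-ring snapshot; field names are illustrative only. */
struct ring_sample {
    uint32_t acthd;        /* active head pointer */
    uint32_t instdone;     /* INSTDONE register value */
    bool requests_pending; /* work is still outstanding on this ring */
};

/* Hypothetical hangcheck state. */
struct hangcheck {
    unsigned int count;    /* consecutive ticks without progress */
};

/*
 * Returns true when a hang should be declared. Only rings with
 * outstanding requests can report progress, so register noise on an
 * idle ring cannot clear the count.
 */
static bool hangcheck_tick(struct hangcheck *hc,
                           const struct ring_sample *prev,
                           const struct ring_sample *cur,
                           size_t nrings)
{
    bool any_busy = false;
    bool progress = false;

    assert(hc != NULL && prev != NULL && cur != NULL);

    for (size_t i = 0; i < nrings; i++) {
        if (!cur[i].requests_pending)
            continue; /* idle ring: ignore its registers */
        any_busy = true;
        if (cur[i].acthd != prev[i].acthd ||
            cur[i].instdone != prev[i].instdone)
            progress = true;
    }

    /* Nothing to wait for, or something moved: the GPU is alive. */
    if (!any_busy || progress) {
        hc->count = 0;
        return false;
    }

    /* Assumed threshold: hang on the second consecutive stalled tick. */
    return ++hc->count >= 2;
}
```

With this shape, an idle render ring whose INSTDONE keeps changing cannot mask a stalled media or blitter ring, because its samples are never compared.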
