We decay hangcheck score, instead of setting it to zero,
when seqno moves forward. This was added as mechanism to
detect batches full of invalid waits. But with multiple runs of
very intensive batches (shader tests), the score accumulates
even without wait/kick pairs only by engine being active inside
shader loops multiple times in succession.

Prevent this mechanism to falsely trigger on loops by
decaying faster than we accumulate during active looping.

Cc: Chris Wilson <[email protected]>
Signed-off-by: Mika Kuoppala <[email protected]>
---
 drivers/gpu/drm/i915/i915_irq.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 6ed6571..3507269 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -3009,6 +3009,7 @@ static void i915_hangcheck_elapsed(struct work_struct 
*work)
 #define BUSY 1
 #define KICK 5
 #define HUNG 20
+#define BUSY_DECAY (2*BUSY)
 
        if (!i915.enable_hangcheck)
                return;
@@ -3084,8 +3085,8 @@ static void i915_hangcheck_elapsed(struct work_struct 
*work)
                        /* Gradually reduce the count so that we catch DoS
                         * attempts across multiple batches.
                         */
-                       if (ring->hangcheck.score > 0)
-                               ring->hangcheck.score--;
+                       if (ring->hangcheck.score > BUSY_DECAY)
+                               ring->hangcheck.score -= BUSY_DECAY;
 
                        ring->hangcheck.acthd = ring->hangcheck.max_acthd = 0;
                }
-- 
2.5.0

_______________________________________________
Intel-gfx mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

Reply via email to