Explain in a comment how and why we treat hung
context like we do.
Cc: Chris Wilson <[email protected]>
Cc: Tvrtko Ursulin <[email protected]>
Signed-off-by: Mika Kuoppala <[email protected]>
---
drivers/gpu/drm/i915/i915_gem.c | 21 +++++++++++++++++++++
1 file changed, 21 insertions(+)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 3e10e81..7e0a0de 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2697,6 +2697,27 @@ static bool i915_gem_reset_request(struct
drm_i915_gem_request *request)
/* Read once and return the resolution */
const bool guilty = engine_stalled(request->engine);
+ /* The guilty request will get skipped on a hung engine.
+ *
+ * Users of client default context do not rely on logical
+ * state preserved between batches so it is safe to execute
+ * queued requests following the hang. Non default context
+ * rely on preserved state so skipping a batch loses the
+ * evolution of the state and it needs to be considered corrupted.
+ * Executing more queued batches on top of corrupted state is
+ * risky. But we take the risk by trying to advance through
+ * the queued requests in order to make the client behaviour
+ * more predictable around resets, by not throwing away random
+ * amount of batches it has prepared for execution. Sophisticated
+ * clients can use gem_reset_stats_ioctl and dma fence status
+ * to observe when it loses the context state and should
+ * rebuild accordingly.
+ *
+ * Context ban and ultimately the client ban mechanism are safety
+ * valves if a context state vs client submission ends up resulting
+ * nothing more than a subsequent hangs.
+ */
+
if (guilty) {
i915_gem_context_mark_guilty(request->ctx);
skip_request(request);
--
2.7.4
_______________________________________________
Intel-gfx mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/intel-gfx