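The hardware takes the indirect context size as a count of cachelines.
A count of zero means that the feature is disabled.
If we simply divide the size by the cacheline bytes, the division
truncates, and a size that is not a multiple of the cacheline size
leaves the last, partial cacheline unexecuted.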

Round the division up to a whole number of cachelines so that the
hardware executes everything intended.
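
For illustration only, the difference between the two divisions can be
sketched in standalone C. The value 64 for CACHELINE_BYTES and the two
helper function names are assumptions made for this example, not taken
from the driver; the macro has the same shape as the kernel's
DIV_ROUND_UP().

```c
#include <assert.h>

/* Assumed value for this sketch; the driver supplies its own definition. */
#define CACHELINE_BYTES 64

/* Same shape as the kernel's DIV_ROUND_UP(): divide, rounding up. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical helper: old behaviour, drops a partial trailing cacheline. */
unsigned int cachelines_truncated(unsigned int size)
{
	return size / CACHELINE_BYTES;
}

/* Hypothetical helper: new behaviour, counts a partial cacheline as one. */
unsigned int cachelines_rounded_up(unsigned int size)
{
	return DIV_ROUND_UP(size, CACHELINE_BYTES);
}

/* Small self-check of the arithmetic described above. */
void check_examples(void)
{
	/* 100 bytes span two cachelines, but truncation reports only one. */
	assert(cachelines_truncated(100) == 1);
	assert(cachelines_rounded_up(100) == 2);

	/* Exact multiples give the same count either way. */
	assert(cachelines_truncated(128) == 2);
	assert(cachelines_rounded_up(128) == 2);

	/* A zero size still yields zero, so "disabled" keeps its meaning. */
	assert(cachelines_rounded_up(0) == 0);
}
```

Only sizes that are not a multiple of the cacheline size change, and
they gain exactly one cacheline.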

Bspec: 11739
Fixes: 17ee950df38b ("drm/i915/gen8: Add infrastructure to initialize WA batch buffers")
Cc: Chris Wilson <[email protected]>
Signed-off-by: Mika Kuoppala <[email protected]>
---
 drivers/gpu/drm/i915/gt/intel_lrc.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index 6fbad5e2343f..acbb36ad17ff 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -4739,7 +4739,8 @@ static void init_wa_bb_reg_state(u32 * const regs,
 
                regs[pos_bb_per_ctx + 2] =
                        (ggtt_offset + wa_ctx->indirect_ctx.offset) |
-                       (wa_ctx->indirect_ctx.size / CACHELINE_BYTES);
+                       DIV_ROUND_UP(wa_ctx->indirect_ctx.size,
+                                    CACHELINE_BYTES);
 
                regs[pos_bb_per_ctx + 4] =
                        intel_lr_indirect_ctx_offset(engine) << 6;
-- 
2.17.1
