Hi,

On 20/05/2026 20:44, Nikolay Mikhaylov wrote:
GGTT PTEs are written through GSM using MMIO writes. On Lunar Lake systems affected by the referenced issue, hangs have been observed around GGTT update paths while the GT may be entering RC6 under GuC control.

The GGTT modify paths currently rely on xe_pm_runtime_get_noresume() for power management protection. That prevents the device from entering D3, but does not keep the GT out of RC6 while the device is otherwise runtime PM active.

Hold GT FORCEWAKE across the observed GGTT PTE write batches:

- xe_ggtt_insert_node_transform() and __xe_ggtt_insert_bo_at(), covering the display framebuffer pin path observed as the primary trigger on LNL/Wayland systems
- xe_ggtt_clear(), covering the GGTT node removal/unpin path

This keeps the change limited to the paths where the hang has been observed. The exact code shape submitted here has been tested by multiple LNL users, including several weeks of uptime without reproducing the hang.

Boot-time xe_ggtt_initial_clear() is also covered by the xe_ggtt_clear() wrap. That is incidental and not the primary load-bearing path for the reported issue.

The insert-path FORCEWAKE wrapping was originally proposed by Márton Vigh (@mrtnvgh):
https://gitlab.freedesktop.org/drm/xe/kernel/-/work_items/7513#note_3418761

Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/work_items/7513
Tested-by: Nikolay Mikhaylov <[email protected]>
Signed-off-by: Nikolay Mikhaylov <[email protected]>
I went through the specs and workarounds, and the only thing I could come up with was a potential miss in the implementation of WA 22019338487_display, which is relevant for LNL and could be related since this is in the fb pin path, but it's still a long shot. The WA seems to want the driver to limit host-side writes (over the BAR) to stolen memory. The GT side of WA 22019338487 looks to "rate limit" host-side GGTT writes, since the GGTT also lives in stolen memory and we of course can't disable those outright (see ggtt_update_access_counter).
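To make the "rate limit" idea concrete, here is a minimal userspace sketch of counter-based throttling: after every N host-side PTE writes, one extra serializing write is issued. This is only a model under stated assumptions. The struct, the function names and the threshold are hypothetical and not taken from the driver; the real ggtt_update_access_counter() may differ in its details.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace model; these names are not from the xe driver. */
struct ggtt_model {
	uint32_t access_count;       /* host PTE writes since the last serializing write */
	uint32_t max_writes;         /* assumed threshold, picked by the caller */
	uint32_t serializing_writes; /* counts stand-ins for the extra MMIO write */
};

/* Stand-in for whatever write makes the hardware drain earlier PTE writes. */
static void model_serialize(struct ggtt_model *g)
{
	g->serializing_writes++;
}

/* Model one host-side PTE write followed by the rate-limit bookkeeping. */
static void model_set_pte(struct ggtt_model *g)
{
	assert(g->max_writes > 0);

	/* The PTE write itself would go here. */

	if (++g->access_count >= g->max_writes) {
		model_serialize(g);
		g->access_count = 0;
	}
}
```

With a threshold of 4, ten modelled PTE writes trigger two serializing writes and leave two writes counted toward the next one.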
Patch is here: https://lore.kernel.org/intel-xe/[email protected]/
I don't know whether this will actually solve your particular issue, but I think we should land it regardless.
For your patch, assuming the other patch doesn't help, maybe tweak it so that it is limited to LNL, with a FIXME noting that we still need to figure out the proper root cause, but that in the meantime this has been shown to resolve the hang?
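Roughly the shape I have in mind, as a userspace model rather than real driver code: take the forcewake reference around the PTE write batch only when the platform check says so, and leave every other platform untouched. All names here (model_dev, model_needs_ggtt_fw, and so on) are hypothetical stand-ins, and the refcount only imitates what holding forcewake would mean.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are xe driver API. */
enum model_platform { MODEL_LNL, MODEL_OTHER };

struct model_dev {
	enum model_platform platform;
	int fw_refcount;       /* > 0 means the GT is held awake */
	int writes_with_fw;    /* PTE writes issued while forcewake was held */
	int writes_without_fw; /* PTE writes issued without it */
};

/*
 * FIXME: the root cause of the hang is still unknown. Holding forcewake
 * around GGTT PTE writes has been shown to avoid it on LNL, so limit the
 * workaround to that platform until the real fix is understood.
 */
static bool model_needs_ggtt_fw(const struct model_dev *dev)
{
	return dev->platform == MODEL_LNL;
}

/* Write a batch of PTEs, holding forcewake only where it is needed. */
static void model_write_ptes(struct model_dev *dev, int count)
{
	bool hold = model_needs_ggtt_fw(dev);

	if (hold)
		dev->fw_refcount++;

	for (int i = 0; i < count; i++) {
		if (dev->fw_refcount > 0)
			dev->writes_with_fw++;
		else
			dev->writes_without_fw++;
	}

	if (hold) {
		dev->fw_refcount--;
		assert(dev->fw_refcount >= 0);
	}
}
```

Keeping the platform check in one helper means the three call sites in the patch would not each need their own condition, and the FIXME lives in a single place.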
---
 drivers/gpu/drm/xe/xe_ggtt.c | 39 ++++++++++++++++++++++++++++--------
 1 file changed, 31 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_ggtt.c b/drivers/gpu/drm/xe/xe_ggtt.c
index a351c578b170..c048bad70ebf 100644
--- a/drivers/gpu/drm/xe/xe_ggtt.c
+++ b/drivers/gpu/drm/xe/xe_ggtt.c
@@ -20,6 +20,8 @@
 #include "regs/xe_regs.h"
 #include "xe_assert.h"
 #include "xe_bo.h"
+#include "xe_device.h"
+#include "xe_force_wake.h"
 #include "xe_gt_printk.h"
 #include "xe_gt_types.h"
 #include "xe_map.h"
@@ -272,9 +274,18 @@ static void xe_ggtt_clear(struct xe_ggtt *ggtt, u64 start, u64 size)
 	else
 		scratch_pte = 0;

-	while (start < end) {
-		ggtt->pt_ops->ggtt_set_pte(ggtt, start, scratch_pte);
-		start += XE_PAGE_SIZE;
+	/*
+	 * GSM (mapped at tile->mmio.regs + SZ_8M) is not in an always-on
+	 * power domain. Hold FORCEWAKE for the PTE write batch to keep
+	 * the GT awake; on LNL GuC autonomously enters RC6 via
+	 * GUCRC_FIRMWARE_CONTROL and writeq() to GSM hangs if the GT
+	 * is asleep.
+	 */
+	xe_with_force_wake(fw_ref, gt_to_fw(ggtt->tile->primary_gt), XE_FW_GT) {
+		while (start < end) {
+			ggtt->pt_ops->ggtt_set_pte(ggtt, start, scratch_pte);
+			start += XE_PAGE_SIZE;
+		}
 	}
 }

@@ -769,10 +780,19 @@ struct xe_ggtt_node *xe_ggtt_insert_node_transform(struct xe_ggtt *ggtt,
 	if (ret)
 		goto err_unlock;

-	if (transform)
-		transform(ggtt, node, pte_flags, ggtt->pt_ops->ggtt_set_pte, arg);
-	else
-		xe_ggtt_map_bo(ggtt, node, bo, pte_flags);
+	/*
+	 * Hold FORCEWAKE for the PTE write batch. xe_pm_runtime_get_noresume()
+	 * upstack only prevents D3, not RC6: GuC may have placed the GT into
+	 * RC6 autonomously (GUCRC_FIRMWARE_CONTROL on LNL), and writeq() to
+	 * GSM hangs if the GT is asleep. Triggers most often from the display
+	 * framebuffer pin path on LNL/Wayland.
+	 */
+	xe_with_force_wake(fw_ref, gt_to_fw(ggtt->tile->primary_gt), XE_FW_GT) {
+		if (transform)
+			transform(ggtt, node, pte_flags, ggtt->pt_ops->ggtt_set_pte, arg);
+		else
+			xe_ggtt_map_bo(ggtt, node, bo, pte_flags);
+	}

 	mutex_unlock(&ggtt->lock);
 	return node;
@@ -844,7 +864,10 @@ static int __xe_ggtt_insert_bo_at(struct xe_ggtt *ggtt, struct xe_bo *bo,
 		u16 pat_index = xe_cache_pat_idx(tile_to_xe(ggtt->tile), cache_mode);
 		u64 pte = ggtt->pt_ops->pte_encode_flags(bo, pat_index);

-		xe_ggtt_map_bo(ggtt, bo->ggtt_node[tile_id], bo, pte);
+		/* See xe_ggtt_insert_node_transform()/xe_ggtt_clear() */
+		xe_with_force_wake(fw_ref, gt_to_fw(ggtt->tile->primary_gt), XE_FW_GT) {
+			xe_ggtt_map_bo(ggtt, bo->ggtt_node[tile_id], bo, pte);
+		}
 	}
 	mutex_unlock(&ggtt->lock);
