-----Original Message-----
From: Harrison, John C <[email protected]>
Sent: Thursday, October 12, 2023 6:08 PM
To: Cavitt, Jonathan <[email protected]>;
[email protected]
Cc: Gupta, saurabhg <[email protected]>; [email protected];
Iddamsetty, Aravind <[email protected]>; Yang, Fei
<[email protected]>; Shyti, Andi <[email protected]>; Das, Nirmoy
<[email protected]>; Krzysztofik, Janusz <[email protected]>;
Roper, Matthew D <[email protected]>; [email protected];
[email protected]
Subject: Re: [PATCH v13 4/7] drm/i915: No TLB invalidation on suspended GT
>
> On 10/12/2023 15:38, Jonathan Cavitt wrote:
> > In case of GT is suspended, don't allow submission of new TLB invalidation
> > request and cancel all pending requests. The TLB entries will be
> > invalidated either during GuC reload or on system resume.
> >
> > Signed-off-by: Fei Yang <[email protected]>
> > Signed-off-by: Jonathan Cavitt <[email protected]>
> > CC: John Harrison <[email protected]>
> > Reviewed-by: Andi Shyti <[email protected]>
> > Acked-by: Tvrtko Ursulin <[email protected]>
> > Acked-by: Nirmoy Das <[email protected]>
> > ---
> > drivers/gpu/drm/i915/gt/uc/intel_guc.h | 1 +
> > .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 22 ++++++++++++-------
> > drivers/gpu/drm/i915/gt/uc/intel_uc.c | 7 ++++++
> > 3 files changed, 22 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > index 0949628d69f8b..2b6dfe62c8f2a 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > @@ -537,4 +537,5 @@ int intel_guc_invalidate_tlb_engines(struct intel_guc
> > *guc);
> > int intel_guc_invalidate_tlb_guc(struct intel_guc *guc);
> > int intel_guc_tlb_invalidation_done(struct intel_guc *guc,
> > const u32 *payload, u32 len);
> > +void wake_up_all_tlb_invalidate(struct intel_guc *guc);
> > #endif
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > index 1377398afcdfa..3a0d20064878a 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > @@ -1796,13 +1796,24 @@ static void __guc_reset_context(struct
> > intel_context *ce, intel_engine_mask_t st
> > intel_context_put(parent);
> > }
> >
> > -void intel_guc_submission_reset(struct intel_guc *guc, intel_engine_mask_t
> > stalled)
> > +void wake_up_all_tlb_invalidate(struct intel_guc *guc)
> > {
> > struct intel_guc_tlb_wait *wait;
> > + unsigned long i;
> > +
> > + if (HAS_GUC_TLB_INVALIDATION(guc_to_gt(guc)->i915)) {
> Why the change from 'if(!is_available) return' to 'if(HAS_) {doStuff}'?
I feel like this question has two parts, so I'll answer them separately:
1. Why HAS_GUC_TLB_INVALIDATION and not intel_guc_tlb_invalidation_is_available?
Wake_up_all_tlb_invalidate is called during the suspend/resume path,
specifically in the
middle of suspend. It's required for it to be called here to clean up any
invalidations left
in the queue during the suspend/resume phase because they are no longer valid
requests.
However, the suspend/resume phase also resets GuC, so intel_guc_is_ready
returns false.
In short, using intel_guc_invalidation_is_available was causing us to skip this
code section
incorrectly, resulting in spurious GuC TLB invalidation timeout errors during
gt reset.
2. Why use a positive check to perform and not a negative check to skip?
In patch 3, wake_up_all_tlb_invalidate was originally called universally on all
platforms
during intel_guc_submission_reset, which is incorrect and not how was
reimplemented here.
I discovered this was the case and retroactively corrected it, as seen below.
Because of how intel_guc_submission_reset is structured, a negative check to
skip wouldn't
make much sense there, so I used a positive check to perform instead. This is
a holdover from
that implementation, and was kept to maintain consistency between patches 3 and
4. It's
probably not as big of a deal as I'm imagining, but I think it would be awkward
if the initial
implementation in intel_guc_submission_reset and the reimplementation in
wake_up_all_tlb_invalidate weren't superficially the same, even if they were
functionally
equivalent otherwise.
-Jonathan Cavitt
>
> John.
>
> > + xa_lock_irq(&guc->tlb_lookup);
> > + xa_for_each(&guc->tlb_lookup, i, wait)
> > + wake_up(&wait->wq);
> > + xa_unlock_irq(&guc->tlb_lookup);
> > + }
> > +}
> > +
> > +void intel_guc_submission_reset(struct intel_guc *guc, intel_engine_mask_t
> > stalled)
> > +{
> > struct intel_context *ce;
> > unsigned long index;
> > unsigned long flags;
> > - unsigned long i;
> >
> > if (unlikely(!guc_submission_initialized(guc))) {
> > /* Reset called during driver load? GuC not yet initialised! */
> > @@ -1833,12 +1844,7 @@ void intel_guc_submission_reset(struct intel_guc
> > *guc, intel_engine_mask_t stall
> > * The full GT reset will have cleared the TLB caches and flushed the
> > * G2H message queue; we can release all the blocked waiters.
> > */
> > - if (HAS_GUC_TLB_INVALIDATION(guc_to_gt(guc)->i915)) {
> > - xa_lock_irq(&guc->tlb_lookup);
> > - xa_for_each(&guc->tlb_lookup, i, wait)
> > - wake_up(&wait->wq);
> > - xa_unlock_irq(&guc->tlb_lookup);
> > - }
> > + wake_up_all_tlb_invalidate(guc);
> > }
> >
> > static void guc_cancel_context_requests(struct intel_context *ce)
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> > b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> > index 98b103375b7ab..27f6561dd7319 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> > @@ -688,6 +688,8 @@ void intel_uc_suspend(struct intel_uc *uc)
> > /* flush the GSC worker */
> > intel_gsc_uc_flush_work(&uc->gsc);
> >
> > + wake_up_all_tlb_invalidate(guc);
> > +
> > if (!intel_guc_is_ready(guc)) {
> > guc->interrupts.enabled = false;
> > return;
> > @@ -736,6 +738,11 @@ static int __uc_resume(struct intel_uc *uc, bool
> > enable_communication)
> >
> > intel_gsc_uc_resume(&uc->gsc);
> >
> > + if (intel_guc_tlb_invalidation_is_available(guc)) {
> > + intel_guc_invalidate_tlb_engines(guc);
> > + intel_guc_invalidate_tlb_guc(guc);
> > + }
> > +
> > return 0;
> > }
> >
>
>