On Wed, 2023-08-02 at 16:35 -0700, Teres Alexis, Alan Previn wrote:
> If we are at the end of suspend or very early in resume
> its possible an async fence signal could lead us to the
> execution of the context destruction worker (after the
> prior worker flush).
>
alan:snip
>
> static void __guc_context_destroy(struct intel_context *ce)
> @@ -3270,7 +3287,20 @@ static void deregister_destroyed_contexts(struct intel_guc *guc)
> if (!ce)
> break;
>
> - guc_lrc_desc_unpin(ce);
> + if (guc_lrc_desc_unpin(ce)) {
> + /*
> + * This means GuC's CT link severed mid-way which only happens
> + * in suspend-resume corner cases. In this case, put the
> + * context back into the destroyed_contexts list which will
> + * get picked up on the next context deregistration event or
> + * purged in a GuC sanitization event (reset/unload/wedged/...).
> + */
> + spin_lock_irqsave(&guc->submission_state.lock, flags);
> + list_add_tail(&ce->destroyed_link,
> + &guc->submission_state.destroyed_contexts);
alan: I completely missed the fact that this new code sits within a
while (!list_empty(&guc->submission_state.destroyed_contexts))
block, so putting the context back on the list means the list never
empties and the while loop spins forever.
Will fix and rerev.
> + spin_unlock_irqrestore(&guc->submission_state.lock, flags);
> + }
> +
> }
> }
>
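
To make the loop problem noted above concrete, here is a minimal userspace
sketch. It is not the i915 code: struct ctx, struct ctx_list, the
unpin_fails flag and deregister_all() are hypothetical stand-ins for the
context, the destroyed_contexts list, a failing guc_lrc_desc_unpin() and the
drain loop. It shows one possible way to avoid the spin, namely bailing out
of the loop right after re-queueing the failed entry. Whether the rerev takes
this route is not stated in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a context sitting on destroyed_contexts. */
struct ctx {
	struct ctx *next;
	bool unpin_fails;	/* simulates the unpin step returning an error */
};

/* Hypothetical minimal FIFO standing in for the kernel's list_head. */
struct ctx_list {
	struct ctx *head;
	struct ctx *tail;
};

static void ctx_list_add_tail(struct ctx_list *l, struct ctx *c)
{
	assert(l != NULL && c != NULL);
	c->next = NULL;
	if (l->tail)
		l->tail->next = c;
	else
		l->head = c;
	l->tail = c;
}

static struct ctx *ctx_list_pop_head(struct ctx_list *l)
{
	struct ctx *c = l->head;

	if (!c)
		return NULL;
	l->head = c->next;
	if (!l->head)
		l->tail = NULL;
	c->next = NULL;
	return c;
}

/* Drains the list; returns how many entries were deregistered. */
static int deregister_all(struct ctx_list *l)
{
	int done = 0;

	while (l->head) {
		struct ctx *c = ctx_list_pop_head(l);

		if (c->unpin_fails) {
			/*
			 * Re-queue for a later pass, then stop. Without the
			 * break, the list never becomes empty and the loop
			 * keeps popping and re-adding this entry forever.
			 */
			ctx_list_add_tail(l, c);
			break;
		}
		done++;
	}
	return done;
}
```

With three queued entries where the second one fails, the first call handles
one entry and returns with the remaining two still queued, so a later call
can pick them up once the failure condition has cleared.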