panthor: Relax check in panthor_sched_pre_reset()

Steven Price Mon, 17 Nov 2025 01:49:38 -0800

On 12/11/2025 12:47, Boris Brezillon wrote:
> On Fri, 7 Nov 2025 16:40:53 +0000
> Steven Price <[email protected]> wrote:
> 
>> On 31/10/2025 15:48, Boris Brezillon wrote:
>>> A group can become runnable even after reset.in_progress has
>>> been set to true and panthor_sched_suspend() has been called,
>>> because the drm_sched queues are still running at that point,
>>> and ::run_job() might call group_schedule_locked() which moves
>>> the group to the runnable list. And that's fine, because we're
>>> moving those groups to the stopped list anyway when we call
>>> panthor_group_stop(), so just drop the misleading WARN_ON().  
>>
>> If we've got another thread mutating the runnable list between
>> panthor_sched_suspend() and list_for_each_entry_safe(), doesn't that
>> make the list iterator unsafe? (_safe only protects against deleting the
>> current item, not against concurrent access).
> 
> I'm not too sure actually. There's an
> atomic_read(&sched->reset.in_progress) to check if we're about to reset
> in group_schedule_locked() and cancel the insertion into the runnable
> list in that case, meaning we're sure nothing new will be inserted after
> we've set the in_progress=true in panthor_sched_pre_reset().


I was mostly going on your commit message:

> A group can become runnable even after reset.in_progress has
> been set to true and panthor_sched_suspend() has been called

if that is indeed happening then we have a problem (and removing the
WARN_ON is just papering over it). I haven't actually followed through
the logic.

>>
>> It feels to me like we should be holding the sched mutex - at least
>> while iterating. I agree the WARN_ON is unnecessary, and will need
>> removing if we simply guard the iteration - the alternative is to
>> recolour panthor_sched_suspend() to assume the lock is held (and take
>> the lock in panthor_sched_pre_reset), but I suspect that's a more ugly
>> change.
> 
> I'd rather ensure that nothing new is inserted in the runnable/idle
> lists after sched->reset.in_progress is set to true. Note that
> sched->reset.in_progress is set to true with the sched lock held,
> meaning any path modifying the sched lists (must be done with the sched
> lock held) should complete before we set this to true. As long as those
> paths also skip the list insertion, or, for things happening in a work
> context (thinking of the tick work here), as long as the work is not
> rescheduled until we get a chance to disable this work, we should be
> good, no?

Yes that design can work. But atomics can be difficult to reason about,
so unless there's a good reason I think we'd generally be better
sticking with (simple) locks on the slow paths, then we get the benefits
of lockdep etc checking we haven't messed up.

Thanks,
Steve

Re: [PATCH v1 4/4] drm/panthor: Relax check in panthor_sched_pre_reset()

Reply via email to