last_scheduled with spinlock

Philipp Stanner Fri, 03 Jul 2026 07:47:30 -0700

On Fri, 2026-07-03 at 12:27 +0100, Tvrtko Ursulin wrote:
> 
> On 01/07/2026 09:59, Philipp Stanner wrote:
> > The entity->last_scheduled field has always been set and read with
> > special RCU functions in addition to memory barriers.
> > 
> > This was added in
> > 
> > commit 70102d77ff22 ("drm/scheduler: add drm_sched_entity_error and use rcu 
> > for last_scheduled")
> > 
> > however, no proper justification for that mechanism was provided. There
> > seems to be no obvious reason, since the entity lock is available and
> > taken at all places that evaluate the last_scheduled field. The only
> > exception is drm_sched_entity_error(), which is not performance critical
> > in any way.
> > 
> > Improve robustness, readability and maintainability by replacing RCU and
> > barriers with the lock.
> 
> First thing, and regardless of other strands of discussion, I think it 
> should be squashed with 3/5 instead of that one undoing the introduction 
> of lock-unlock-lock-unlock.


I agree that there should not be a do-undo pattern, but I don't want to
squash that, it's quite a distinctive action. One patch adds locks, the
other moves them.

But what I can do is move that patch before №1 here so that it becomes
understandable as a preparational commit.

> 
> For what the main topic is concerned, I really like the removal of all 
> the rcu_dereference_check(, true) lines and the memory barriers.
> 
> But I also think the commit message should explain better what code 
> paths are now taking an extra lock - under which circumstances is the
> lock now taken for all scheduler users, and which amdgpu paths use 
> drm_sched_entity_error() a lot so could be affected. I doubt it creates 
> a measurable performance impact but it needs to be explained.

I think it can detail which functions will now be locked; but
mentioning the users would be overkill and is uncommon for API reworks.

> 
> I am also happy to give it a spin on the Steam Deck to see if I can 
> observe anything.

Could be interesting.

> 
> > Signed-off-by: Philipp Stanner <[email protected]>
> > ---
> >   drivers/gpu/drm/scheduler/sched_entity.c | 50 ++++++++++--------------
> >   include/drm/gpu_scheduler.h              |  9 ++---
> >   2 files changed, 25 insertions(+), 34 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/scheduler/sched_entity.c 
> > b/drivers/gpu/drm/scheduler/sched_entity.c
> > index c51101ec70c1..91aec20611ad 100644
> > --- a/drivers/gpu/drm/scheduler/sched_entity.c
> > +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> > @@ -135,7 +135,6 @@ int drm_sched_entity_init(struct drm_sched_entity 
> > *entity,
> >     entity->num_sched_list = num_sched_list;
> >     entity->sched_list = num_sched_list > 1 ? sched_list : NULL;
> >     entity->rq = &sched_list[0]->rq;
> > -   RCU_INIT_POINTER(entity->last_scheduled, NULL);
> >     RB_CLEAR_NODE(&entity->rb_tree_node);
> >     init_completion(&entity->entity_idle);
> >   
> > @@ -201,10 +200,10 @@ int drm_sched_entity_error(struct drm_sched_entity 
> > *entity)
> >     struct dma_fence *fence;
> >     int r;
> >   
> > -   rcu_read_lock();
> > -   fence = rcu_dereference(entity->last_scheduled);
> > +   spin_lock(&entity->lock);
> > +   fence = entity->last_scheduled;
> >     r = fence ? fence->error : 0;
> > -   rcu_read_unlock();
> > +   spin_unlock(&entity->lock);
> >   
> >     return r;
> >   }
> > @@ -287,9 +286,10 @@ void drm_sched_entity_kill(struct drm_sched_entity 
> > *entity)
> >     /* Make sure this entity is not used by the scheduler at the moment */
> >     wait_for_completion(&entity->entity_idle);
> >   
> > -   /* The entity is guaranteed to not be used by the scheduler */
> > -   prev = rcu_dereference_check(entity->last_scheduled, true);
> > +   spin_lock(&entity->lock);
> > +   prev = entity->last_scheduled;
> >     dma_fence_get(prev);
> > +   spin_unlock(&entity->lock);
> >     while ((job = drm_sched_entity_queue_pop(entity))) {
> >             struct drm_sched_fence *s_fence = job->s_fence;
> >   
> > @@ -381,8 +381,7 @@ void drm_sched_entity_fini(struct drm_sched_entity 
> > *entity)
> >             entity->dependency = NULL;
> >     }
> >   
> > -   dma_fence_put(rcu_dereference_check(entity->last_scheduled, true));
> > -   RCU_INIT_POINTER(entity->last_scheduled, NULL);
> > +   dma_fence_put(entity->last_scheduled);
> >     drm_sched_entity_stats_put(entity->stats);
> >   }
> >   EXPORT_SYMBOL(drm_sched_entity_fini);
> > @@ -507,6 +506,10 @@ drm_sched_job_dependency(struct drm_sched_job *job,
> >   
> >   struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity 
> > *entity)
> >   {
> > +   /* Helper to avoid dropping the reference while the entity lock is held,
> > +    * just to have some more robustness.
> > +    */
> 
> I don't get this comment. Neither the placement or the content.

It explains the purpose of the variable 'prev_last_scheduled', which
exists so that a reference does not drop under lock protection.


P.

Re: [PATCH 1/5] drm/sched: Protect entity->last_scheduled with spinlock

Reply via email to