Re: [RFC] drm/sched: Replace completion with a flush

Philipp Stanner Wed, 17 Jun 2026 01:38:19 -0700

On Thu, 2026-06-11 at 13:34 +0100, Tvrtko Ursulin wrote:
> Due the scheduler locking design, and the inability to always lock both
> the entity and the run-queue in the consistent order, a completion exists
> which effectively marks the entity as in use from a call path which is not
> able to lock it.
> 
> When entity is selected from the run job worker, its completion is marked
> as non-idle all until the code is sure it will not be dereferencing it any
> more, at which point it signals it as idle, releasing the potential
> parallel cleanup path.
> 
> We can remove the need for this completion by implementing the identical
> guarantee by simply flushing the run job work from the cleanup path, after
> having removed the entity from the run queue.
> 
> We then know that the entity is no longer reachable by the run queue
> selection logic, so as soon as any pending work is done the cleanup can
> safely proceed. And because we have marked the entity as stopped, we also
> know that the entity cannot re-enter the run queue.
> 
> Signed-off-by: Tvrtko Ursulin <[email protected]>
> Cc: Christian König <[email protected]>
> Cc: Danilo Krummrich <[email protected]>
> Cc: Matthew Brost <[email protected]>
> Cc: Philipp Stanner <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> ---
> "Perfection is achieved, not when there is nothing more to add, but when
>  there is nothing left to take away." - Antoine de Saint-Exupéry



Hmm, alright, so the basic trick just seems to be that the workqueue
implementation already can ensure the synchronization which we manually
implemented so far through the completion.


It's a bit more LOC, and doesn't solve a bug. However, I kind of like
the idea because it removes a redundant mechanism. Would be cool to
hear some other opinions, though.

A comment below

> 
> Lets see what Intel's CI says about this, not to mention our new AI
> overlords...
> ---
>  drivers/gpu/drm/scheduler/sched_entity.c   | 25 +++++++++++++++-------
>  drivers/gpu/drm/scheduler/sched_internal.h | 14 ++++++++++--
>  drivers/gpu/drm/scheduler/sched_main.c     |  2 --
>  drivers/gpu/drm/scheduler/sched_rq.c       | 14 +++++++-----
>  include/drm/gpu_scheduler.h                |  9 --------
>  5 files changed, 38 insertions(+), 26 deletions(-)
> 
> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c 
> b/drivers/gpu/drm/scheduler/sched_entity.c
> index c51101ec70c1..e6f7c2fbefce 100644
> --- a/drivers/gpu/drm/scheduler/sched_entity.c
> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> @@ -137,10 +137,6 @@ int drm_sched_entity_init(struct drm_sched_entity 
> *entity,
>       entity->rq = &sched_list[0]->rq;
>       RCU_INIT_POINTER(entity->last_scheduled, NULL);
>       RB_CLEAR_NODE(&entity->rb_tree_node);
> -     init_completion(&entity->entity_idle);
> -
> -     /* We start in an idle state. */
> -     complete_all(&entity->entity_idle);
>  
>       spin_lock_init(&entity->lock);
>       spsc_queue_init(&entity->job_queue);
> @@ -276,18 +272,24 @@ static void drm_sched_entity_kill_jobs_cb(struct 
> dma_fence *f,
>   */
>  void drm_sched_entity_kill(struct drm_sched_entity *entity)
>  {
> +     struct drm_gpu_scheduler *sched;
>       struct drm_sched_job *job;
>       struct dma_fence *prev;
>  
>       spin_lock(&entity->lock);
>       entity->stopped = true;
> -     drm_sched_rq_remove_entity(entity->rq, entity);
> +     sched = drm_sched_rq_remove_entity(entity->rq, entity);
>       spin_unlock(&entity->lock);
>  
> -     /* Make sure this entity is not used by the scheduler at the moment */
> -     wait_for_completion(&entity->entity_idle);
> +     /*
> +      * Make sure this entity is not used by the scheduler at the moment.
> +      *
> +      * Scheduler is guaranteed to be stable after the entity was stopped and
> +      * removed from the run-queue.
> +      */
> +     if (sched)
> +             drm_sched_flush_run_work(sched);
>  
> -     /* The entity is guaranteed to not be used by the scheduler */
>       prev = rcu_dereference_check(entity->last_scheduled, true);
>       dma_fence_get(prev);
>       while ((job = drm_sched_entity_queue_pop(entity))) {
> @@ -576,6 +578,13 @@ void drm_sched_entity_select_rq(struct drm_sched_entity 
> *entity)
>               return;
>  
>       spin_lock(&entity->lock);
> +
> +     if (entity->stopped) {
> +             spin_unlock(&entity->lock);
> +             return;
> +
> +     }

Seems unrelated? Why wasn't this needed semantically before?

> +
>       sched = drm_sched_pick_best(entity->sched_list, entity->num_sched_list);
>       rq = sched ? &sched->rq : NULL;
>       if (rq != entity->rq) {
> diff --git a/drivers/gpu/drm/scheduler/sched_internal.h 
> b/drivers/gpu/drm/scheduler/sched_internal.h
> index 13ecb771d7a2..80dece3be415 100644
> --- a/drivers/gpu/drm/scheduler/sched_internal.h
> +++ b/drivers/gpu/drm/scheduler/sched_internal.h
> @@ -35,12 +35,22 @@ bool drm_sched_can_queue(struct drm_gpu_scheduler *sched,
>                        struct drm_sched_entity *entity);
>  void drm_sched_wakeup(struct drm_gpu_scheduler *sched);
>  
> +/**
> + * drm_sched_flush_run_work - flush the run-job work

In v1, you'd probably want to document what this function typically
will be used for :)


Greetings,
P.

Re: [RFC] drm/sched: Replace completion with a flush

Reply via email to