On 17/06/2026 09:38, Philipp Stanner wrote:
On Thu, 2026-06-11 at 13:34 +0100, Tvrtko Ursulin wrote:
Due the scheduler locking design, and the inability to always lock both
the entity and the run-queue in the consistent order, a completion exists
which effectively marks the entity as in use from a call path which is not
able to lock it.

When entity is selected from the run job worker, its completion is marked
as non-idle all until the code is sure it will not be dereferencing it any
more, at which point it signals it as idle, releasing the potential
parallel cleanup path.

We can remove the need for this completion by implementing the identical
guarantee by simply flushing the run job work from the cleanup path, after
having removed the entity from the run queue.

We then know that the entity is no longer reachable by the run queue
selection logic, so as soon as any pending work is done the cleanup can
safely proceed. And because we have marked the entity as stopped, we also
know that the entity cannot re-enter the run queue.

Signed-off-by: Tvrtko Ursulin <[email protected]>
Cc: Christian König <[email protected]>
Cc: Danilo Krummrich <[email protected]>
Cc: Matthew Brost <[email protected]>
Cc: Philipp Stanner <[email protected]>
Cc: [email protected]
Cc: [email protected]
---
"Perfection is achieved, not when there is nothing more to add, but when
  there is nothing left to take away." - Antoine de Saint-Exupéry


Hmm, alright, so the basic trick just seems to be that the workqueue
implementation already can ensure the synchronization which we manually
implemented so far through the completion.


It's a bit more LOC, and doesn't solve a bug. However, I kind of like
the idea because it removes a redundant mechanism. Would be cool to
hear some other opinions, though.

A comment below


Lets see what Intel's CI says about this, not to mention our new AI
overlords...
---
  drivers/gpu/drm/scheduler/sched_entity.c   | 25 +++++++++++++++-------
  drivers/gpu/drm/scheduler/sched_internal.h | 14 ++++++++++--
  drivers/gpu/drm/scheduler/sched_main.c     |  2 --
  drivers/gpu/drm/scheduler/sched_rq.c       | 14 +++++++-----
  include/drm/gpu_scheduler.h                |  9 --------
  5 files changed, 38 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c 
b/drivers/gpu/drm/scheduler/sched_entity.c
index c51101ec70c1..e6f7c2fbefce 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -137,10 +137,6 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
        entity->rq = &sched_list[0]->rq;
        RCU_INIT_POINTER(entity->last_scheduled, NULL);
        RB_CLEAR_NODE(&entity->rb_tree_node);
-       init_completion(&entity->entity_idle);
-
-       /* We start in an idle state. */
-       complete_all(&entity->entity_idle);
  spin_lock_init(&entity->lock);
        spsc_queue_init(&entity->job_queue);
@@ -276,18 +272,24 @@ static void drm_sched_entity_kill_jobs_cb(struct 
dma_fence *f,
   */
  void drm_sched_entity_kill(struct drm_sched_entity *entity)
  {
+       struct drm_gpu_scheduler *sched;
        struct drm_sched_job *job;
        struct dma_fence *prev;
  spin_lock(&entity->lock);
        entity->stopped = true;
-       drm_sched_rq_remove_entity(entity->rq, entity);
+       sched = drm_sched_rq_remove_entity(entity->rq, entity);
        spin_unlock(&entity->lock);
- /* Make sure this entity is not used by the scheduler at the moment */
-       wait_for_completion(&entity->entity_idle);
+       /*
+        * Make sure this entity is not used by the scheduler at the moment.
+        *
+        * Scheduler is guaranteed to be stable after the entity was stopped and
+        * removed from the run-queue.
+        */
+       if (sched)
+               drm_sched_flush_run_work(sched);
- /* The entity is guaranteed to not be used by the scheduler */
        prev = rcu_dereference_check(entity->last_scheduled, true);
        dma_fence_get(prev);
        while ((job = drm_sched_entity_queue_pop(entity))) {
@@ -576,6 +578,13 @@ void drm_sched_entity_select_rq(struct drm_sched_entity 
*entity)
                return;
  spin_lock(&entity->lock);
+
+       if (entity->stopped) {
+               spin_unlock(&entity->lock);
+               return;
+
+       }

Seems unrelated? Why wasn't this needed semantically before?

It solidifies the guarantee drm_sched_entity_kill() expects that the scheduler assigned to an entity cannot change after entity has been stopped.

There we have this sequence:

  spin_lock(&entity->lock);
  entity->stopped = true;
  sched = drm_sched_rq_remove_entity(entity->rq, entity);
  spin_unlock(&entity->lock);

  if (sched)
        drm_sched_flush_run_work(sched);

That is, without that check, in theory, an evil driver could race drm_sched_entity_select_rq() (via drm_sched_job_arm()) and drm_sched_entity_kill(). I am not sure if any driver can actually do that at the moment but it felt sensible to express it in code.

+
        sched = drm_sched_pick_best(entity->sched_list, entity->num_sched_list);
        rq = sched ? &sched->rq : NULL;
        if (rq != entity->rq) {
diff --git a/drivers/gpu/drm/scheduler/sched_internal.h 
b/drivers/gpu/drm/scheduler/sched_internal.h
index 13ecb771d7a2..80dece3be415 100644
--- a/drivers/gpu/drm/scheduler/sched_internal.h
+++ b/drivers/gpu/drm/scheduler/sched_internal.h
@@ -35,12 +35,22 @@ bool drm_sched_can_queue(struct drm_gpu_scheduler *sched,
                         struct drm_sched_entity *entity);
  void drm_sched_wakeup(struct drm_gpu_scheduler *sched);
+/**
+ * drm_sched_flush_run_work - flush the run-job work

In v1, you'd probably want to document what this function typically
will be used for :)

Well its in the scheduler _internal_ header and I am not sure what to write which will add real value.

"Only used to make sure a stopped entity is not in use by the scheduler workers."

Good enough?

Regards,

Tvrtko

Reply via email to