On Tue, 2026-06-23 at 15:05 +0100, Tvrtko Ursulin wrote:
> > > There we have this sequence:
> > > 
> > >     spin_lock(&entity->lock);
> > >     entity->stopped = true;
> > >     sched = drm_sched_rq_remove_entity(entity->rq, entity);
> > >     spin_unlock(&entity->lock);
> > > 
> > >     if (sched)
> > >           drm_sched_flush_run_work(sched);
> > > 
> > > That is, without that check, in theory, an evil driver could race
> > > drm_sched_entity_select_rq() (via drm_sched_job_arm()) and
> > > drm_sched_entity_kill(). I am not sure if any driver can actually do
> > > that at the moment but it felt sensible to express it in code.
> > 
> > The devil is in the "at the moment".
> > 
> > AFAICS and as your explanation sounds, this is an existing problem that
> > is unrelated to the Flush RFC. So I suppose that this should be a
> > separate patch (independent from this one) which precisely focusses on
> > this robustness work.
> 
> Catch is, and why I thought it is justified to do this it in this patch, 
> is because before completion was in the entity so whats happening with 
> entity->rq->sched after entity is stopped was irrelevant. With this 
> patch is it relevant on paper so I considered it prudent to express that 
> in the code as well as the comment.

I'm not sure if I can fully follow, but it seems that the stopped
boolean + spinlock now protects the scheduler pointer?


IDK, I don't feel comfortable right now.. New docu says

drm_sched_rq_add_entity(struct drm_sched_entity *entity)
  * @entity: scheduler entity
  *
  * Removes a scheduler entity from the run queue.
+ *
+ * Return: DRM scheduler selected to handle this entity or NULL if entity has
+ * already been removed.
  */


Isn't a removed entity by definition stopped since drm_sched can't
access it anymore? Or do we have race there, too?

If so, then stopped and entity->rq->sched == NULL represent the same
state. IOW, the locked bool must be redundant with something.

Right?

> 
> Acceptable or unacceptable?

In any case such changes should, wherever possible, be a separate
patch. For the fact alone that the distinct commit message and
especially the position / order of the patch within the series helps
the reviewer greatly in understanding what that particular change
enables and thus why it's necessary.


P.

Reply via email to