Thank you for your contribution! Sashiko AI review found 2 potential issue(s) 
to consider:
- [High] Use-After-Free in `drm_sched_entity_error()` due to lockless clearing 
of `entity->last_scheduled` in `drm_sched_entity_fini()`.
- [Low] Incomplete preparational refactoring: `drm_sched_entity_kill()` calls 
`spsc_queue_pop()` locklessly.
--

commit 8128ccd719adfa75d48bc49990edb4b623aa9607
Author: Philipp Stanner <[email protected]>

drm/sched: Protect entity->last_scheduled with spinlock

This commit replaces the RCU mechanisms and memory barriers protecting
entity->last_scheduled with a spinlock to improve maintainability. It
also begins guarding spsc_queue_pop() with the lock in preparation for
replacing the queue with a locked list.

> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c 
> b/drivers/gpu/drm/scheduler/sched_entity.c
> index 4ebb513255edd..2959feb08ac7e 100644
> --- a/drivers/gpu/drm/scheduler/sched_entity.c
> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> @@ -202,10 +201,10 @@ int drm_sched_entity_error(struct drm_sched_entity 
> *entity)
>       struct dma_fence *fence;
>       int r;
>  
> -     rcu_read_lock();
> -     fence = rcu_dereference(entity->last_scheduled);
> +     spin_lock(&entity->lock);
> +     fence = entity->last_scheduled;
>       r = fence ? fence->error : 0;
> -     rcu_read_unlock();
> +     spin_unlock(&entity->lock);
>  
>       return r;
>  }

[ ... ]

> @@ -292,8 +291,10 @@ void drm_sched_entity_kill(struct drm_sched_entity 
> *entity)
>       wait_for_completion(&entity->entity_idle);
>  
>       /* The entity is guaranteed to not be used by the scheduler */
> -     prev = rcu_dereference_check(entity->last_scheduled, true);
> +     spin_lock(&entity->lock);
> +     prev = entity->last_scheduled;
>       dma_fence_get(prev);
> +     spin_unlock(&entity->lock);
>       while ((job = drm_sched_entity_queue_pop(entity))) {
>               struct drm_sched_fence *s_fence = job->s_fence;

[Severity: Low]
The commit message notes that spsc_queue_pop() should be guarded with the lock
as a preparational step, but drm_sched_entity_queue_pop() (which wraps
spsc_queue_pop()) is called here in the while loop without holding
entity->lock.

Could this lead to lockless list manipulation when the underlying queue data
structure is later replaced with a locked list?

[ ... ]

> @@ -388,8 +389,12 @@ void drm_sched_entity_fini(struct drm_sched_entity 
> *entity)
>               entity->dependency = NULL;
>       }
>  
> -     dma_fence_put(rcu_dereference_check(entity->last_scheduled, true));
> -     RCU_INIT_POINTER(entity->last_scheduled, NULL);
> +     dma_fence_put(entity->last_scheduled);
> +     /*
> +      * Normally all users should be gone now, but since drm_sched has
> +      * experienced many layering violations in the past, better be safe.
> +      */
> +     WRITE_ONCE(entity->last_scheduled, NULL);
>       drm_sched_entity_stats_put(entity->stats);
>  }

[Severity: High]
Could this lockless pointer clearing lead to a use-after-free in
drm_sched_entity_error()?

Since drm_sched_entity_error() now uses spin_lock() instead of
rcu_read_lock(), a concurrent reader could acquire entity->lock between the
dma_fence_put() and WRITE_ONCE() calls here.

If an RCU grace period completes before the reader dereferences fence->error,
might the reader access a freed fence? Since spinlocks under PREEMPT_RT can
sleep, RCU grace periods aren't implicitly blocked. Would it be safer to
acquire entity->lock around the pointer clearing sequence here?

-- 
Sashiko AI review ยท 
https://sashiko.dev/#/patchset/[email protected]?part=1

Reply via email to