On Wed, 2025-10-15 at 09:40 +0100, Tvrtko Ursulin wrote:
> When adding dependencies with drm_sched_job_add_dependency(), that
> function consumes the fence reference both on success and failure, so in
> the latter case the dma_fence_put() on the error path (xarray failed to
> expand) is a double free.
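The ownership contract described above can be illustrated with a small userspace sketch. It is not kernel code. All names (fake_fence, fake_job, fake_job_add_dependency, add_dep_buggy, add_dep_fixed) are hypothetical stand-ins. A fixed-size array plays the role of the job's xarray, and "array full" plays the role of xa_alloc() failing. The same-context deduplication done by the real drm_sched_job_add_dependency() is omitted. Nothing is actually freed: the last put only sets a flag, so a double put can be counted instead of corrupting memory.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical userspace stand-in for a refcounted struct dma_fence.
 * The final put only sets 'freed'; any later put is counted in
 * 'double_puts' instead of touching freed memory.
 */
struct fake_fence {
	int refcount;
	int freed;
	int double_puts;
};

static struct fake_fence *fake_fence_get(struct fake_fence *f)
{
	assert(!f->freed);	/* getting a freed fence would be a use-after-free */
	f->refcount++;
	return f;
}

static void fake_fence_put(struct fake_fence *f)
{
	if (f->freed) {
		f->double_puts++;
		return;
	}
	assert(f->refcount > 0);
	if (--f->refcount == 0)
		f->freed = 1;
}

/* Hypothetical job; a full array stands in for the xarray failing to expand. */
#define FAKE_MAX_DEPS 1

struct fake_job {
	struct fake_fence *deps[FAKE_MAX_DEPS];
	size_t num_deps;
};

/*
 * Same ownership contract as drm_sched_job_add_dependency(): the caller's
 * reference is consumed on success (stored in the job) and on failure
 * (dropped here).
 */
static int fake_job_add_dependency(struct fake_job *job, struct fake_fence *f)
{
	if (!f)
		return 0;
	if (job->num_deps == FAKE_MAX_DEPS) {
		fake_fence_put(f);
		return -ENOMEM;
	}
	job->deps[job->num_deps++] = f;
	return 0;
}

/* Pre-fix caller shape: puts the reference again after a failed add. */
static int add_dep_buggy(struct fake_job *job, struct fake_fence *f)
{
	int ret;

	fake_fence_get(f);
	ret = fake_job_add_dependency(job, f);
	if (ret)
		fake_fence_put(f);	/* wrong: already consumed by the callee */
	return ret;
}

/* Post-fix caller shape: hand over a fresh reference and never put it here. */
static int add_dep_fixed(struct fake_job *job, struct fake_fence *f)
{
	return fake_job_add_dependency(job, fake_fence_get(f));
}
```

With the buggy shape, the extra put drops the fence to zero while its owner still holds a reference, so the owner's own later put becomes the second put. The fixed shape leaves the owner's reference untouched on failure.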
>
> Interestingly, this bug appears to have been present ever since
> commit ebd5f74255b9 ("drm/sched: Add dependency tracking"), since the code
> back then looked like this:
>
> drm_sched_job_add_implicit_dependencies():
> ...
> 	for (i = 0; i < fence_count; i++) {
> 		ret = drm_sched_job_add_dependency(job, fences[i]);
> 		if (ret)
> 			break;
> 	}
>
> 	for (; i < fence_count; i++)
> 		dma_fence_put(fences[i]);
>
> This means that for the failing 'i' the dma_fence_put() was already a
> double free. Possibly there were no users at that time, or the test cases
> were insufficient to hit it.
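For comparison, here is a sketch of what the cleanup in that original loop would need in order to drop each reference exactly once. It is an illustration under stated assumptions, not the historical code or any upstream fix. fake_fence, fake_job, fake_job_add_dependency and fake_add_owned_fences are hypothetical userspace stand-ins, and the caller is assumed to own one reference per array entry. The point is that the failing index has already been consumed, so cleanup must start at i + 1.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical refcounted fence; the last put sets 'freed' and later puts
 * are counted instead of touching freed memory. */
struct fake_fence {
	int refcount;
	int freed;
	int double_puts;
};

static void fake_fence_put(struct fake_fence *f)
{
	if (f->freed) {
		f->double_puts++;
		return;
	}
	assert(f->refcount > 0);
	if (--f->refcount == 0)
		f->freed = 1;
}

/* Hypothetical job with room for two dependencies; a full array stands in
 * for the xarray failing to expand. */
#define FAKE_MAX_DEPS 2

struct fake_job {
	struct fake_fence *deps[FAKE_MAX_DEPS];
	size_t num_deps;
};

/* Consumes the caller's reference on success and on failure. */
static int fake_job_add_dependency(struct fake_job *job, struct fake_fence *f)
{
	if (job->num_deps == FAKE_MAX_DEPS) {
		fake_fence_put(f);
		return -ENOMEM;
	}
	job->deps[job->num_deps++] = f;
	return 0;
}

/*
 * The caller owns one reference per entry in fences[]. When the add at
 * index i fails, that reference is already gone, so cleanup starts at
 * i + 1 rather than at i as in the original loop.
 */
static int fake_add_owned_fences(struct fake_job *job,
				 struct fake_fence **fences, size_t count)
{
	size_t i;
	int ret = 0;

	for (i = 0; i < count; i++) {
		ret = fake_job_add_dependency(job, fences[i]);
		if (ret) {
			i++;	/* skip the fence the failed call consumed */
			break;
		}
	}

	/* Drop only the references that were never handed to the job. */
	for (; i < count; i++)
		fake_fence_put(fences[i]);

	return ret;
}
```

If the job has room for two fences and four are passed, the first two end up owned by the job. The third is dropped once, inside the failed call. The fourth is dropped once by the cleanup loop.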
>
> The bug was then only noticed and fixed after
> commit 9c2ba265352a ("drm/scheduler: use new iterator in
> drm_sched_job_add_implicit_dependencies v2")
> landed, with its fixup of
> commit 4eaf02d6076c ("drm/scheduler: fix
> drm_sched_job_add_implicit_dependencies").
>
> At that point it was a slightly different flavour of a double free, which
> commit 963d0b356935 ("drm/scheduler: fix
> drm_sched_job_add_implicit_dependencies harder")
> noticed and attempted to fix.
>
> But it only moved the double free from inside
> drm_sched_job_add_dependency(), where it released a reference not yet
> obtained, to the caller, where it released a reference already released
> by the former in the failure case.
>
> As such, it is not easy to identify the right target for the Fixes tag,
> so let's keep it simple and just continue the chain.
>
> While fixing this, also improve the comment to explain why the reference
> is taken and why it is not dropped on failure.
>
> Signed-off-by: Tvrtko Ursulin <[email protected]>
> Fixes: 963d0b356935 ("drm/scheduler: fix drm_sched_job_add_implicit_dependencies harder")
> Reported-by: Dan Carpenter <[email protected]>
> Closes: https://lore.kernel.org/dri-devel/[email protected]/
Applied to drm-misc-fixes
Thx
P.
> Cc: Christian König <[email protected]>
> Cc: Rob Clark <[email protected]>
> Cc: Daniel Vetter <[email protected]>
> Cc: Matthew Brost <[email protected]>
> Cc: Danilo Krummrich <[email protected]>
> Cc: Philipp Stanner <[email protected]>
> Cc: "Christian König" <[email protected]>
> Cc: [email protected]
> Cc: <[email protected]> # v5.16+
> ---
> v2:
> * Re-arrange the commit text so that every sentence starts with a
> capital letter, avoiding further discussion about it.
> * Keep double return for now.
> * Improved comment instead of dropping it.
>
> v3:
> * Commit SHA formatting in the commit message.
> ---
> drivers/gpu/drm/scheduler/sched_main.c | 13 +++++++------
> 1 file changed, 7 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index 46119aacb809..c39f0245e3a9 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -965,13 +965,14 @@ int drm_sched_job_add_resv_dependencies(struct drm_sched_job *job,
>  	dma_resv_assert_held(resv);
>
>  	dma_resv_for_each_fence(&cursor, resv, usage, fence) {
> -		/* Make sure to grab an additional ref on the added fence */
> -		dma_fence_get(fence);
> -		ret = drm_sched_job_add_dependency(job, fence);
> -		if (ret) {
> -			dma_fence_put(fence);
> +		/*
> +		 * As drm_sched_job_add_dependency always consumes the fence
> +		 * reference (even when it fails), and dma_resv_for_each_fence
> +		 * is not obtaining one, we need to grab one before calling.
> +		 */
> +		ret = drm_sched_job_add_dependency(job, dma_fence_get(fence));
> +		if (ret)
>  			return ret;
> -		}
>  	}
>  	return 0;
>  }
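The patched loop can be modelled in the same hypothetical userspace terms. In this sketch fake_resv stands in for a dma_resv whose iterator hands out borrowed pointers, matching what the new comment says about dma_resv_for_each_fence. The loop takes a reference only to hand it over, and the error path simply returns because there is nothing left to drop. After a failure, the job still owns the references for the fences added so far. The job is assumed to release these on its own cleanup path, and fake_job_cleanup models that step. All fake_* names are illustrative, not scheduler API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical refcounted fence; the last put sets 'freed'. */
struct fake_fence {
	int refcount;
	int freed;
	int double_puts;
};

static struct fake_fence *fake_fence_get(struct fake_fence *f)
{
	assert(!f->freed);	/* getting a freed fence would be a use-after-free */
	f->refcount++;
	return f;
}

static void fake_fence_put(struct fake_fence *f)
{
	if (f->freed) {
		f->double_puts++;
		return;
	}
	assert(f->refcount > 0);
	if (--f->refcount == 0)
		f->freed = 1;
}

#define FAKE_MAX_DEPS 2

struct fake_job {
	struct fake_fence *deps[FAKE_MAX_DEPS];
	size_t num_deps;
};

/* Consumes the caller's reference on success and on failure. */
static int fake_job_add_dependency(struct fake_job *job, struct fake_fence *f)
{
	if (job->num_deps == FAKE_MAX_DEPS) {
		fake_fence_put(f);
		return -ENOMEM;
	}
	job->deps[job->num_deps++] = f;
	return 0;
}

/* Hypothetical reservation object: it owns one reference per fence, and
 * iterating over it yields borrowed pointers only. */
#define FAKE_RESV_FENCES 3

struct fake_resv {
	struct fake_fence *fences[FAKE_RESV_FENCES];
	size_t count;
};

/* Same shape as the patched loop: take a reference for the callee to
 * consume and, on failure, just return. */
static int fake_add_resv_dependencies(struct fake_job *job,
				      const struct fake_resv *resv)
{
	size_t i;
	int ret;

	for (i = 0; i < resv->count; i++) {
		ret = fake_job_add_dependency(job,
					      fake_fence_get(resv->fences[i]));
		if (ret)
			return ret;
	}
	return 0;
}

/* Releases the references the job took ownership of. */
static void fake_job_cleanup(struct fake_job *job)
{
	while (job->num_deps)
		fake_fence_put(job->deps[--job->num_deps]);
}
```

With room for two dependencies and three fences in the reservation object, the third add fails. That fence keeps exactly the reservation object's reference. After fake_job_cleanup() every fence is back to a single reference and none has been put twice.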