On Tue, 2025-09-09 at 13:36 +0000, Alice Ryhl wrote:
> When using GPUVM in immediate mode, it is necessary to call
> drm_gpuvm_unlink() from the fence signalling critical path. However,
> unlink may call drm_gpuvm_bo_put(), which causes some challenges:
> 
> 1. drm_gpuvm_bo_put() often requires you to take resv locks, which you
>    can't do from the fence signalling critical path.
> 2. drm_gpuvm_bo_put() calls drm_gem_object_put(), which is often going
>    to be unsafe to call from the fence signalling critical path.
> 
> To solve these issues, add a deferred version of drm_gpuvm_unlink() that
> adds the vm_bo to a deferred cleanup list and cleans it up later.
> 
> The new methods take the GEM's GPUVA lock internally rather than letting
> the caller do it, because they also need to perform an operation after
> releasing the mutex again. This is to prevent freeing the GEM while
> holding the mutex (more info in the comments in the patch). This means
> that the new methods can only be used with DRM_GPUVM_IMMEDIATE_MODE.
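
For other readers, the intended usage pattern is roughly the following
(untested sketch; the helper names are my shorthand for the deferred
unlink/cleanup entry points the patch adds, not necessarily the real
ones):

	/* Fence signalling critical path (e.g. run_job()): we must not
	 * take resv locks or call drm_gem_object_put() here, so only
	 * queue the vm_bo for deferred cleanup.
	 */
	drm_gpuvm_unlink_defer(va);

	/* Later, from ordinary process context (e.g. a workqueue),
	 * where taking resv locks and dropping GEM references is safe:
	 */
	drm_gpuvm_bo_deferred_cleanup(gpuvm);
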
> 
> Signed-off-by: Alice Ryhl <[email protected]>
> ---
>  drivers/gpu/drm/drm_gpuvm.c | 174 ++++++++++++++++++++++++++++++++++++++++++++
>  include/drm/drm_gpuvm.h     |  26 +++++++
>  2 files changed, 200 insertions(+)
> 
> diff --git a/drivers/gpu/drm/drm_gpuvm.c b/drivers/gpu/drm/drm_gpuvm.c
> index 78a1a4f095095e9379bdf604d583f6c8b9863ccb..5aa8b3813019705f70101950af2d8fe4e648e9d0 100644
> --- a/drivers/gpu/drm/drm_gpuvm.c
> +++ b/drivers/gpu/drm/drm_gpuvm.c
> @@ -876,6 +876,27 @@ __drm_gpuvm_bo_list_add(struct drm_gpuvm *gpuvm, spinlock_t *lock,
>       cond_spin_unlock(lock, !!lock);
>  }
>  
> +/**
> + * drm_gpuvm_bo_is_dead() - check whether this vm_bo is scheduled for cleanup

NIT: Is zombie a better name than dead?

> + * @vm_bo: the &drm_gpuvm_bo
> + *
> + * When a vm_bo is scheduled for cleanup using the bo_defer list, it is not
> + * immediately removed from the evict and extobj lists if they are protected
> + * by the resv lock, as we can't take that lock during run_job() in immediate
> + * mode. Therefore, anyone iterating these lists should skip entries that are
> + * being destroyed.
> + *
> + * Checking the refcount without incrementing it is okay as long as the lock
> + * protecting the evict/extobj list is held for as long as you are using the
> + * vm_bo, because even if the refcount hits zero while you are using it,
> + * freeing the vm_bo requires taking the list's lock.
> + */
> +static bool
> +drm_gpuvm_bo_is_dead(struct drm_gpuvm_bo *vm_bo)
> +{
> +     return !kref_read(&vm_bo->kref);
> +}
> +
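
To spell out the kerneldoc above: an iteration over e.g. the evict list
would then look roughly like this (untested sketch, based on the
existing list layout in drm_gpuvm.c):

	struct drm_gpuvm_bo *vm_bo;

	spin_lock(&gpuvm->evict.lock);
	list_for_each_entry(vm_bo, &gpuvm->evict.list, list.entry.evict) {
		/* Queued on bo_defer; deferred cleanup will remove it
		 * from this list, so just skip it here.
		 */
		if (drm_gpuvm_bo_is_dead(vm_bo))
			continue;

		/* Use vm_bo here; holding evict.lock keeps it from
		 * being freed even if its refcount is already zero.
		 */
	}
	spin_unlock(&gpuvm->evict.lock);
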
>  /**
>   * drm_gpuvm_bo_list_add() - insert a vm_bo into the given list
>   * @__vm_bo: the &drm_gpuvm_bo
> @@ -1081,6 +1102,9 @@ drm_gpuvm_init(struct drm_gpuvm *gpuvm, const char *name,
>       INIT_LIST_HEAD(&gpuvm->evict.list);
>       spin_lock_init(&gpuvm->evict.lock);
>  
> +     INIT_LIST_HEAD(&gpuvm->bo_defer.list);
> +     spin_lock_init(&gpuvm->bo_defer.lock);
> +

This list appears to follow exactly the pattern the lockless list was
designed for: it would save some space in the vm_bo and get rid of the
extra spinlock. See <include/linux/llist.h>.
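
I.e. something like this (untested; "defer_node" is a made-up
llist_node member in the vm_bo, and bo_defer becomes a plain
llist_head):

	/* Producer, fence signalling path: lockless push, no spinlock. */
	llist_add(&vm_bo->defer_node, &gpuvm->bo_defer);

	/* Consumer, process context: detach the whole batch at once. */
	struct llist_node *batch = llist_del_all(&gpuvm->bo_defer);
	struct drm_gpuvm_bo *vm_bo, *next;

	llist_for_each_entry_safe(vm_bo, next, batch, defer_node) {
		/* do the actual deferred vm_bo destruction here */
	}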

Otherwise LGTM.

/Thomas
