On Tue, 2025-09-09 at 13:36 +0000, Alice Ryhl wrote:
> When using GPUVM in immediate mode, it is necessary to call
> drm_gpuvm_unlink() from the fence signalling critical path. However,
> unlink may call drm_gpuvm_bo_put(), which causes some challenges:
>
> 1. drm_gpuvm_bo_put() often requires you to take resv locks, which
>    you can't do from the fence signalling critical path.
> 2. drm_gpuvm_bo_put() calls drm_gem_object_put(), which is often
>    going to be unsafe to call from the fence signalling critical path.
>
> To solve these issues, add a deferred version of drm_gpuvm_unlink()
> that adds the vm_bo to a deferred cleanup list and cleans it up
> later.
>
> The new methods take the GEM's GPUVA lock internally rather than
> letting the caller do it, because they also need to perform an
> operation after releasing the mutex again. This is to prevent
> freeing the GEM while holding the mutex (more info in the comments
> in the patch). This means that the new methods can only be used with
> DRM_GPUVM_IMMEDIATE_MODE.
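
So in immediate mode a driver would do roughly the following (sketch
only; I'm guessing the function names from the description above, the
actual API is in the patch):

	/* run_job(), i.e. the fence signalling critical path: no resv
	 * locks and no drm_gem_object_put() allowed, so defer. */
	drm_gpuvm_unlink_defer(va);

	/* Later, from ordinary process context, clean up everything
	 * that accumulated on the bo_defer list. */
	drm_gpuvm_bo_defer_cleanup(gpuvm);
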
>
> Signed-off-by: Alice Ryhl <[email protected]>
> ---
>  drivers/gpu/drm/drm_gpuvm.c | 174 ++++++++++++++++++++++++++++++++++++++++++++
>  include/drm/drm_gpuvm.h     |  26 +++++++
>  2 files changed, 200 insertions(+)
>
> diff --git a/drivers/gpu/drm/drm_gpuvm.c b/drivers/gpu/drm/drm_gpuvm.c
> index 78a1a4f095095e9379bdf604d583f6c8b9863ccb..5aa8b3813019705f70101950af2d8fe4e648e9d0 100644
> --- a/drivers/gpu/drm/drm_gpuvm.c
> +++ b/drivers/gpu/drm/drm_gpuvm.c
> @@ -876,6 +876,27 @@ __drm_gpuvm_bo_list_add(struct drm_gpuvm *gpuvm, spinlock_t *lock,
>  	cond_spin_unlock(lock, !!lock);
>  }
>
> +/**
> + * drm_gpuvm_bo_is_dead() - check whether this vm_bo is scheduled for cleanup
> + * @vm_bo: the &drm_gpuvm_bo
> + *
> + * When a vm_bo is scheduled for cleanup using the bo_defer list, it is not
> + * immediately removed from the evict and extobj lists if they are protected by
> + * the resv lock, as we can't take that lock during run_job() in immediate
> + * mode. Therefore, anyone iterating these lists should skip entries that are
> + * being destroyed.
> + *
> + * Checking the refcount without incrementing it is okay as long as the lock
> + * protecting the evict/extobj list is held for as long as you are using the
> + * vm_bo, because even if the refcount hits zero while you are using it,
> + * freeing the vm_bo requires taking the list's lock.
> + */
> +static bool
> +drm_gpuvm_bo_is_dead(struct drm_gpuvm_bo *vm_bo)
> +{
> +	return !kref_read(&vm_bo->kref);
> +}
> +

NIT: Is zombie a better name than dead?
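
On the "skip entries that are being destroyed" rule: I'd expect the
iterators to end up looking roughly like this (sketch only; the member
names are taken from the existing evict list code):

	spin_lock(&gpuvm->evict.lock);
	list_for_each_entry(vm_bo, &gpuvm->evict.list, list.entry.evict) {
		if (drm_gpuvm_bo_is_dead(vm_bo))
			continue;	/* scheduled for cleanup, skip */
		/* vm_bo is safe to use while evict.lock is held */
	}
	spin_unlock(&gpuvm->evict.lock);
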
>  /**
>   * drm_gpuvm_bo_list_add() - insert a vm_bo into the given list
>   * @__vm_bo: the &drm_gpuvm_bo
> @@ -1081,6 +1102,9 @@ drm_gpuvm_init(struct drm_gpuvm *gpuvm, const char *name,
>  	INIT_LIST_HEAD(&gpuvm->evict.list);
>  	spin_lock_init(&gpuvm->evict.lock);
> 
> +	INIT_LIST_HEAD(&gpuvm->bo_defer.list);
> +	spin_lock_init(&gpuvm->bo_defer.lock);
> +
This list appears to follow exactly the pattern that the lockless list
in <include/linux/llist.h> was designed for: it would save some space
in the vm_bo and get rid of the extra spinlock.
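
Roughly like this (untested sketch; assumes struct drm_gpuvm grows a
"struct llist_head bo_defer" and struct drm_gpuvm_bo a "struct
llist_node defer" instead of the list_head + spinlock pair):

	/* Producer side, safe from the fence signalling critical path: */
	llist_add(&vm_bo->defer, &gpuvm->bo_defer);

	/* Consumer side, e.g. the deferred cleanup step: atomically
	 * steal the whole list, then walk it without any lock held. */
	struct llist_node *stolen = llist_del_all(&gpuvm->bo_defer);
	struct drm_gpuvm_bo *vm_bo, *next;

	llist_for_each_entry_safe(vm_bo, next, stolen, defer) {
		/* drop the GEM reference, free the vm_bo, ... */
	}
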
Otherwise LGTM.
/Thomas