On Wed, 1 Oct 2025 14:04:18 +0200 Boris Brezillon <[email protected]> wrote:
> On Wed, 1 Oct 2025 13:45:36 +0200
> Alice Ryhl <[email protected]> wrote:
>
> > On Wed, Oct 1, 2025 at 1:27 PM Boris Brezillon
> > <[email protected]> wrote:
> > >
> > > On Wed, 01 Oct 2025 10:41:36 +0000
> > > Alice Ryhl <[email protected]> wrote:
> > >
> > > > When using GPUVM in immediate mode, it is necessary to call
> > > > drm_gpuvm_unlink() from the fence signalling critical path. However,
> > > > unlink may call drm_gpuvm_bo_put(), which causes some challenges:
> > > >
> > > > 1. drm_gpuvm_bo_put() often requires you to take resv locks, which you
> > > >    can't do from the fence signalling critical path.
> > > > 2. drm_gpuvm_bo_put() calls drm_gem_object_put(), which is often going
> > > >    to be unsafe to call from the fence signalling critical path.
> > > >
> > > > To solve these issues, add a deferred version of drm_gpuvm_unlink() that
> > > > adds the vm_bo to a deferred cleanup list, and then clean it up later.
> > > >
> > > > The new methods take the GEMs GPUVA lock internally rather than letting
> > > > the caller do it because it also needs to perform an operation after
> > > > releasing the mutex again. This is to prevent freeing the GEM while
> > > > holding the mutex (more info as comments in the patch). This means that
> > > > the new methods can only be used with DRM_GPUVM_IMMEDIATE_MODE.
> > > >
> > > > Reviewed-by: Boris Brezillon <[email protected]>
> > > > Signed-off-by: Alice Ryhl <[email protected]>
> >
> > > > +/*
> > > > + * Must be called with GEM mutex held. After releasing GEM mutex,
> > > > + * drm_gpuvm_bo_defer_free_unlocked() must be called.
> > > > + */
> > > > +static void
> > > > +drm_gpuvm_bo_defer_free_locked(struct kref *kref)
> > > > +{
> > > > +	struct drm_gpuvm_bo *vm_bo = container_of(kref, struct drm_gpuvm_bo,
> > > > +						  kref);
> > > > +	struct drm_gpuvm *gpuvm = vm_bo->vm;
> > > > +
> > > > +	if (!drm_gpuvm_resv_protected(gpuvm)) {
> > > > +		drm_gpuvm_bo_list_del(vm_bo, extobj, true);
> > > > +		drm_gpuvm_bo_list_del(vm_bo, evict, true);
> > > > +	}
> > > > +
> > > > +	list_del(&vm_bo->list.entry.gem);
> > > > +}
> > > > +
> > > > +/*
> > > > + * GEM mutex must not be held. Called after drm_gpuvm_bo_defer_free_locked().
> > > > + */
> > > > +static void
> > > > +drm_gpuvm_bo_defer_free_unlocked(struct drm_gpuvm_bo *vm_bo)
> > > > +{
> > > > +	struct drm_gpuvm *gpuvm = vm_bo->vm;
> > > > +
> > > > +	llist_add(&vm_bo->list.entry.bo_defer, &gpuvm->bo_defer);
> > >
> > > Could we simply move this line to drm_gpuvm_bo_defer_free_locked()?
> > > I might be missing something, but I don't really see a reason to
> > > have it exposed as a separate operation.
> >
> > No, if drm_gpuvm_bo_deferred_cleanup() is called in parallel (e.g.
> > from a workqueue as we discussed), then this can lead to kfreeing the
> > GEM while we hold the mutex. We must not add the vm_bo until it's safe
> > to kfree the GEM. See the comment on
> > drm_gpuvm_bo_defer_free_unlocked() below.
>
> Uh, right, I forgot that the lock was embedded in the BO, which we're
> releasing a ref on in the cleanup path.

Would be good to document the race in the comment saying that
gpuva.lock shouldn't be held though.
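
To make the race concrete, here is a userspace model of the ordering
problem. It is only a sketch under assumptions, not the GPUVM code: every
toy_* name is hypothetical, the lock is a plain flag rather than a mutex,
"freeing" the GEM just sets a flag, and the parallel
drm_gpuvm_bo_deferred_cleanup() worker is simulated by running the
cleanup at the earliest point where it could see the vm_bo.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a GEM object: the lock is embedded in it. */
struct toy_gem {
	int refcount;
	bool lock_held;          /* models gpuva.lock */
	bool freed;              /* set instead of really freeing */
	bool freed_while_locked; /* the bug being guarded against */
};

struct toy_vm_bo {
	struct toy_gem *obj;     /* holds one reference on obj */
	bool linked;             /* still on the GEM's vm_bo list */
	struct toy_vm_bo *next;  /* deferred-list linkage */
};

struct toy_vm {
	struct toy_vm_bo *deferred;
};

static void toy_gem_put(struct toy_gem *obj)
{
	if (--obj->refcount == 0) {
		obj->freed = true;
		obj->freed_while_locked = obj->lock_held;
	}
}

/* Phase 1: runs with the GEM lock held and only unlinks the vm_bo. */
static void toy_defer_free_locked(struct toy_vm_bo *vm_bo)
{
	assert(vm_bo->obj->lock_held);
	vm_bo->linked = false;
}

/* Phase 2: publishes the vm_bo so the cleanup path can see it. */
static void toy_defer_free_unlocked(struct toy_vm *vm, struct toy_vm_bo *vm_bo)
{
	vm_bo->next = vm->deferred;
	vm->deferred = vm_bo;
}

/* Models the cleanup worker: drains the list, drops the GEM references. */
static void toy_deferred_cleanup(struct toy_vm *vm)
{
	struct toy_vm_bo *vm_bo = vm->deferred;

	vm->deferred = NULL;
	while (vm_bo) {
		struct toy_vm_bo *next = vm_bo->next;

		toy_gem_put(vm_bo->obj);
		vm_bo = next;
	}
}

/*
 * Runs one deferred unlink with the worker racing as early as it can.
 * Returns true if the GEM was freed while its own lock was still held.
 */
static bool toy_unlink(bool publish_under_lock)
{
	struct toy_gem obj = { .refcount = 1 };
	struct toy_vm_bo vm_bo = { .obj = &obj, .linked = true };
	struct toy_vm vm = { NULL };

	obj.lock_held = true;
	toy_defer_free_locked(&vm_bo);
	if (publish_under_lock) {
		/* Merged variant: the worker can run before we unlock. */
		toy_defer_free_unlocked(&vm, &vm_bo);
		toy_deferred_cleanup(&vm);
	}
	obj.lock_held = false;
	if (!publish_under_lock) {
		/* Split variant: publish only once the lock is dropped. */
		toy_defer_free_unlocked(&vm, &vm_bo);
		toy_deferred_cleanup(&vm);
	}
	return obj.freed_while_locked;
}
```

In this model toy_unlink(true) reports the GEM freed with its lock still
held, which is the case the _locked/_unlocked split avoids, while
toy_unlink(false) does not.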
