On Mon, Mar 02, 2026 at 10:22:13PM +0100, Thomas Hellström wrote:
> Hi,
> 
> On Mon, 2026-03-02 at 10:57 -0800, Matthew Brost wrote:
> 
> Thanks for reviewing,
> 
> > On Mon, Mar 02, 2026 at 05:32:46PM +0100, Thomas Hellström wrote:
> > > In multi-GPU scenarios, asynchronous GPU job latency is a
> > > bottleneck if each notifier waits for its own GPU before
> > > returning. The two-pass mmu_interval_notifier infrastructure
> > > allows deferring the wait to a second pass, so all GPUs can be
> > > signalled in the first pass before any of them are waited on.
> > > 
> > > Convert the userptr invalidation to use the two-pass model:
> > > 
> > > Use invalidate_start as the first pass to mark the VMA for repin
> > > and enable software signalling on the VM reservation fences to
> > > start any GPU work needed for signaling. Fall back to completing
> > > the work synchronously if all fences are already signalled, or if
> > > a concurrent invalidation is already using the embedded finish
> > > structure.
> > > 
> > > Use invalidate_finish as the second pass to wait for the
> > > reservation fences to complete, invalidate the GPU TLB in fault
> > > mode, and unmap the gpusvm pages.
> > > 
> > > Embed a struct mmu_interval_notifier_finish in struct xe_userptr
> > > to avoid dynamic allocation in the notifier callback. Use a
> > > finish_inuse flag to prevent two concurrent invalidations from
> > > using it simultaneously; fall back to the synchronous path for
> > > the second caller.
> > > 
> > 
> > A couple of nits below. Also, for clarity, I'd probably rework this
> > series...
> > 
> >  - Move patch #3 to be the 2nd patch
> >  - Squash patches #2 and #4 into a single patch, and make this the
> > last patch
> > 
> > Let me know what you think about the reordering. I'm looking at the
> > code with the series applied, and aside from the nits below,
> > everything in xe_userptr.c looks good to me.
> 
> We could do that, but unless you insist, I'd like to keep the current
> ordering, since patch #2 is a very simple example of how to convert,
> and since #4 makes the notifier rather complex it'd be good to be
> able to bisect in between those two.
> 

I think it is fine the way you have it - I never have strong opinions
on stuff like this. For my review, I just had to apply the series to a
branch to get the full picture / verify all the flows were correct.

> >  
> > > Assisted-by: GitHub Copilot:claude-sonnet-4.6
> > > Signed-off-by: Thomas Hellström <[email protected]>
> > > ---
> > >  drivers/gpu/drm/xe/xe_userptr.c | 96 +++++++++++++++++++++++++--------
> > >  drivers/gpu/drm/xe/xe_userptr.h | 14 +++++
> > >  2 files changed, 88 insertions(+), 22 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/xe/xe_userptr.c b/drivers/gpu/drm/xe/xe_userptr.c
> > > index e120323c43bc..440b0a79d16f 100644
> > > --- a/drivers/gpu/drm/xe/xe_userptr.c
> > > +++ b/drivers/gpu/drm/xe/xe_userptr.c
> > > @@ -73,18 +73,42 @@ int xe_vma_userptr_pin_pages(struct xe_userptr_vma *uvma)
> > >                               &ctx);
> > >  }
> > >  
> > > -static void __vma_userptr_invalidate(struct xe_vm *vm, struct xe_userptr_vma *uvma)
> > > +static void xe_vma_userptr_do_inval(struct xe_vm *vm, struct xe_userptr_vma *uvma,
> > > +                             bool is_deferred)
> > >  {
> > >   struct xe_userptr *userptr = &uvma->userptr;
> > >   struct xe_vma *vma = &uvma->vma;
> > > - struct dma_resv_iter cursor;
> > > - struct dma_fence *fence;
> > >   struct drm_gpusvm_ctx ctx = {
> > >           .in_notifier = true,
> > >           .read_only = xe_vma_read_only(vma),
> > >   };
> > >   long err;
> > >  
> > 
> > xe_svm_assert_in_notifier(vm);
> 
> This actually reveals a pre-existing bug: this code is called with
> the notifier lock held in read mode, and the vm resv held, in the
> userptr invalidation injection path, so that assert would hit.
> 
> 
> Also, drm_gpusvm_unmap_pages() below asserts the same thing (so it is
> also affected by the bug), but for clarity I agree we might want to
> have an assert here. It would then need to include the other mode as
> well, and I'd need to update the locking docs for the two-pass state.
> 

Right! I have pointed this out to Auld, as this popped up in my
testing of the CPU bind / ULLS series when I enabled
DRM_XE_USERPTR_INVAL_INJECT. We should fix up drm_gpusvm_unmap_pages
soonish too, and enable DRM_XE_USERPTR_INVAL_INJECT in some CI runs as
well.

I think some lockdep assert would be good - even if it's just the
non-write-mode version, with maybe a brief explanation that the error
injection path can do this in read mode while the notifier path does
it in write mode.
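
To be concrete, something along these lines is what I had in mind
(untested sketch - the helper name is made up, and it may be better to
route this through xe's existing assert macros instead):

```c
/*
 * Untested sketch: assert the gpusvm notifier lock is held in *either*
 * mode, rather than asserting write mode, since the
 * DRM_XE_USERPTR_INVAL_INJECT error injection path calls into the
 * invalidation with the lock held in read mode while the real notifier
 * path holds it in write mode.
 */
static inline void xe_userptr_assert_inval_locked(struct xe_vm *vm)
{
	lockdep_assert_held(&vm->svm.gpusvm.notifier_lock);
}
```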

Matt

> > 
> > > + err = dma_resv_wait_timeout(xe_vm_resv(vm),
> > > +                             DMA_RESV_USAGE_BOOKKEEP,
> > > +                             false, MAX_SCHEDULE_TIMEOUT);
> > > + XE_WARN_ON(err <= 0);
> > > +
> > > + if (xe_vm_in_fault_mode(vm) && userptr->initial_bind) {
> > > +         err = xe_vm_invalidate_vma(vma);
> > > +         XE_WARN_ON(err);
> > > + }
> > > +
> > > + if (is_deferred)
> > > +         userptr->finish_inuse = false;
> > > + drm_gpusvm_unmap_pages(&vm->svm.gpusvm, &uvma->userptr.pages,
> > > +                        xe_vma_size(vma) >> PAGE_SHIFT, &ctx);
> > > +}
> > > +
> > > +static struct mmu_interval_notifier_finish *
> > > +xe_vma_userptr_invalidate_pass1(struct xe_vm *vm, struct xe_userptr_vma *uvma)
> > > +{
> > > + struct xe_userptr *userptr = &uvma->userptr;
> > > + struct xe_vma *vma = &uvma->vma;
> > > + struct dma_resv_iter cursor;
> > > + struct dma_fence *fence;
> > > + bool signaled = true;
> > > +
> > 
> > xe_svm_assert_in_notifier(vm);
> 
> Same here.
> 
> > 
> > >   /*
> > >    * Tell exec and rebind worker they need to repin and rebind this
> > >    * userptr.
> > > @@ -105,27 +129,32 @@ static void __vma_userptr_invalidate(struct xe_vm *vm, struct xe_userptr_vma *uv
> > >    */
> > >   dma_resv_iter_begin(&cursor, xe_vm_resv(vm),
> > >                       DMA_RESV_USAGE_BOOKKEEP);
> > > - dma_resv_for_each_fence_unlocked(&cursor, fence)
> > > + dma_resv_for_each_fence_unlocked(&cursor, fence) {
> > >           dma_fence_enable_sw_signaling(fence);
> > > +         if (signaled && !dma_fence_is_signaled(fence))
> > > +                 signaled = false;
> > > + }
> > >   dma_resv_iter_end(&cursor);
> > >  
> > > - err = dma_resv_wait_timeout(xe_vm_resv(vm),
> > > -                             DMA_RESV_USAGE_BOOKKEEP,
> > > -                             false, MAX_SCHEDULE_TIMEOUT);
> > > - XE_WARN_ON(err <= 0);
> > > -
> > > - if (xe_vm_in_fault_mode(vm) && userptr->initial_bind) {
> > > -         err = xe_vm_invalidate_vma(vma);
> > > -         XE_WARN_ON(err);
> > > + /*
> > > +  * Only one caller at a time can use the multi-pass state.
> > > +  * If it's already in use, or all fences are already signaled,
> > > +  * proceed directly to invalidation without deferring.
> > > +  */
> > > + if (signaled || userptr->finish_inuse) {
> > > +         xe_vma_userptr_do_inval(vm, uvma, false);
> > > +         return NULL;
> > >   }
> > >  
> > > - drm_gpusvm_unmap_pages(&vm->svm.gpusvm, &uvma->userptr.pages,
> > > -                        xe_vma_size(vma) >> PAGE_SHIFT, &ctx);
> > > + userptr->finish_inuse = true;
> > > +
> > > + return &userptr->finish;
> > >  }
> > >  
> > > -static bool vma_userptr_invalidate(struct mmu_interval_notifier *mni,
> > > -                            const struct mmu_notifier_range *range,
> > > -                            unsigned long cur_seq)
> > > +static bool xe_vma_userptr_invalidate_start(struct mmu_interval_notifier *mni,
> > > +                                     const struct mmu_notifier_range *range,
> > > +                                     unsigned long cur_seq,
> > > +                                     struct mmu_interval_notifier_finish **p_finish)
> > >  {
> > >   struct xe_userptr_vma *uvma = container_of(mni, typeof(*uvma), userptr.notifier);
> > >   struct xe_vma *vma = &uvma->vma;
> > > @@ -138,21 +167,40 @@ static bool vma_userptr_invalidate(struct mmu_interval_notifier *mni,
> > >           return false;
> > >  
> > >   vm_dbg(&xe_vma_vm(vma)->xe->drm,
> > > -        "NOTIFIER: addr=0x%016llx, range=0x%016llx",
> > > +        "NOTIFIER PASS1: addr=0x%016llx, range=0x%016llx",
> > >           xe_vma_start(vma), xe_vma_size(vma));
> > >  
> > >   down_write(&vm->svm.gpusvm.notifier_lock);
> > >   mmu_interval_set_seq(mni, cur_seq);
> > >  
> > > - __vma_userptr_invalidate(vm, uvma);
> > > + *p_finish = xe_vma_userptr_invalidate_pass1(vm, uvma);
> > > +
> > >   up_write(&vm->svm.gpusvm.notifier_lock);
> > > - trace_xe_vma_userptr_invalidate_complete(vma);
> > > + if (!*p_finish)
> > > +         trace_xe_vma_userptr_invalidate_complete(vma);
> > >  
> > >   return true;
> > >  }
> > >  
> > > +static void xe_vma_userptr_invalidate_finish(struct mmu_interval_notifier_finish *finish)
> > > +{
> > > + struct xe_userptr_vma *uvma = container_of(finish, typeof(*uvma), userptr.finish);
> > > + struct xe_vma *vma = &uvma->vma;
> > > + struct xe_vm *vm = xe_vma_vm(vma);
> > > +
> > > + vm_dbg(&xe_vma_vm(vma)->xe->drm,
> > > +        "NOTIFIER PASS2: addr=0x%016llx, range=0x%016llx",
> > > +         xe_vma_start(vma), xe_vma_size(vma));
> > > +
> > > + down_write(&vm->svm.gpusvm.notifier_lock);
> > > + xe_vma_userptr_do_inval(vm, uvma, true);
> > > + up_write(&vm->svm.gpusvm.notifier_lock);
> > > + trace_xe_vma_userptr_invalidate_complete(vma);
> > > +}
> > > +
> > >  static const struct mmu_interval_notifier_ops vma_userptr_notifier_ops = {
> > > - .invalidate = vma_userptr_invalidate,
> > > + .invalidate_start = xe_vma_userptr_invalidate_start,
> > > + .invalidate_finish = xe_vma_userptr_invalidate_finish,
> > >  };
> > >  
> > >  #if IS_ENABLED(CONFIG_DRM_XE_USERPTR_INVAL_INJECT)
> > > @@ -164,6 +212,7 @@ static const struct mmu_interval_notifier_ops vma_userptr_notifier_ops = {
> > >   */
> > >  void xe_vma_userptr_force_invalidate(struct xe_userptr_vma *uvma)
> > >  {
> > > + struct mmu_interval_notifier_finish *finish;
> > >   struct xe_vm *vm = xe_vma_vm(&uvma->vma);
> > >  
> > >   /* Protect against concurrent userptr pinning */
> > > @@ -179,7 +228,10 @@ void xe_vma_userptr_force_invalidate(struct xe_userptr_vma *uvma)
> > >   if (!mmu_interval_read_retry(&uvma->userptr.notifier,
> > >                                uvma->userptr.pages.notifier_seq))
> > >           uvma->userptr.pages.notifier_seq -= 2;
> > > - __vma_userptr_invalidate(vm, uvma);
> > > +
> > > + finish = xe_vma_userptr_invalidate_pass1(vm, uvma);
> > > + if (finish)
> > > +         xe_vma_userptr_do_inval(vm, uvma, true);
> > >  }
> > >  #endif
> > >  
> > > diff --git a/drivers/gpu/drm/xe/xe_userptr.h b/drivers/gpu/drm/xe/xe_userptr.h
> > > index ef801234991e..4f42db61fd62 100644
> > > --- a/drivers/gpu/drm/xe/xe_userptr.h
> > > +++ b/drivers/gpu/drm/xe/xe_userptr.h
> > > @@ -57,12 +57,26 @@ struct xe_userptr {
> > >    */
> > >   struct mmu_interval_notifier notifier;
> > >  
> > > + /**
> > > +  * @finish: MMU notifier finish structure for two-pass invalidation.
> > > +  * Embedded here to avoid allocation in the notifier callback.
> > > +  * Protected by @vm::svm.gpusvm.notifier_lock.
> > > +  */
> > > + struct mmu_interval_notifier_finish finish;
> > > + /**
> > > +  * @finish_inuse: Whether @finish is currently in use by an in-progress
> > > +  * two-pass invalidation.
> > > +  * Protected by @vm::svm.gpusvm.notifier_lock.
> > > +  */
> > > + bool finish_inuse;
> > > +
> > >   /**
> > >    * @initial_bind: user pointer has been bound at least once.
> > >    * write: vm->svm.gpusvm.notifier_lock in read mode and vm->resv held.
> > >    * read: vm->svm.gpusvm.notifier_lock in write mode or vm->resv held.
> > >    */
> > >   bool initial_bind;
> > > +
> > 
> > Unrelated.
> 
> Sure. Will fix.
> 
> Thanks,
> Thomas
> 
> 
> 
> > 
> > Matt
> > 
> > >  #if IS_ENABLED(CONFIG_DRM_XE_USERPTR_INVAL_INJECT)
> > >   u32 divisor;
> > >  #endif
> > > -- 
> > > 2.53.0
> > > 
