On Mon, Aug 18, 2025 at 01:07:26PM -0300, Jason Gunthorpe wrote:
> On Sat, Aug 09, 2025 at 03:51:32PM +0200, Thomas Hellström wrote:
> > GPU use-cases for mmu_interval_notifiers with HMM often involve
> > starting a GPU operation and then waiting for it to complete.
> > These operations are typically context preemption or TLB flushing.
> >
> > With single-pass notifiers per GPU this doesn't scale in
> > multi-GPU scenarios. In those scenarios we'd want to first start
> > preemption or TLB flushing on all GPUs and, as a second pass,
> > wait for them to complete on all GPUs.
>
> The idea seems reasonable but I'm not sure I like the naming of
> 'multipass' or necessarily the complexity.
>
> This is sort of a co-operative multithreading thing.
>
> Do you really need a linked list here? At least justify the design
> choices in the commit message..

I think this choice makes sense: it allows embedding the wait state
from the initial notifier call into the pass structure. Patch [6]
shows this by attaching the issued TLB invalidation fences to the
pass. Since a single notifier may be invoked multiple times with
different ranges but the same seqno, I think this is the correct
design choice; otherwise there's no unambiguous way to track
per-invocation wait state. I agree this should be documented in both
the commit message and kernel-doc.
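For illustration, here is roughly what that enables on the driver
side. This is a minimal sketch against the API proposed in this
patch; the my_* names, the single-fence payload, and the fallback
path are all made up (patch [6] attaches the issued TLB invalidation
fences to the pass in the same way), and error handling plus the
!mmu_notifier_range_blockable() case are elided:

#include <linux/dma-fence.h>
#include <linux/mmu_notifier.h>
#include <linux/slab.h>

/* Per-invocation wait state, embedded around the pass. Each notifier
 * invocation allocates its own instance, so the state stays
 * unambiguous even when the same notifier fires twice with the same
 * seqno.
 */
struct my_inval_pass {
	struct mmu_interval_notifier_pass base;
	struct dma_fence *fence;	/* invalidation issued in pass one */
};

/* Hypothetical driver helper: kicks off a GPU TLB invalidation for
 * @range without waiting and returns the fence that signals it.
 */
static struct dma_fence *
my_issue_invalidation(struct mmu_interval_notifier *interval_sub,
		      const struct mmu_notifier_range *range);

/* Second pass: wait for what the first pass started, then clean up. */
static struct mmu_interval_notifier_pass *
my_wait_pass(struct mmu_interval_notifier_pass *pass,
	     const struct mmu_notifier_range *range,
	     unsigned long cur_seq)
{
	struct my_inval_pass *p = container_of(pass, struct my_inval_pass, base);

	dma_fence_wait(p->fence, false);
	dma_fence_put(p->fence);
	kfree(p);

	return NULL;	/* done, no further passes for this notifier */
}

/* First pass: start the invalidation and hand the wait state to the
 * core via @pass so other GPUs can start theirs before anyone waits.
 */
static bool my_invalidate_multipass(struct mmu_interval_notifier *interval_sub,
				    const struct mmu_notifier_range *range,
				    unsigned long cur_seq,
				    struct mmu_interval_notifier_pass **pass)
{
	struct my_inval_pass *p = kzalloc(sizeof(*p), GFP_NOWAIT);

	mmu_interval_set_seq(interval_sub, cur_seq);

	if (!p) {
		/* Allocation failed: degrade to a synchronous single pass. */
		struct dma_fence *fence = my_issue_invalidation(interval_sub, range);

		dma_fence_wait(fence, false);
		dma_fence_put(fence);
		return true;
	}

	p->fence = my_issue_invalidation(interval_sub, range);
	p->base.pass = my_wait_pass;
	*pass = &p->base;

	return true;
}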
Matt

[6] https://patchwork.freedesktop.org/patch/667844/?series=152725&rev=1

> > +struct mmu_interval_notifier_pass {
> > +	struct list_head link;
> > +	/**
> > +	 * @pass: Driver callback for additional pass.
> > +	 * @additional_pass: Pointer to the mmu_interval_notifier_pass structure.
> > +	 * @range: The mmu_notifier_range.
> > +	 * @cur_seq: The current sequence set by the first pass.
> > +	 *
> > +	 * Return: Either a pointer to a valid mmu_interval_notifier_pass for
> > +	 * another pass to be called, or %NULL if processing is complete for this
> > +	 * notifier. There is no error reporting mechanism for additional passes.
> > +	 */
> > +	struct mmu_interval_notifier_pass *
> > +	(*pass)(struct mmu_interval_notifier_pass *additional_pass,
> > +		const struct mmu_notifier_range *range,
> > +		unsigned long cur_seq);
> > +};
> > +
> >  /**
> >   * struct mmu_interval_notifier_ops
> >   * @invalidate: Upon return the caller must stop using any SPTEs within this
> > @@ -243,6 +269,10 @@ struct mmu_interval_notifier_ops {
> >  	bool (*invalidate)(struct mmu_interval_notifier *interval_sub,
> >  			   const struct mmu_notifier_range *range,
> >  			   unsigned long cur_seq);
> > +	bool (*invalidate_multipass)(struct mmu_interval_notifier *interval_sub,
> > +				     const struct mmu_notifier_range *range,
> > +				     unsigned long cur_seq,
> > +				     struct mmu_interval_notifier_pass **pass);
>
> Couldn't this just have a pass number counter and some return code to
> indicate this notifier is done?
>
> Or do you really need more than 2 passes? Start/finish make sense
> too. Otherwise you may have issues overlapping the backgroundable
> operations between different driver types?
>
> > +static void mn_itree_additional_passes(struct list_head *additional_passes,
> > +				       const struct mmu_notifier_range *range,
> > +				       unsigned long cur_seq)
> > +{
> > +	struct mmu_interval_notifier_pass *p, *next;
> > +
> > +	while (!list_empty(additional_passes)) {
> > +		list_for_each_entry_safe(p, next, additional_passes, link) {
> > +			list_del_init(&p->link);
> > +			p = p->pass(p, range, cur_seq);
> > +			if (p)
> > +				list_add_tail(&p->link, additional_passes);
> > +		}
> > +	}
> > +}
>
> Like this is very naive, if one driver has only 'prepare' and 'wait
> for device ack' passes, then it will immediately stop being concurrent
> while another device may be still working on its 3rd pass.
>
> So either this should be more complicated to properly support
> different numbers of passes per registration or we should just support
> two passes and be done with it?
>
> Jason
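FWIW, if the conclusion is to hard-code two passes, I'd expect the
ops to end up looking something like this (hypothetical sketch, not
from any posted patch):

/* Two-pass variant: no linked list; the core would call
 * invalidate_start() on every subscription in range, then
 * invalidate_wait() on every subscription whose start returned true.
 */
struct mmu_interval_notifier_ops {
	bool (*invalidate)(struct mmu_interval_notifier *interval_sub,
			   const struct mmu_notifier_range *range,
			   unsigned long cur_seq);
	/* Start the invalidation; return true if a wait pass is needed. */
	bool (*invalidate_start)(struct mmu_interval_notifier *interval_sub,
				 const struct mmu_notifier_range *range,
				 unsigned long cur_seq);
	/* Wait for whatever invalidate_start() kicked off. */
	void (*invalidate_wait)(struct mmu_interval_notifier *interval_sub,
				const struct mmu_notifier_range *range,
				unsigned long cur_seq);
};

That avoids the list walk, but, per the seqno point above, the driver
then has to find somewhere else to stash per-invocation wait state
when the same notifier is invoked more than once with the same seqno.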
