On Mon, Aug 18, 2025 at 01:07:26PM -0300, Jason Gunthorpe wrote:
> On Sat, Aug 09, 2025 at 03:51:32PM +0200, Thomas Hellström wrote:
> > GPU use-cases for mmu_interval_notifiers with HMM often involve
> > starting a GPU operation and then waiting for it to complete.
> > These operations are typically context preemption or TLB flushing.
> >
> > With single-pass notifiers per GPU this doesn't scale in
> > multi-GPU scenarios. In those scenarios we'd want to first start
> > preemption or TLB flushing on all GPUs and, as a second pass,
> > wait for them to complete on all GPUs.
>
> The idea seems reasonable but I'm not sure I like the naming of
> 'multipass' or necessarily the complexity.
>
> This is sort of a co-operative multithreading thing.
>
> Do you really need a linked list here? At least justify the design
> choices in the commit message..

I think this choice makes sense: it allows embedding the wait state
from the initial notifier call into the pass structure. Patch [6]
shows this by attaching the issued TLB invalidation fences to the
pass. Since a single notifier may be invoked multiple times with
different ranges but the same seqno, I think this is the correct
design choice; otherwise there's no unambiguous way to track
per-invocation wait state. I agree this should be documented in both
the commit message and kernel-doc.
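For illustration, here is roughly what that enables on the driver
side. This is a minimal sketch against the API proposed in this
patch; the my_* names, the single-fence payload, and the fallback
path are all made up (patch [6] attaches the issued TLB invalidation
fences to the pass in the same way), and error handling plus the
!mmu_notifier_range_blockable() case are elided:

#include <linux/dma-fence.h>
#include <linux/mmu_notifier.h>
#include <linux/slab.h>

/* Per-invocation wait state, embedded around the pass. Each notifier
 * invocation allocates its own instance, so the state stays
 * unambiguous even when the same notifier fires twice with the same
 * seqno.
 */
struct my_inval_pass {
	struct mmu_interval_notifier_pass base;
	struct dma_fence *fence;	/* invalidation issued in pass one */
};

/* Hypothetical driver helper: kicks off a GPU TLB invalidation for
 * @range without waiting and returns the fence that signals it.
 */
static struct dma_fence *
my_issue_invalidation(struct mmu_interval_notifier *interval_sub,
		      const struct mmu_notifier_range *range);

/* Second pass: wait for what the first pass started, then clean up. */
static struct mmu_interval_notifier_pass *
my_wait_pass(struct mmu_interval_notifier_pass *pass,
	     const struct mmu_notifier_range *range,
	     unsigned long cur_seq)
{
	struct my_inval_pass *p = container_of(pass, struct my_inval_pass, base);

	dma_fence_wait(p->fence, false);
	dma_fence_put(p->fence);
	kfree(p);

	return NULL;	/* done, no further passes for this notifier */
}

/* First pass: start the invalidation and hand the wait state to the
 * core via @pass so other GPUs can start theirs before anyone waits.
 */
static bool my_invalidate_multipass(struct mmu_interval_notifier *interval_sub,
				    const struct mmu_notifier_range *range,
				    unsigned long cur_seq,
				    struct mmu_interval_notifier_pass **pass)
{
	struct my_inval_pass *p = kzalloc(sizeof(*p), GFP_NOWAIT);

	mmu_interval_set_seq(interval_sub, cur_seq);

	if (!p) {
		/* Allocation failed: degrade to a synchronous single pass. */
		struct dma_fence *fence = my_issue_invalidation(interval_sub, range);

		dma_fence_wait(fence, false);
		dma_fence_put(fence);
		return true;
	}

	p->fence = my_issue_invalidation(interval_sub, range);
	p->base.pass = my_wait_pass;
	*pass = &p->base;

	return true;
}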
Matt

[6] https://patchwork.freedesktop.org/patch/667844/?series=152725&rev=1

> > +struct mmu_interval_notifier_pass {
> > +	struct list_head link;
> > +	/**
> > +	 * @pass: Driver callback for additional pass.
> > +	 * @additional_pass: Pointer to the mmu_interval_notifier_pass structure.
> > +	 * @range: The mmu_notifier_range.
> > +	 * @cur_seq: The current sequence set by the first pass.
> > +	 *
> > +	 * Return: Either a pointer to a valid mmu_interval_notifier_pass for
> > +	 * another pass to be called, or %NULL if processing is complete for this
> > +	 * notifier. There is no error reporting mechanism for additional passes.
> > +	 */
> > +	struct mmu_interval_notifier_pass *
> > +	(*pass)(struct mmu_interval_notifier_pass *additional_pass,
> > +		const struct mmu_notifier_range *range,
> > +		unsigned long cur_seq);
> > +};
> > +
> >  /**
> >   * struct mmu_interval_notifier_ops
> >   * @invalidate: Upon return the caller must stop using any SPTEs within this
> > @@ -243,6 +269,10 @@ struct mmu_interval_notifier_ops {
> >  	bool (*invalidate)(struct mmu_interval_notifier *interval_sub,
> >  			   const struct mmu_notifier_range *range,
> >  			   unsigned long cur_seq);
> > +	bool (*invalidate_multipass)(struct mmu_interval_notifier *interval_sub,
> > +				     const struct mmu_notifier_range *range,
> > +				     unsigned long cur_seq,
> > +				     struct mmu_interval_notifier_pass **pass);
>
> Couldn't this just have a pass number counter and some return code to
> indicate this notifier is done?
>
> Or do you really need more than 2 passes? Start/finish make sense
> too. Otherwise you may have issues overlapping the backgroundable
> operations between different driver types?
>
> > +static void mn_itree_additional_passes(struct list_head *additional_passes,
> > +				       const struct mmu_notifier_range *range,
> > +				       unsigned long cur_seq)
> > +{
> > +	struct mmu_interval_notifier_pass *p, *next;
> > +
> > +	while (!list_empty(additional_passes)) {
> > +		list_for_each_entry_safe(p, next, additional_passes, link) {
> > +			list_del_init(&p->link);
> > +			p = p->pass(p, range, cur_seq);
> > +			if (p)
> > +				list_add_tail(&p->link, additional_passes);
> > +		}
> > +	}
> > +}
>
> Like this is very naive, if one driver has only 'prepare' and 'wait
> for device ack' passes, then it will immediately stop being concurrent
> while another device may be still working on its 3rd pass.
>
> So either this should be more complicated to properly support
> different numbers of passes per registration or we should just support
> two passes and be done with it?
>
> Jason
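FWIW, if the conclusion is to hard-code two passes, I'd expect the
ops to end up looking something like this (hypothetical sketch, not
from any posted patch):

/* Two-pass variant: no linked list; the core would call
 * invalidate_start() on every subscription in range, then
 * invalidate_wait() on every subscription whose start returned true.
 */
struct mmu_interval_notifier_ops {
	bool (*invalidate)(struct mmu_interval_notifier *interval_sub,
			   const struct mmu_notifier_range *range,
			   unsigned long cur_seq);
	/* Start the invalidation; return true if a wait pass is needed. */
	bool (*invalidate_start)(struct mmu_interval_notifier *interval_sub,
				 const struct mmu_notifier_range *range,
				 unsigned long cur_seq);
	/* Wait for whatever invalidate_start() kicked off. */
	void (*invalidate_wait)(struct mmu_interval_notifier *interval_sub,
				const struct mmu_notifier_range *range,
				unsigned long cur_seq);
};

That avoids the list walk, but, per the seqno point above, the driver
then has to find somewhere else to stash per-invocation wait state
when the same notifier is invoked more than once with the same seqno.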
