On Wed, Jun 07, 2023 at 03:14:07AM +0000, Duan, Zhenzhong wrote:
>
>
> >-----Original Message-----
> >From: Peter Xu <[email protected]>
> >Sent: Tuesday, June 6, 2023 11:42 PM
> >Subject: Re: [PATCH v2 2/4] intel_iommu: Fix a potential issue in VFIO dirty
> >page sync
> >
> ...
> >> >> a/include/exec/memory.h b/include/exec/memory.h
> >> >> index c3661b2276c7..eecc3eec6702 100644
> >> >> --- a/include/exec/memory.h
> >> >> +++ b/include/exec/memory.h
> >> >> @@ -142,6 +142,10 @@ struct IOMMUTLBEntry {
> >> >>   * events (e.g. VFIO). Both notifications must be accurate so that
> >> >>   * the shadow page table is fully in sync with the guest view.
> >> >>   *
> >> >> + * Besides MAP, there is a special use case called FULL_MAP which
> >> >> + * requests notification for all the existent mappings (e.g. VFIO
> >> >> + * dirty page sync).
> >> >
> >> >Why do we need FULL_MAP? Can we simply reimpl MAP?
> >>
> >> Sorry, I just realized IOMMU_NOTIFIER_FULL_MAP is confusing.
> >> Maybe IOMMU_NOTIFIER_MAP_FAST_PATH could be a bit more accurate.
> >>
> >> IIUC, currently replay() is called from two paths, one is VFIO device
> >> address space switch which walks over the IOMMU page table to setup
> >> initial mapping and cache it in IOVA tree. The other is VFIO dirty
> >> sync which walks over the IOMMU page table to notify the mapping,
> >> because we already cache the mapping in IOVA tree and VFIO dirty sync
> >> is protected by BQL, so I think it's fine to pick mapping from IOVA
> >> tree directly instead of walking over IOMMU page table. That's the
> >> reason of FULL_MAP (IOMMU_NOTIFIER_MAP_FAST_PATH better).
> >>
> >> About "reimpl MAP", do you mean to walk over IOMMU page table to
> >> notify all existing MAP events without checking with the IOVA tree for
> >> difference? If you prefer, I'll rewrite an implementation this way.
> >
> >We still need to maintain iova tree. IIUC that's the major complexity of
> >vt-d emulation, because we have that extra cache layer to sync with the
> >real guest iommu pgtables.
>
> Can't agree more, looks only intel-iommu and virtio-iommu implemented such
> optimization for now.
>
> >
> >But I think we were just wrong to also notify in the unmap_all() procedure.
> >
> >IIUC the right thing to do (keeping replay() the interface as-is, per it
> >used to be defined) is we should replace the unmap_all() to only evacuate
> >the iova tree (keeping all host mappings untouched, IOW, don't notify
> >UNMAP), and do a full resync there, which will notify all existing
> >mappings as MAP. Then we don't interrupt with any existing mapping if
> >there is (e.g. for the dirty sync case), meanwhile we keep sync too to
> >latest (for moving a vfio device into an existing iommu group).
> >
> >Do you think that'll work for us?
>
> Yes, I think I get your point.
> Below simple change will work in your suggested way, do you agree?
>
> @@ -3825,13 +3833,10 @@ static void vtd_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n)
>      IntelIOMMUState *s = vtd_as->iommu_state;
>      uint8_t bus_n = pci_bus_num(vtd_as->bus);
>      VTDContextEntry ce;
> +    DMAMap map = { .iova = 0, .size = HWADDR_MAX }
>
> -    /*
> -     * The replay can be triggered by either a invalidation or a newly
> -     * created entry. No matter what, we release existing mappings
> -     * (it means flushing caches for UNMAP-only registers).
> -     */
> -    vtd_address_space_unmap(vtd_as, n);
> +    /* replay is protected by BQL, page walk will re-setup IOVA tree safely */
> +    iova_tree_remove(as->iova_tree, map);
>
>      if (vtd_dev_to_context_entry(s, bus_n, vtd_as->devfn, &ce) == 0) {
>          trace_vtd_replay_ce_valid(s->root_scalable ? "scalable mode" :

Yes, thanks!

-- 
Peter Xu
