>-----Original Message-----
>From: Liu, Yi L <[email protected]>
>Subject: Re: [PATCH v5 7/9] vfio/listener: Construct iotlb entry when unmap
>memory address space
>
>On 2025/11/6 12:20, Zhenzhong Duan wrote:
>> If a VFIO device in guest switches from passthrough (PT) domain to block
>> domain, the whole memory address space is unmapped, but we passed a NULL
>> iotlb entry to unmap_bitmap, then bitmap query didn't happen and we lost
>> dirty pages.
>
>this is a good catch. :) Have you observed the problem in testing or just
>identified it with patch iteration?
Patch iteration.

>
>> By constructing an iotlb entry with iova = gpa for unmap_bitmap, it can
>> set dirty bits correctly.
>>
>> For IOMMU address space, we still send a NULL iotlb because VFIO doesn't
>> know the actual mappings in the guest. It's the vIOMMU's responsibility to
>> send the actual unmapping notifications, e.g.,
>> vtd_address_space_unmap_in_migration().
>>
>> Signed-off-by: Zhenzhong Duan <[email protected]>
>> Tested-by: Giovannio Cabiddu <[email protected]>
>> ---
>>  hw/vfio/listener.c | 15 ++++++++++++++-
>>  1 file changed, 14 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/vfio/listener.c b/hw/vfio/listener.c
>> index 2109101158..3b48f6796c 100644
>> --- a/hw/vfio/listener.c
>> +++ b/hw/vfio/listener.c
>> @@ -713,14 +713,27 @@ static void vfio_listener_region_del(MemoryListener *listener,
>>
>>      if (try_unmap) {
>>          bool unmap_all = false;
>> +        IOMMUTLBEntry entry = {}, *iotlb = NULL;
>>
>>          if (int128_eq(llsize, int128_2_64())) {
>>              assert(!iova);
>>              unmap_all = true;
>>              llsize = int128_zero();
>>          }
>> +
>> +        /*
>> +         * Fake an IOTLB entry for identity mapping which is needed by
>> +         * dirty tracking. In fact, in unmap_bitmap, only the
>> +         * translated_addr field is used to set the dirty bitmap.
>
>Just say sync dirty is needed per unmap. So you may add a check
>in_migration as well. If not in migration, it is not needed to do it.

Dirty tracking is not only for migration, but also for dirty rate/limit, so a
non-NULL iotlb is always passed for a RAM MR. That iotlb pointer is used only
when vfio_container_dirty_tracking_is_started() returns true. I can add a
check on global_dirty_tracking if you prefer.

>
>> +         */
>> +        if (!memory_region_is_iommu(section->mr)) {
>> +            entry.iova = iova;
>> +            entry.translated_addr = iova;
>> +            iotlb = &entry;
>> +        }
>> +
>
>While, I'm still wondering how to deal with the iommu MR case. Let's see a
>scenario first. When switching from DMA domain to PT, QEMU will switch
>to PT.
>This shall trigger vfio_listener_region_del() and unregister
>the iommu notifier. This means the vIOMMU side needs to do the unmap prior
>to switching the AS. If not, the iommu notifier is gone when the vIOMMU
>wants to unmap with an IOTLBEvent. For virtual intel_iommu, it is calling
>vtd_address_space_unmap_in_migration() prior to calling
>vtd_switch_address_space(). So I think you need to tweak the intel_iommu
>a bit to suit the order requirement. :)

VT-d doesn't support switching from a DMA domain to PT atomically, so it
switches to a block domain in between, see intel_iommu_attach_device() in the
kernel. So the sequence is DMA->block->PT domain; haven't we already
guaranteed the order you shared? See vtd_pasid_cache_sync_locked().

>
>BTW. should the iommu MRs even go to this try_unmap branch? I think for
>such MRs, it relies on the vIOMMU to unmap explicitly (hence trigger the
>vfio_iommu_map_notify()).

Yes, it's unnecessary, but it's hard for VFIO to distinguish whether
try_unmap is due to a domain switch or a real unmap. I think it's harmless
because the second try_unmap unmaps nothing.

Thanks
Zhenzhong

>
>>          ret = vfio_container_dma_unmap(bcontainer, iova, int128_get64(llsize),
>> -                                       NULL, unmap_all);
>> +                                       iotlb, unmap_all);
>>          if (ret) {
>>              error_report("vfio_container_dma_unmap(%p, 0x%"HWADDR_PRIx", "
>>                           "0x%"HWADDR_PRIx") = %d (%s)",
>
>Regards,
>Yi Liu
