>-----Original Message-----
>From: Liu, Yi L <[email protected]>
>Subject: Re: [PATCH v5 7/9] vfio/listener: Construct iotlb entry when unmap
>memory address space
>
>On 2025/11/26 13:45, Duan, Zhenzhong wrote:
>>
>>
>>> -----Original Message-----
>>> From: Liu, Yi L <[email protected]>
>>> Subject: Re: [PATCH v5 7/9] vfio/listener: Construct iotlb entry when
>>> unmap memory address space
>>>
>>> On 2025/11/6 12:20, Zhenzhong Duan wrote:
>>>> If a VFIO device in the guest switches from a passthrough (PT) domain to a
>>>> block domain, the whole memory address space is unmapped, but we passed a
>>>> NULL iotlb entry to unmap_bitmap, so the bitmap query didn't happen and we
>>>> lost dirty pages.
>>>
>>> This is a good catch. :) Have you observed the problem in testing, or just
>>> identified it during patch iteration?
>>
>> Patch iteration.
>>
>>>
>>>> By constructing an iotlb entry with iova = gpa for unmap_bitmap, it can
>>>> set dirty bits correctly.
>>>>
>>>> For an IOMMU address space, we still send a NULL iotlb because VFIO doesn't
>>>> know the actual mappings in the guest. It's the vIOMMU's responsibility to
>>>> send the actual unmapping notifications, e.g.,
>>>> vtd_address_space_unmap_in_migration().
>>>>
>>>> Signed-off-by: Zhenzhong Duan <[email protected]>
>>>> Tested-by: Giovanni Cabiddu <[email protected]>
>>>> ---
>>>>    hw/vfio/listener.c | 15 ++++++++++++++-
>>>>    1 file changed, 14 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/hw/vfio/listener.c b/hw/vfio/listener.c
>>>> index 2109101158..3b48f6796c 100644
>>>> --- a/hw/vfio/listener.c
>>>> +++ b/hw/vfio/listener.c
>>>> @@ -713,14 +713,27 @@ static void
>>>> vfio_listener_region_del(MemoryListener *listener,
>>>>
>>>>        if (try_unmap) {
>>>>            bool unmap_all = false;
>>>> +        IOMMUTLBEntry entry = {}, *iotlb = NULL;
>>>>
>>>>            if (int128_eq(llsize, int128_2_64())) {
>>>>                assert(!iova);
>>>>                unmap_all = true;
>>>>                llsize = int128_zero();
>>>>            }
>>>> +
>>>> +        /*
>>>> +         * Fake an IOTLB entry for identity mapping which is needed by dirty
>>>> +         * tracking. In fact, in unmap_bitmap, only translated_addr field is
>>>> +         * used to set dirty bitmap.
>>>
>>> Just say dirty sync is needed per unmap. So you may add an in_migration
>>> check as well; if not in migration, there is no need to do it.
>>
>> Dirty tracking is not only for migration but also for dirty rate/limit. So a
>> non-null iotlb is always passed for a ram MR. That iotlb pointer is used only
>> when vfio_container_dirty_tracking_is_started() returns true.
>>
>> I can add a check on global_dirty_tracking if you prefer.
>
>yeah, this would be helpful.
>
>>
>>>
>>>> +         */
>>>> +        if (!memory_region_is_iommu(section->mr)) {
>>>> +            entry.iova = iova;
>>>> +            entry.translated_addr = iova;
>>>> +            iotlb = &entry;
>>>> +        }
>>>> +
>>>
>>> Meanwhile, I'm still wondering how to deal with the iommu MR case. Let's see a
>>> scenario first. When switching from DMA domain to PT, QEMU will switch
>>> to PT. This shall trigger the vfio_listener_region_del() and unregister
>>> the iommu notifier. This means vIOMMU side needs to do unmap prior to
>>> switching AS. If not, the iommu notifier is gone when vIOMMU wants to
>>> unmap with an IOTLBEvent. For virtual intel_iommu, it is calling
>>> vtd_address_space_unmap_in_migration() prior to calling
>>> vtd_switch_address_space(). So I think you need to tweak the intel_iommu
>>> a bit to suit the order requirement. :)
>>
>> VTD doesn't support switching from a DMA domain to PT atomically, so it
>> switches to a block domain in between, see intel_iommu_attach_device() in
>> the kernel.
>>
>> So with the sequence DMA->block->PT domain, haven't we guaranteed the
>> order you shared? See vtd_pasid_cache_sync_locked().
>
>I see. So the guest helps here. This might be a bit weak since we rely on
>guest behavior. I think you may add a TODO or a comment to note it.

Makes sense, will add.

>
>BTW, I think the subject can be refined since the real purpose is to
>track dirty pages for the unmap done in region_del.
>
>vfio/listener: Add missing dirty tracking in region_del

Will do.

>
>with this and the prior check, this patch looks good to me.
>
>Reviewed-by: Yi Liu <[email protected]>
>
>>>
>>> BTW, should the iommu MRs even go to this try_unmap branch? I think for
>>> such MRs, we rely on the vIOMMU to unmap explicitly (hence triggering
>>> vfio_iommu_map_notify()).
>>
>> Yes, it's unnecessary, but it's hard for VFIO to distinguish whether
>> try_unmap is due to a domain switch or a real unmap. I think it's harmless
>> because the second try_unmap unmaps nothing.
>
>Hmmm, can an unmap path go to region_del()? I don't quite get the second
>try_unmap; do you mean the vIOMMU unmaps everything via
>vfio_iommu_map_notify() and then switches the AS, which triggers region_del
>and this try_unmap branch?

Yes, vfio_iommu_map_notify() is the first try_unmap; the unmap in region_del
is the second try_unmap.

Thanks
Zhenzhong
