On 8/28/25 14:28, Steven Sistare wrote:
On 8/13/2025 11:24 PM, Zhenzhong Duan wrote:
If there are multiple containers and unmap-all fails for some container, we
need to remap vaddr for the other containers for which unmap-all succeeded.
When ram discard is enabled, we should only remap populated parts in a
section instead of the whole section.

Export vfio_ram_discard_notify_populate() and use it to do population.

Fixes: eba1f657cbb1 ("vfio/container: recover from unmap-all-vaddr failure")
Signed-off-by: Zhenzhong Duan <[email protected]>
---
btw: I didn't find easy to test this corner case, only code inspecting

Thanks Zhenzhong, this looks correct.

However, I never liked patch
   eba1f657cbb1 ("vfio/container: recover from unmap-all-vaddr failure")

I think it adds too much complexity for a rare case.  In fact, if we
examine all the possible error return codes, I believe they all would
be caused by other qemu application bugs, or kernel bugs:

vfio_dma_do_unmap()
   returns -EBUSY if an mdev exists.  qemu blocks live update blocker
     when mdev is present.  If this occurs, the blocker has a bug.
   returns -EINVAL if the vaddr was already invalidated.  qemu already
     invalidated it, or never remapped the vaddr after a previous live
     update.  Both are qemu bugs.

iopt_unmap_all
   iopt_unmap_iova_range
     -EBUSY - qemu is concurrently performing other dma map or unmap
              operations.  a bug.

     -EDEADLOCK - Something is not responding to unmap requests.

Therefore, I think we should just revert eba1f657cbb1, and assert that
the qemu vfio_dma_unmap_vaddr_all() call succeeds.


vfio_dma_unmap_vaddr_all() is called in the .pre_save handler. This is
pretty deep in the callstack and errors should be handled by the callers.
Quitting QEMU in that case would be brutal and it would confuse the
sysmgmt layers.

Improving error reporting of the .pre_save handler would be a nice
addition though.


Thanks,

C.





Thoughts?

- Steve

  include/hw/vfio/vfio-container-base.h |  3 +++
  include/hw/vfio/vfio-cpr.h            |  2 +-
  hw/vfio/cpr-legacy.c                  | 19 ++++++++++++++-----
  hw/vfio/listener.c                    |  8 ++++----
  4 files changed, 22 insertions(+), 10 deletions(-)

diff --git a/include/hw/vfio/vfio-container-base.h 
b/include/hw/vfio/vfio-container-base.h
index bded6e993f..3f0c085143 100644
--- a/include/hw/vfio/vfio-container-base.h
+++ b/include/hw/vfio/vfio-container-base.h
@@ -269,6 +269,9 @@ struct VFIOIOMMUClass {
      void (*release)(VFIOContainerBase *bcontainer);
  };
+int vfio_ram_discard_notify_populate(RamDiscardListener *rdl,
+                                     MemoryRegionSection *section);
+
  VFIORamDiscardListener *vfio_find_ram_discard_listener(
      VFIOContainerBase *bcontainer, MemoryRegionSection *section);
diff --git a/include/hw/vfio/vfio-cpr.h b/include/hw/vfio/vfio-cpr.h
index d37daffbc5..fb32a5f873 100644
--- a/include/hw/vfio/vfio-cpr.h
+++ b/include/hw/vfio/vfio-cpr.h
@@ -67,7 +67,7 @@ bool vfio_cpr_container_match(struct VFIOContainer *container,
  void vfio_cpr_giommu_remap(struct VFIOContainerBase *bcontainer,
                             MemoryRegionSection *section);
-bool vfio_cpr_ram_discard_register_listener(
+bool vfio_cpr_ram_discard_replay_populated(
      struct VFIOContainerBase *bcontainer, MemoryRegionSection *section);
  void vfio_cpr_save_vector_fd(struct VFIOPCIDevice *vdev, const char *name,
diff --git a/hw/vfio/cpr-legacy.c b/hw/vfio/cpr-legacy.c
index 553b203e9b..6909c0a616 100644
--- a/hw/vfio/cpr-legacy.c
+++ b/hw/vfio/cpr-legacy.c
@@ -224,22 +224,31 @@ void vfio_cpr_giommu_remap(VFIOContainerBase *bcontainer,
      memory_region_iommu_replay(giommu->iommu_mr, &giommu->n);
  }
+static int vfio_cpr_rdm_remap(MemoryRegionSection *section, void *opaque)
+{
+    RamDiscardListener *rdl = opaque;
+    return vfio_ram_discard_notify_populate(rdl, section);
+}
+
  /*
   * In old QEMU, VFIO_DMA_UNMAP_FLAG_VADDR may fail on some mapping after
   * succeeding for others, so the latter have lost their vaddr.  Call this
- * to restore vaddr for a section with a RamDiscardManager.
+ * to restore vaddr for populated parts in a section with a RamDiscardManager.
   *
- * The ram discard listener already exists.  Call its populate function
+ * The ram discard listener already exists.  Call its replay_populated function
   * directly, which calls vfio_legacy_cpr_dma_map.
   */
-bool vfio_cpr_ram_discard_register_listener(VFIOContainerBase *bcontainer,
-                                            MemoryRegionSection *section)
+bool vfio_cpr_ram_discard_replay_populated(VFIOContainerBase *bcontainer,
+                                           MemoryRegionSection *section)
  {
+    RamDiscardManager *rdm = 
memory_region_get_ram_discard_manager(section->mr);
      VFIORamDiscardListener *vrdl =
          vfio_find_ram_discard_listener(bcontainer, section);
      g_assert(vrdl);
-    return vrdl->listener.notify_populate(&vrdl->listener, section) == 0;
+    return ram_discard_manager_replay_populated(rdm, section,
+                                                vfio_cpr_rdm_remap,
+                                                &vrdl->listener) == 0;
  }
  int vfio_cpr_group_get_device_fd(int d, const char *name)
diff --git a/hw/vfio/listener.c b/hw/vfio/listener.c
index f498e23a93..74837c1122 100644
--- a/hw/vfio/listener.c
+++ b/hw/vfio/listener.c
@@ -215,8 +215,8 @@ static void 
vfio_ram_discard_notify_discard(RamDiscardListener *rdl,
      }
  }
-static int vfio_ram_discard_notify_populate(RamDiscardListener *rdl,
-                                            MemoryRegionSection *section)
+int vfio_ram_discard_notify_populate(RamDiscardListener *rdl,
+                                     MemoryRegionSection *section)
  {
      VFIORamDiscardListener *vrdl = container_of(rdl, VFIORamDiscardListener,
                                                  listener);
@@ -572,8 +572,8 @@ void vfio_container_region_add(VFIOContainerBase 
*bcontainer,
      if (memory_region_has_ram_discard_manager(section->mr)) {
          if (!cpr_remap) {
              vfio_ram_discard_register_listener(bcontainer, section);
-        } else if (!vfio_cpr_ram_discard_register_listener(bcontainer,
-                                                           section)) {
+        } else if (!vfio_cpr_ram_discard_replay_populated(bcontainer,
+                                                          section)) {
              goto fail;
          }
          return;



Reply via email to