>-----Original Message-----
>From: Steven Sistare <[email protected]>
>Subject: Re: [PATCH] vfio/container: Remap only populated parts in a section
>
>On 8/13/2025 11:24 PM, Zhenzhong Duan wrote:
>> If there are multiple containers and unmap-all fails for some container, we
>> need to remap vaddr for the other containers for which unmap-all succeeded.
>> When ram discard is enabled, we should only remap populated parts in a
>> section instead of the whole section.
>>
>> Export vfio_ram_discard_notify_populate() and use it to do population.
>>
>> Fixes: eba1f657cbb1 ("vfio/container: recover from unmap-all-vaddr failure")
>> Signed-off-by: Zhenzhong Duan <[email protected]>
>> ---
>> btw: I didn't find it easy to test this corner case, only code inspection
>
>Thanks Zhenzhong, this looks correct.
>
>However, I never liked patch
>  eba1f657cbb1 ("vfio/container: recover from unmap-all-vaddr failure")
>
>I think it adds too much complexity for a rare case. In fact, if we
>examine all the possible error return codes, I believe they all would
>be caused by other qemu application bugs, or kernel bugs:
>
>vfio_dma_do_unmap()
>  returns -EBUSY if an mdev exists. qemu adds a live update blocker
>  when an mdev is present. If this occurs, the blocker has a bug.
>  returns -EINVAL if the vaddr was already invalidated. qemu already
>  invalidated it, or never remapped the vaddr after a previous live
>  update. Both are qemu bugs.
>
>iopt_unmap_all
>  iopt_unmap_iova_range
>    -EBUSY - qemu is concurrently performing other dma map or unmap
>    operations. a bug.
>    -EDEADLOCK - Something is not responding to unmap requests.
>
>Therefore, I think we should just revert eba1f657cbb1, and assert that
>the qemu vfio_dma_unmap_vaddr_all() call succeeds.
>
>Thoughts?
I agree it's a rare case and your suggestion would make the code simpler, but killing the QEMU instance when live update fails feels too aggressive. Trying to restore state and keep the current instance running matters in a cloud environment, and looks like the more moderate option. A rough sketch of the recovery flow I have in mind is below.

Thanks
Zhenzhong
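For anyone following along, here is a minimal, self-contained sketch of the recovery flow being discussed: if unmap-all-vaddr fails for one container, remap vaddr for the containers where it already succeeded, and for sections managed by a RAM discard manager walk only the populated parts instead of the whole section. All types and helpers below are hypothetical stand-ins (populate_range() stands in for what the actual patch does by exporting and reusing vfio_ram_discard_notify_populate()); this is not the real QEMU API or the actual patch.

/*
 * Hypothetical sketch only: none of these types or functions are the
 * real QEMU/VFIO API, they just illustrate the control flow.
 */
#include <stdbool.h>
#include <stdio.h>

typedef struct Section {
    unsigned long start, size;
    bool has_ram_discard;       /* section is managed by a RAM discard manager */
} Section;

typedef struct Container {
    const char *name;
    Section *sections;
    int nr_sections;
} Container;

/* stand-in for the populate notifier: remap vaddr for one populated range */
static int populate_range(Container *c, unsigned long start, unsigned long size)
{
    printf("%s: remap vaddr for populated range [0x%lx, +0x%lx)\n",
           c->name, start, size);
    return 0;
}

/* stand-in for replaying only the populated parts of a discard-managed section */
static int replay_populated(Container *c, Section *s)
{
    /* a real implementation would ask the RAM discard manager which
     * sub-ranges are populated; here we pretend only the first half is */
    return populate_range(c, s->start, s->size / 2);
}

/* remap vaddr for every section of a container whose unmap-all succeeded */
static void remap_vaddr(Container *c)
{
    for (int i = 0; i < c->nr_sections; i++) {
        Section *s = &c->sections[i];
        if (s->has_ram_discard) {
            replay_populated(c, s);               /* only populated parts */
        } else {
            populate_range(c, s->start, s->size); /* whole section */
        }
    }
}

int main(void)
{
    Section sec[] = { { 0x0, 0x100000, false }, { 0x100000, 0x100000, true } };
    Container good = { "container0", sec, 2 };

    /* pretend unmap-all-vaddr failed for a later container: undo the
     * containers that already succeeded instead of killing the instance */
    remap_vaddr(&good);
    return 0;
}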
