On 28/11/2024 03:19, Zhangfei Gao wrote:
> Hi, Joao
>
> On Fri, Jun 23, 2023 at 5:51 AM Joao Martins <[email protected]>
> wrote:
>>
>> Hey,
>>
>> This series introduces support for vIOMMU with VFIO device migration,
>> particularly related to how we do the dirty page tracking.
>>
>> Today vIOMMUs serve two purposes: 1) enable interrupt remapping 2)
>> provide dma translation services for guests to provide some form of
>> guest kernel managed DMA e.g. for nested virt based usage; (1) is especially
>> required for big VMs with VFs with more than 255 vcpus. We tackle both
>> and remove the migration blocker when vIOMMU is present provided the
>> conditions are met. I have both use-cases here in one series, but I am happy
>> to tackle them in separate series.
>>
>> As I found out we don't necessarily need to expose the whole vIOMMU
>> functionality in order to just support interrupt remapping. x86 IOMMUs
>> on Windows Server 2018[2] and Linux >=5.10, with qemu 7.1+ (or really
>> Linux guests with commit c40aaaac10 and since qemu commit 8646d9c773d8)
>> can instantiate an IOMMU just for interrupt remapping without needing to
>> advertise/support DMA translation. AMD IOMMU in theory can provide
>> the same, but Linux doesn't quite support the IR-only part there yet,
>> only intel-iommu.
>>
>> The series is organized as follows:
>>
>> Patches 1-5: Today we can't gather vIOMMU details before the guest
>> establishes their first DMA mapping via the vIOMMU. So these first four
>> patches add a way for vIOMMUs to be asked of their properties at start
>> of day. I choose the least churn possible way for now (as opposed to a
>> treewide conversion) and allow easy conversion a posteriori. As
>> suggested by Peter Xu[7], I have resurrected Yi's patches[5][6] which
>> allows us to fetch PCI backing vIOMMU attributes, without necessarily
>> tying the caller (VFIO or anyone else) to an IOMMU MR like I
>> was doing in v3.
>>
>> Patches 6-8: Handle configs with vIOMMU interrupt remapping but without
>> DMA translation allowed. Today the 'dma-translation' attribute is
>> x86-iommu only, but the way this series is structured nothing stops
>> other vIOMMUs from supporting it too as long as they use
>> pci_setup_iommu_ops() and the necessary IOMMU MR get_attr attributes
>> are handled. The blocker is thus relaxed when vIOMMUs are able to
>> toggle/report the DMA_TRANSLATION attribute. With the patches up to this set,
>> we've then tackled item (1) of the second paragraph.
>>
>> Patches 9-15: Simplified a lot from v2 (patch 9) to only track the complete
>> IOVA address space, leveraging the logic we use to compose the dirty ranges.
>> The blocker is once again relaxed for vIOMMUs that advertise their IOVA
>> addressing limits. This tackles item (2). So far I mainly use it with
>> intel-iommu, although I have a small set of patches for virtio-iommu per
>> Alex's suggestion in v2.
>>
>> Comments, suggestions welcome. Thanks for the review!
>>
>> Regards,
>> Joao
>>
>> Changes since v3[8]:
>> * Pick up Yi's patches[5][6], and rework the first four patches.
>>   These are a bit better split, and make the new iommu_ops *optional*
>>   as opposed to a treewide conversion. Rather than returning an IOMMU MR
>>   and letting VFIO operate on it to fetch attributes, we instead let the
>>   underlying IOMMU driver fetch the desired IOMMU MR and ask for the
>>   desired IOMMU attribute. Callers only care about PCI Device backing
>>   vIOMMU attributes regardless of its topology/association. (Peter Xu)
>>   These patches are a bit better split compared to original ones,
>>   and I've kept all the same authorship and note the changes from
>>   original where applicable.
>> * Because of the rework of the first four patches, switch to
>>   individual attributes in the VFIOSpace that track dma_translation
>>   and the max_iova. All are expected to be unused when zero to retain
>>   the defaults of today in common code.
>> * Improve the migration blocker message of the last patch to make it
>>   more obvious that the vIOMMU migration blocker is added when no vIOMMU
>>   address space limits are advertised. (Patch 15)
>> * Cast to uintptr_t in IOMMUAttr data in intel-iommu (Philippe).
>> * Switch to MAKE_64BIT_MASK() instead of plain left shift (Philippe).
>> * Change diffstat of patches with scripts/git.orderfile (Philippe).
>>
>> Changes since v2[3]:
>> * New patches 1-9 to be able to handle vIOMMUs without DMA translation, and
>>   introduce ways to know various IOMMU model attributes via the IOMMU MR. This
>>   is partly meant to address a comment in previous versions where we can't
>>   access the IOMMU MR prior to the DMA mapping happening. Before this series
>>   vfio giommu_list is only tracking 'mapped GIOVA' and that controlled by the
>>   guest. As well as better tackling of the IOMMU usage for interrupt-remapping
>>   only purposes.
>> * Dropped Peter Xu ack on patch 9 given that the code changed a bit.
>> * Adjust patch 14 to adjust for the VFIO bitmaps no longer being pointers.
>> * The patches that existed in v2 of vIOMMU dirty tracking are mostly
>>   untouched, except patch 12 which was greatly simplified.
>>
>> Changes since v1[4]:
>> - Rebased on latest master branch. As part of it, made some changes in
>>   pre-copy to adjust it to Juan's new patches:
>>   1. Added a new patch that passes threshold_size parameter to
>>      .state_pending_{estimate,exact}() handlers.
>>   2. Added a new patch that refactors vfio_save_block().
>>   3. Changed the pre-copy patch to cache and report pending pre-copy
>>      size in the .state_pending_estimate() handler.
>> - Removed unnecessary P2P code. This should be added later on when P2P
>>   support is added. (Alex)
>> - Moved the dirty sync to be after the DMA unmap in vfio_dma_unmap()
>>   (patch #11). (Alex)
>> - Stored vfio_devices_all_device_dirty_tracking()'s value in a local
>>   variable in vfio_get_dirty_bitmap() so it can be re-used (patch #11).
>> - Refactored the viommu device dirty tracking ranges creation code to
>>   make it clearer (patch #15).
>> - Changed overflow check in vfio_iommu_range_is_device_tracked() to
>>   emphasize that we specifically check for 2^64 wrap around (patch #15).
>> - Added R-bs / Acks.
>>
>> [0] https://lore.kernel.org/qemu-devel/[email protected]/
>> [1] https://lore.kernel.org/qemu-devel/[email protected]/
>> [2] https://learn.microsoft.com/en-us/windows-hardware/design/device-experiences/oem-kernel-dma-protection
>> [3] https://lore.kernel.org/qemu-devel/[email protected]/
>> [4] https://lore.kernel.org/qemu-devel/[email protected]/
>> [5] https://lore.kernel.org/all/[email protected]/
>> [6] https://lore.kernel.org/all/[email protected]/
>> [7] https://lore.kernel.org/qemu-devel/ZH9Kr6mrKNqUgcYs@x1n/
>> [8] https://lore.kernel.org/qemu-devel/[email protected]/
>>
>> Avihai Horon (4):
>>   memory/iommu: Add IOMMU_ATTR_MAX_IOVA attribute
>>   intel-iommu: Implement IOMMU_ATTR_MAX_IOVA get_attr() attribute
>>   vfio/common: Extract vIOMMU code from vfio_sync_dirty_bitmap()
>>   vfio/common: Optimize device dirty page tracking with vIOMMU
>>
>> Joao Martins (7):
>>   memory/iommu: Add IOMMU_ATTR_DMA_TRANSLATION attribute
>>   intel-iommu: Implement get_attr() method
>>   vfio/common: Track whether DMA Translation is enabled on the vIOMMU
>>   vfio/common: Relax vIOMMU detection when DMA translation is off
>>   vfio/common: Move dirty tracking ranges update to helper
>>   vfio/common: Support device dirty page tracking with vIOMMU
>>   vfio/common: Block migration with vIOMMUs without address width limits
>>
>> Yi Liu (4):
>>   hw/pci: Add a pci_setup_iommu_ops() helper
>>   hw/pci: Refactor pci_device_iommu_address_space()
>>   hw/pci: Introduce pci_device_iommu_get_attr()
>>   intel-iommu: Switch to pci_setup_iommu_ops()
>>
>
> Would you mind pointing to the github address?
> I have some conflicts, and the github will be very helpful.

Yep, I have a series -- picking up from Cedric's rebase since the 9.1 soft
freeze -- but testing is still in progress. Give me a couple of days and I'll
respond here, as there are a few more changes on top (now that we have
IOMMUFD support) that I want to get in for v5.
