Hi Shameer,

>-----Original Message-----
>From: Shameer Kolothum <shameerali.kolothum.th...@huawei.com>
>Subject: [RFC PATCH v3 00/15] hw/arm/virt: Add support for user-creatable
>accelerated SMMUv3
>
>Hi All,
>
>This patch series introduces initial support for a user-creatable,
>accelerated SMMUv3 device (-device arm-smmuv3,accel=on) in QEMU.
>
>This is based on the user-creatable SMMUv3 device series [0].
>
>Why this is needed:
>
>On ARM, to enable vfio-pci pass-through devices in a VM, the host SMMUv3
>must be set up in nested translation mode (Stage 1 + Stage 2), with
>Stage 1 (S1) controlled by the guest and Stage 2 (S2) managed by the host.
>
>This series introduces an optional accel property for the SMMUv3 device,
>indicating that the guest will try to leverage host SMMUv3 features for
>acceleration. By default, enabling accel configures the host SMMUv3 in
>nested mode to support vfio-pci pass-through.
>
>This new accelerated, user-creatable SMMUv3 device lets you:
>
> -Set up a VM with multiple SMMUv3s, each tied to a different physical
>  SMMUv3 on the host. Typically, you’d have multiple PCIe PXB root
>  complexes in the VM (one per virtual NUMA node), and each of them can
>  have its own SMMUv3. This setup mirrors the host's layout, where each
>  NUMA node has its own SMMUv3, and helps build VMs that are more
>  aligned with the host's NUMA topology.
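Just to check my understanding of the intended layout: a minimal two-node
mirror would presumably look something like the fragment below. This is my
own sketch, modelled on the full command line further down in this cover
letter; the device ids are made up, and I'm assuming the numa_node property
of pxb-pcie is what ties each expander bridge to a virtual NUMA node:

  -numa node,nodeid=0 -numa node,nodeid=1 \
  -device pxb-pcie,id=pcie.1,bus_nr=2,bus=pcie.0,numa_node=0 \
  -device arm-smmuv3,primary-bus=pcie.1,id=smmuv3.1,accel=on \
  -device pcie-root-port,id=port1,chassis=2,bus=pcie.1 \
  -device vfio-pci,host=0000:75:00.1,bus=port1,iommufd=iommufd0 \
  -device pxb-pcie,id=pcie.2,bus_nr=32,bus=pcie.0,numa_node=1 \
  -device arm-smmuv3,primary-bus=pcie.2,id=smmuv3.2,accel=on \
  -device pcie-root-port,id=port2,chassis=3,bus=pcie.2 \
  -device vfio-pci,host=0000:7d:02.1,bus=port2,iommufd=iommufd0 \

Here the two host devices would be picked so that each one sits behind the
host SMMUv3 that its guest SMMUv3 is meant to be associated with. Which
brings me to my questions: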
Is it mandatory to mirror the host layout? Does this mirroring include
smmuv3.0, which is linked to pcie.0? Do we have to create the same number
of SMMUv3s in the guest as there are SMMUv3s on the host? And what happens
if we don't mirror correctly, e.g., a vfio device is linked to smmuv3.0 in
the guest while on the host it is linked to smmuv3.1?

> -The host–guest SMMUv3 association results in reduced invalidation
>  broadcasts and lookups for devices behind different physical SMMUv3s.
>
> -Simplifies handling of host SMMUv3s with differing feature sets.
>
> -Lays the groundwork for additional capabilities like vCMDQ support.
>
>Changes from RFCv2[1] and key points in RFCv3:
>
> -Unlike RFCv2, there is no arm-smmuv3-accel device now. The accelerated
>  mode is enabled using -device arm-smmuv3,accel=on.
>
> -When accel=on is specified, the SMMUv3 will allow only vfio-pci endpoint
>  devices and any non-endpoint devices like PCI bridges and root ports
>  used to plug in the vfio-pci. See patch #6.
>
> -I have tried to keep this RFC simple and basic so we can focus on the
>  structure of this new accelerated support. That means there is no
>  support for ATS, PASID, or PRI. Only vfio-pci devices that don’t
>  require these features will work.
>
> -Some clarity is still needed on the final approach to handle MSI
>  translation. Hence, RMR support (which is required for this) is not
>  included yet, but it is available in the git branch provided below for
>  testing.
>
> -At least one vfio-pci device must currently be cold-plugged to a PCIe
>  root complex associated with arm-smmuv3,accel=on. This is required to:
>  1. associate a guest SMMUv3 with a host SMMUv3
>  2. retrieve the host SMMUv3 feature registers for guest export
>  This still needs discussion, as there were concerns previously about
>  this approach and it also breaks hotplug/unplug scenarios. See patch #14.
>
> -This version does not yet support host SMMUv3 fault handling or other
>  event notifications. These will be addressed in a future patch series.
>
>Branch for testing:
>
>This is based on v8 of the SMMUv3 device series and has a dependency on
>the Intel series here [3].
>
>https://github.com/hisilicon/qemu/tree/smmuv3-dev-v8-accel-rfcv3
>
>Tested on a HiSilicon platform with multiple SMMUv3s.
>
>./qemu-system-aarch64 \
> -machine virt,accel=kvm,gic-version=3 \
> -object iommufd,id=iommufd0 \
> -bios QEMU_EFI \
> -cpu host -smp cpus=4 -m size=16G,slots=4,maxmem=256G -nographic \
> -device virtio-blk-device,drive=fs \
> -drive if=none,file=ubuntu.img,id=fs \
> -kernel Image \
> -device arm-smmuv3,primary-bus=pcie.0,id=smmuv3.0,accel=on \

Here accel=on, so are only vfio devices allowed on pcie.0?
> -device vfio-pci,host=0000:75:00.1,bus=pcie.0,iommufd=iommufd0 \
> -device pxb-pcie,id=pcie.1,bus_nr=2,bus=pcie.0 \
> -device arm-smmuv3,primary-bus=pcie.1,id=smmuv3.1,accel=on \
> -device pcie-root-port,id=pcie1.port1,chassis=2,bus=pcie.1,pref64-reserve=2M,io-reserve=1K \
> -device vfio-pci,host=0000:7d:02.1,bus=pcie1.port1,iommufd=iommufd0,id=net1 \
> -append "rdinit=init console=ttyAMA0 root=/dev/vda rw earlycon=pl011,0x9000000" \
> -device pxb-pcie,id=pcie.2,bus_nr=32,bus=pcie.0 \
> -device arm-smmuv3,primary-bus=pcie.2,id=smmuv3.2 \
> -device pcie-root-port,id=pcie2.port1,chassis=8,bus=pcie.2 \
> -device virtio-9p-pci,fsdev=p9fs,mount_tag=p9,bus=pcie2.port1 \
> -fsdev local,id=p9fs,path=p9root,security_model=mapped \
> -net none \
> -nographic
>
>Guest output:
>
>root@ubuntu:/# dmesg |grep smmu
> arm-smmu-v3 arm-smmu-v3.0.auto: option mask 0x0
> arm-smmu-v3 arm-smmu-v3.0.auto: ias 44-bit, oas 44-bit (features 0x00008305)
> arm-smmu-v3 arm-smmu-v3.0.auto: allocated 65536 entries for cmdq
> arm-smmu-v3 arm-smmu-v3.0.auto: allocated 32768 entries for evtq
> arm-smmu-v3 arm-smmu-v3.1.auto: option mask 0x0
> arm-smmu-v3 arm-smmu-v3.1.auto: ias 44-bit, oas 44-bit (features 0x00008305)
> arm-smmu-v3 arm-smmu-v3.1.auto: allocated 65536 entries for cmdq
> arm-smmu-v3 arm-smmu-v3.1.auto: allocated 32768 entries for evtq
> arm-smmu-v3 arm-smmu-v3.2.auto: option mask 0x0
> arm-smmu-v3 arm-smmu-v3.2.auto: ias 44-bit, oas 44-bit (features 0x00008305)
> arm-smmu-v3 arm-smmu-v3.2.auto: allocated 65536 entries for cmdq
> arm-smmu-v3 arm-smmu-v3.2.auto: allocated 32768 entries for evtq
>root@ubuntu:/#
>
>root@ubuntu:/# lspci -tv
>-+-[0000:20]---00.0-[21]----00.0  Red Hat, Inc Virtio filesystem
> +-[0000:02]---00.0-[03]----00.0  Huawei Technologies Co., Ltd. Device a22e
> \-[0000:00]-+-00.0  Red Hat, Inc. QEMU PCIe Host bridge
>             +-01.0  Huawei Technologies Co., Ltd. Device a251
>             +-02.0  Red Hat, Inc. QEMU PCIe Expander bridge
>             \-03.0  Red Hat, Inc. QEMU PCIe Expander bridge

Are these all the devices in this guest config? Won't QEMU create some
default devices implicitly even if we don't ask for them on the command
line? (One way to check is sketched in the P.S. below.)

Thanks
Zhenzhong
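P.S. Regarding that last question: one way I know of to see what QEMU
actually instantiates, including any implicit default devices, is to attach
an HMP monitor to the same command line and dump the device state, e.g.:

  ./qemu-system-aarch64 ... -monitor stdio
  (qemu) info pci
  (qemu) info qtree

info pci lists every PCI device the guest sees, and info qtree walks the
full qdev tree, so anything QEMU created on its own should show up there.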