On Wednesday, 12 Feb 2014 at 12:34:25 (-0700), Alex Williamson wrote:
> On Wed, 2014-02-12 at 19:10 +0100, Benoît Canet wrote:
> > Hi Alex,
> >
> > After the IRC conversation we had a few days ago I understood that a
> > guest IOMMU was not implemented.
> >
> > I have a real use case for it:
> >
> > Cisco usNIC allows writing MPI applications that drive the network
> > card from userspace in order to optimize latency. It's made for
> > compute clusters.
> >
> > The typical cloud provider doesn't offer bare-metal access, only VMs
> > on top of Cisco's hardware, so VFIO uses the host IOMMU to pass the
> > NIC through to the guest and no IOMMU is present in the guest.
> >
> > Questions: Would writing a performant guest IOMMU implementation be
> > possible? How complex does this project look to someone who knows
> > IOMMU issues?
> >
> > The ideal implementation would forward the IOMMU work to the host
> > hardware for speed.
> >
> > I can devote time to writing the feature if it's doable.
>
> Hi Benoît,
>
> I imagine it's doable, but it's certainly not trivial; beyond that I
> haven't put much thought into it.
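For context, what happens on the host today is roughly the sequence below:
all of guest RAM is mapped into the host IOMMU up front, which is why the
guest itself never sees an IOMMU. This is only a sketch of the VFIO calls,
not QEMU's actual code; the group number "26" and the device address
"0000:81:00.1" are placeholders and error handling is omitted.

    #include <fcntl.h>
    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <linux/vfio.h>

    int assign_device(void *guest_ram, uint64_t ram_size)
    {
        int container = open("/dev/vfio/vfio", O_RDWR);
        int group = open("/dev/vfio/26", O_RDWR);

        /* Attach the whole IOMMU group to a container and select the
         * type1 IOMMU backend. */
        ioctl(group, VFIO_GROUP_SET_CONTAINER, &container);
        ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU);

        /* Map guest physical memory up front so the NIC can DMA
         * anywhere in guest RAM. */
        struct vfio_iommu_type1_dma_map map = {
            .argsz = sizeof(map),
            .flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
            .vaddr = (uintptr_t)guest_ram,
            .iova  = 0,
            .size  = ram_size,
        };
        ioctl(container, VFIO_IOMMU_MAP_DMA, &map);

        /* Finally get a file descriptor for the device itself. */
        return ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "0000:81:00.1");
    }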
Thanks for the answer. I am afraid when an expert in the field says
"not trivial" :)

Best regards

Benoît

> VFIO running in a guest would need an IOMMU that implements both the
> IOMMU API and IOMMU groups. Whether that comes from an emulated
> physical IOMMU (like VT-d) or from a new paravirt IOMMU would be for
> you to decide. VT-d would imply using a PCIe chipset like Q35 and
> trying to bandage VT-d onto it, or updating Q35 to something that
> natively supports VT-d. Getting a sufficiently similar PCIe hierarchy
> between host and guest would also be required.
>
> The current model of putting all guest devices in a single IOMMU
> domain on the host is likely not what you would want and might imply
> a new VFIO IOMMU backend that is better tuned for separate domains,
> sparse mappings, and low latency. VFIO has a modular IOMMU design, so
> this isn't architecturally a problem. The VFIO user (QEMU) is able to
> select which backend to use, and the code is written with support for
> multiple backends in mind.
>
> A complication you'll have is that the granularity of IOMMU
> operations through VFIO is the IOMMU group, so the guest would not
> easily be able to split devices grouped together on the host between
> separate users in the guest. That could be modeled as a conventional
> PCI bridge masking the requester IDs of devices in the guest, so that
> host groups are mirrored as guest groups.
>
> There might also be simpler "punch-through" ways to do it. For
> instance, what if, instead of trying to make it work like it does on
> the host, we invented a paravirt VFIO interface and a vfio-pv driver
> in the guest populated /dev/vfio as slightly modified passthroughs to
> the host fds? The guest OS may not even really need to be aware of
> the device.
>
> It's an interesting project and certainly a valid use case. I'd also
> like to see things like Intel's DPDK move to using VFIO, but the
> current UIO-based DPDK is often used in guests. Thanks,
>
> Alex
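To make the discussion above concrete: whichever form the guest IOMMU takes
(emulated VT-d or paravirt), at runtime it comes down to forwarding each
map/unmap the guest programs into its IOMMU to the host through the VFIO
container fd. The sketch below uses the existing type1 ioctls as a stand-in
for whatever new backend would be written; the latency of these two calls
is what such a backend would need to keep low, since they sit on the guest
driver's map/unmap path. The helper names are made up and error handling is
omitted.

    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <linux/vfio.h>

    /* Forward one guest IOMMU map request to the host IOMMU. */
    int guest_iommu_map(int container, uint64_t iova, void *vaddr,
                        uint64_t size)
    {
        struct vfio_iommu_type1_dma_map map = {
            .argsz = sizeof(map),
            .flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
            .vaddr = (uintptr_t)vaddr,
            .iova  = iova,
            .size  = size,
        };
        return ioctl(container, VFIO_IOMMU_MAP_DMA, &map);
    }

    /* Tear the mapping down again when the guest unmaps it. */
    int guest_iommu_unmap(int container, uint64_t iova, uint64_t size)
    {
        struct vfio_iommu_type1_dma_unmap unmap = {
            .argsz = sizeof(unmap),
            .iova  = iova,
            .size  = size,
        };
        return ioctl(container, VFIO_IOMMU_UNMAP_DMA, &unmap);
    }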