On 04/30/2013 01:28 PM, Konrad Rzeszutek Wilk wrote:
On Sat, Apr 27, 2013 at 12:22:28PM +0800, Andrew Cooks wrote:
On Fri, Apr 26, 2013 at 6:23 AM, Don Dutile<[email protected]> wrote:
On 04/24/2013 10:49 PM, Sethi Varun-B16395 wrote:
-----Original Message-----
From: [email protected] [mailto:iommu-
[email protected]] On Behalf Of Don Dutile
Sent: Thursday, April 25, 2013 1:11 AM
To: Alex Williamson
Cc: Yoder Stuart-B08248; [email protected]
Subject: Re: RFC: vfio / iommu driver for hardware with no iommu
On 04/23/2013 03:47 PM, Alex Williamson wrote:
On Tue, 2013-04-23 at 19:16 +0000, Yoder Stuart-B08248 wrote:
-----Original Message-----
From: Alex Williamson [mailto:[email protected]]
Sent: Tuesday, April 23, 2013 11:56 AM
To: Yoder Stuart-B08248
Cc: Joerg Roedel; [email protected]
Subject: Re: RFC: vfio / iommu driver for hardware with no iommu
On Tue, 2013-04-23 at 16:13 +0000, Yoder Stuart-B08248 wrote:
Joerg/Alex,
We have embedded systems where we use QEMU/KVM and have the
requirement to do device assignment, but have no iommu. So we
would like to get vfio-pci working on systems like this.
We're aware of the obvious limitations-- no protection, DMA'able
memory must be physically contiguous and will have no iova->phy
translation. But there are use cases where all OSes involved are
trusted and customers can live with those limitations. Virtualization
is used here not to sandbox untrusted code, but to consolidate
multiple OSes.
We would like to get your feedback on the rough idea. There are
two parts-- iommu driver and vfio-pci.
1. iommu driver
First, we still need device groups created because vfio is based on
that, so we envision a 'dummy' iommu driver that implements only
the add/remove device ops. Something like:
static struct iommu_ops fsl_none_ops = {
	.add_device	= fsl_none_add_device,
	.remove_device	= fsl_none_remove_device,
};

int fsl_iommu_none_init(void)
{
	int ret;

	ret = iommu_init_mempool();
	if (ret)
		return ret;

	/* Register the dummy ops on both buses so device groups get created. */
	bus_set_iommu(&platform_bus_type, &fsl_none_ops);
	bus_set_iommu(&pci_bus_type, &fsl_none_ops);

	return 0;
}
2. vfio-pci
For vfio-pci, we would ideally like to keep user space mostly
unchanged. User space will have to follow the semantics of mapping
only physically contiguous chunks...and iova will equal phys.
So, we propose to implement a new vfio iommu type, called
VFIO_TYPE_NONE_IOMMU. This implements any needed vfio interfaces,
but there are no calls to the iommu layer...e.g. map_dma() is a
noop.
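For illustration only (not part of the proposal): a minimal userspace model of the constraint described above, i.e. the check that a no-op map_dma() would implicitly rely on. The function name noiommu_map_ok(), the pfns[] array and the 4 KiB page size are all assumptions made for this sketch; they are not vfio or iommu API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/*
 * Hypothetical model, not kernel code: pfns[i] is the physical frame
 * number backing page i of the buffer user space wants to map.
 * Returns 1 if the mapping is usable without an iommu, 0 otherwise.
 */
int noiommu_map_ok(uint64_t iova, const uint64_t *pfns, size_t npages)
{
	size_t i;

	/* Reject empty requests and iovas that are not page aligned. */
	if (npages == 0 || (iova & ((1ULL << MODEL_PAGE_SHIFT) - 1)))
		return 0;

	/* No translation: the device address must equal the physical one. */
	if (pfns[0] != (iova >> MODEL_PAGE_SHIFT))
		return 0;

	/* No remapping: every page must directly follow the previous one. */
	for (i = 1; i < npages; i++)
		if (pfns[i] != pfns[0] + i)
			return 0;

	return 1;
}
```

With frames 0x100..0x102 and iova 0x100000 the check passes; any gap in the frames, or an iova that differs from the physical address, fails it.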
Would like your feedback.
My first thought is that this really detracts from vfio and iommu
groups being a secure interface, so somehow this needs to be clearly
an insecure mode that requires an opt-in and maybe taints the
kernel. Any notion of unprivileged use needs to be blocked and it
should test CAP_COMPROMISE_KERNEL (or whatever it's called now) at
critical access points. We might even have interfaces exported that
would allow this to be an out-of-tree driver (worth a check).
I would guess that you would probably want to do all the iommu group
setup from the vfio fake-iommu driver. In other words, that driver
both creates the fake groups and provides the dummy iommu backend for
vfio.
That would be a nice way to compartmentalize this as a
vfio-noiommu-special.
So you mean don't implement any of the iommu driver ops at all and
keep everything in the vfio layer?
Would you still have real iommu groups?...i.e.
$ readlink /sys/bus/pci/devices/0000:06:0d.0/iommu_group
../../../../kernel/iommu_groups/26
...and that is created by vfio-noiommu-special?
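As a rough sketch of what user space sees here: the group number is just the last component of the iommu_group link target, so a fake group would have to produce a link of the same shape. The helper names below are made up for illustration; only readlink(2) and the sysfs path shown above are taken as given.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L	/* for readlink() under strict C modes */
#endif
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define LINK_BUF_SIZE 4096	/* assumption: large enough for sysfs paths */

/*
 * Extract the trailing group number from a link target such as
 * "../../../../kernel/iommu_groups/26". Returns -1 on malformed input.
 */
int iommu_group_from_target(const char *target)
{
	const char *slash = strrchr(target, '/');
	const char *num = slash ? slash + 1 : target;
	char *end;
	long id;

	if (*num < '0' || *num > '9')
		return -1;
	id = strtol(num, &end, 10);
	if (*end != '\0' || id > INT_MAX)
		return -1;
	return (int)id;
}

/*
 * Look up the group of a PCI device by its sysfs name, e.g.
 * "0000:06:0d.0". Returns -1 if the device has no iommu_group link.
 */
int iommu_group_of(const char *bdf)
{
	char path[LINK_BUF_SIZE], target[LINK_BUF_SIZE];
	ssize_t len;

	snprintf(path, sizeof(path),
		 "/sys/bus/pci/devices/%s/iommu_group", bdf);
	len = readlink(path, target, sizeof(target) - 1);
	if (len < 0)
		return -1;
	target[len] = '\0';	/* readlink() does not terminate the string */
	return iommu_group_from_target(target);
}
```

On a machine with no iommu driver registered the link does not exist, so iommu_group_of() returns -1; that is the case the fake groups would have to cover.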
I'm suggesting (but haven't checked if it's possible), to implement
the iommu driver ops as part of the vfio iommu backend driver. The
primary motivation for this would be to a) keep a fake iommu groups
interface out of the iommu proper (possibly containing it in an
external driver) and b) modularizing it so we don't have fake iommu
groups being created by default. It would have to populate the iommu
groups sysfs interfaces to be compatible with vfio.
Right now when the PCI and platform buses are probed, the iommu
driver add-device callback gets called and that is where the
per-device group gets created. Are you envisioning registering a
callback for the PCI bus to do this in vfio-noiommu-special?
Yes. It's just as easy to walk all the devices rather than doing
callbacks, iirc the group code does this when you register. In fact,
this noiommu interface may not want to add all devices, we may want to
be very selective and only add some.
Right.
Sounds like a no-iommu driver is needed to leave vfio unaffected, and
still leverage/use vfio for qemu's device assignment.
Just not sure how to 'taint' it as 'not secure' if a no-iommu driver is
put in place.
btw -- qemu has the inherent assumption that pci cfg cycles are trapped,
so assigned devices are 'remapped' from the system B:D.F to the
virt-machine's (virtualized) B:D.F of the assigned device.
Are pci-cfg cycles trapped in the freescale qemu model?
The vfio-pci device would be visible (to a KVM guest) as a PCI device on
the virtual PCI bus (emulated by qemu).
-Varun
Understood, but as Alex stated, the whole purpose of VFIO is to
be able to do _secure_, user-level-driven I/O. Since this would
be 'insecure', there should be a way to note that during configuration.
Does vfio work with swiotlb and if not, can/should swiotlb be
extended? Or does the time and space overhead make it a moot point?
VFIO does not work with SWIOTLB: SWIOTLB sits behind the DMA API, while VFIO uses the IOMMU API.
It could be extended to use it. I was toying with this b/c for Xen to
use VFIO I would have to implement a Xen IOMMU driver that would basically
piggyback on the SWIOTLB (as Xen itself does the IOMMU parts and takes
care of all the hard work of securing each guest).
But your requirement would be the same, so it might as well be a generic
driver called SWIOTLB-IOMMU driver.
arch/x86/kernel/pci-nommu.c as a starting point?
If you are up for writing it, I am up for reviewing/Ack-ing/etc.
The complexity would be to figure out the VFIO group thing and how to assign
PCI B:D.F devices to the SWIOTLB-IOMMU driver. Perhaps the same way as
xen-pciback does (or pcistub). That is by writing the BDF to the "bind"
attribute in sysfs (or via a kernel parameter).
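To make the "bind" idea a bit more concrete, here is a minimal sketch of parsing the string that would be written to such an attribute. It is plain userspace C and not taken from xen-pciback; struct pci_bdf and parse_bdf() are hypothetical names, and the accepted formats ("DDDD:BB:DD.F" or "BB:DD.F") are an assumption.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical container for a parsed PCI address. */
struct pci_bdf {
	unsigned int domain, bus, dev, fn;
};

/*
 * Parse "DDDD:BB:DD.F" (e.g. "0000:06:0d.0") or "BB:DD.F" (domain 0).
 * A single trailing newline is tolerated, since "echo" appends one.
 * Returns 0 on success, -1 on bad input; *out is undefined on failure.
 */
int parse_bdf(const char *s, struct pci_bdf *out)
{
	int consumed = 0;

	if (sscanf(s, "%x:%x:%x.%x%n", &out->domain, &out->bus,
		   &out->dev, &out->fn, &consumed) != 4) {
		/* Retry with the short form, which has no domain part. */
		out->domain = 0;
		consumed = 0;
		if (sscanf(s, "%x:%x.%x%n", &out->bus, &out->dev,
			   &out->fn, &consumed) != 3)
			return -1;
	}

	/* Nothing but an optional newline may follow the address. */
	if (s[consumed] == '\n')
		consumed++;
	if (s[consumed] != '\0')
		return -1;

	/* Range limits from the PCI address layout: 8/5/3 bits. */
	if (out->domain > 0xffff || out->bus > 0xff ||
	    out->dev > 0x1f || out->fn > 7)
		return -1;

	return 0;
}
```

A kernel-side version would do the same parse in the attribute's store callback and then decide whether to hand that device to the driver.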