On Tue, Jul 9, 2013 at 12:27 PM, Alex Williamson
<[email protected]> wrote:
> On Mon, 2013-07-08 at 15:51 -0600, Bjorn Helgaas wrote:
>> On Mon, Jul 08, 2013 at 02:49:16PM -0600, Alex Williamson wrote:
>> > On Mon, 2013-07-08 at 13:34 -0600, Bjorn Helgaas wrote:
>> > > On Mon, Jul 08, 2013 at 11:07:20AM -0600, Alex Williamson wrote:
>> > > > Joerg,
>> > > >
>> > > > Where do we stand on this series? You had a concern that the heuristic
>> > > > used in patch 1/ could be dangerous. The suggestion for detecting the
>> > > > issue was actually from Bjorn who replied with his rationale. Do you
>> > > > want to go in the direction of a fixed whitelist or do you agree that
>> > > > even if the heuristic breaks it provides better behavior than what we
>> > > > have now? Thanks,
>> > >
>> > > I'm trying to take a step back and look at the overall design, not
>> > > these specific patches.
>> > >
>> > > IOMMUs translate addresses based on their source, i.e., a PCIe
>> > > requester ID. This is made more complicated by the fact that some
>> > > bridges "take ownership" (change the requester ID as they forward
>> > > transactions upstream), as well as the fact that conventional PCI has
>> > > no requester ID at all. And some broken devices apparently generate
>> > > DMA requests using the requester ID of another device.
>> > >
>> > > We currently deal with this using the pci_find_upstream_pcie_bridge()
>> > > and pci_get_dma_source() interfaces, but I think there's too much
>> > > assembly required by their users. pci_find_upstream_pcie_bridge()
>> > > callers normally loop through all the bridges between the "upstream
>> > > PCIe bridge" and the device, checking for bridges that might take
>> > > ownership. They probably also ought to use pci_get_dma_source() to
>> > > account for the broken devices, but most callers don't.
>> > >
>> > > Most of this is PCI-specific stuff that should be of interest to all
>> > > IOMMU drivers, and the overall structure of calls and looping should
>> > > be the same for all of them, so it would be nice to factor it out
>> > > somehow.
>> > >
>> > > The attached patch is guaranteed not to even compile; it's just to
>> > > make this idea more concrete. The basic idea is that since the IOMMU
>> > > driver wants to perform some action for each possible requester ID the
>> > > IOMMU might see, PCI could provide an iterator
>> > > ("pci_for_each_requester_id()") to do that.
>> > >
>> > > Bjorn
>> > >
>> > >
>> > > commit afad51492c6672b96c2b0735600d5695e30f7180
>> > > Author: Bjorn Helgaas <[email protected]>
>> > > Date: Wed Jul 3 16:04:26 2013 -0600
>> > >
>> > > pci-add-for-each-requester-id
>> > >
>> > > diff --git a/drivers/pci/search.c b/drivers/pci/search.c
>> > > index d0627fa..380eb03 100644
>> > > --- a/drivers/pci/search.c
>> > > +++ b/drivers/pci/search.c
>> > > @@ -17,6 +17,89 @@
>> > > DECLARE_RWSEM(pci_bus_sem);
>> > > EXPORT_SYMBOL_GPL(pci_bus_sem);
>> > >
>> > > +#define PCI_REQUESTER_ID(dev) (((dev)->bus->number << 8) |
>> > > (dev)->devfn)
>> > > +#define PCI_BRIDGE_REQUESTER_ID(bridge) ((bridge)->subordinate->number
>> > > << 8)
>> > > +
>> > > +static inline bool pci_is_pcix(struct pci_dev *dev)
>> > > +{
>> > > + return !!pci_pcix_cap(dev); /* XXX not implemented */
>> > > +}
>> > > +
>> > > +static bool pci_bridge_may_take_ownership(struct pci_dev *bridge)
>> > > +{
>> > > + /*
>> > > + * A PCIe to PCI/PCI-X bridge may take ownership per PCIe Bridge
>> > > + * Spec v1.0, sec 2.3.
>> > > + */
>> > > + if (pci_is_pcie(bridge) &&
>> > > + pci_pcie_type(bridge) == PCI_EXP_TYPE_PCI_BRIDGE)
>> > > + return true;
>> >
>> > Assume we still need a quirk here, PCIe-to-PCI bridges without a PCIe
>> > capability are exactly what causes the current bug.
>>
>> Likely so. We might be able to identify that as "secondary bus is not
>> PCIe" or something.
>>
>> > > +
>> > > + /*
>> > > + * A PCI-X to PCI-X bridge need not take ownership because there
>> > > + * are requester IDs on the secondary PCI-X bus. However, if a PCI
>> > > + * device is added on the secondary bus, the bridge must revert to
>> > > + * being a PCI-X to PCI bridge, and it then *would* take ownership.
>> > > + * Assuming a PCI-X to PCI-X bridge takes ownership means we can
>> > > + * tolerate a future hot-add without having to change existing
>> > > + * IOMMU mappings.
>> > > + */
>> > > + if (pci_is_pcix(bridge))
>> > > + return true;
>> > > +
>> > > + return false;
>> > > +}
>> > > +
>> > > +static struct pci_dev *pci_bridge_to_dev(struct pci_bus *bus,
>> > > + struct pci_dev *dev)
>> > > +{
>> > > + struct pci_dev *bridge;
>> > > + struct pci_bus *child_bus;
>> > > + u8 secondary, subordinate, busn = dev->bus->number;
>> > > +
>> > > + if (dev->bus == bus)
>> > > + return dev;
>> > > +
>> > > + /*
>> > > + * There may be several devices on "bus". Find the one that is a
>> > > + * bridge leading to "dev".
>> > > + */
>> > > + list_for_each_entry(bridge, &bus->devices, bus_list) {
>> > > + child_bus = bridge->subordinate;
>> > > + if (child_bus) {
>> > > + secondary = child_bus->busn_res.start;
>> > > + subordinate = child_bus->busn_res.end;
>> > > + if (secondary <= busn && busn <= subordinate)
>> > > + return bridge;
>> > > + }
>> > > + }
>> > > + return NULL;
>> > > +}
>> > > +
>> > > +int pci_for_each_requester_id(struct pci_dev *bridge, struct pci_dev
>> > > *dev,
>> > > + int (*fn)(struct pci_dev *, void *),
>> >
>> > u16 requester_id
>> >
>> > > + void *data)
>> > > +{
>> > > + int ret;
>> > > +
>> > > + dev = pci_get_dma_source(dev); /* XXX ref count screwup */
>> >
>> > It's unfortunate that our dma quirk still seems to only get called for
>> > the end device where we could have bridges that need it too.
>>
>> Sure, we can do that if necessary.
>>
>> I'm not trying to present this as any kind of finished solution. The
>> point is to figure out if we can make an interface that's a little
>> more abstract so we can pull the PCI topology issues out of the IOMMU
>> drivers, and I just thought something that looked like code would be
>> more concrete and easier to discuss.
>>
>> > > +
>> > > + while (bridge != dev) {
>> >
>> >
>> > I'm having a hard time understanding this function since bridge is never
>> > defined in any of the below sample code. Would bridge be the root port
>> > above a device? And what about devices connected to the RC? Seems like
>> > we'd need to traverse up the bus first, so there's a bit of missing
>> > magic.
>>
>> I was assuming that an IOMMU could be either at the root of a PCI
>> hierarchy or buried farther down. If that's true, the IOMMU driver
>> might need to supply the top of the PCI hierarchy for which it
>> translates. That's what I'm trying to get at with the "bridge"
>> argument.
>>
>> If the IOMMU is buried, I assumed there was no point in iterating over
>> bridges upstream of the IOMMU because it wouldn't see requester IDs of
>> those upstream bridges. PCI wouldn't have any way to figure out how
>> far up to go, hence the argument.
>
> I don't think that getting the bridge is going to be quite so simple.
> VT-d can have multiple IOMMUs, each handling a specific device or
> hierarchy. One IOMMU per domain can be default IOMMU for any device not
> otherwise specified. The IOMMU is not a PCI enumerable device. AMD-Vi
> does have a PCI visible IOMMU, but it's an endpoint on the root complex,
> not the top of a hierarchy.
If the only possible topology is that the IOMMU sees transactions
coming out of a device on a root bus, then a bridge parameter is
unnecessary, and we can just look at every bridge between the root bus
and the device. I just didn't know whether that was the only possible
topology.
If an IOMMU can be placed such that it's like a PCIe switch that
happens to translate DMA addresses before forwarding upstream, then we
might want a bridge parameter that identifies the "downstream port" of
the IOMMU or the upstream port of a switch below the IOMMU. Otherwise
we would generate requester IDs unnecessarily for bridges above the
IOMMU.
> I'm warming up to your approach, but I think I'd rather see something
> like:
>
> struct pci_dev *pci_get_requester_id(struct pci_dev *pdev)
This returns a "pci_dev *", not a requester ID, so for bridges that
can take ownership, this depends on there being a pci_dev on the
secondary bus with dev/fn == 0. Maybe that's always true; I'm not
sure.
> {
> do {
> /* Apply DMA quirks at every step */
> pdev = pci_get_dma_source(pdev);
>
> /* Done for PCIe devices or broken bridges, PCI-X TBD */
> if (pci_is_pcie(pdev) || /* broken pcie bridge test */)
> break;
>
> } while (!pci_is_root_bus(pdev->bus) && (pdev = pdev->bus->self));
>
> return pdev;
I'm not sure it's sufficient to stop when we find a PCIe bridge. For
example, in this topology:
00:1c.0 PCIe root port
01:00.0 PCIe-to-PCI bridge
02:00.0 PCI-to-PCIe reverse bridge
03:00.0 PCIe switch upstream port
04:00.0 PCIe switch downstream port
05:00.0 PCIe endpoint
the IOMMU will only see requester IDs of 02:00.0, even though if we
search up from 05:00.0 we find a PCIe device at 04:00.0. Maybe this
is too contrived, but reverse bridges are sometimes used on add-in
cards so a new part can be used in older systems, e.g., in this case
the add-in card would contain 02:00.0 and everything below it.
> }
>
>
> /* Start with requester ID pdev and pdev->bus, this way we handle both peer
> * and parent requester IDs */
> int pci_for_each_requester_id(struct pci_dev *pdev, struct pci_bus *bus,
> int (*fn)(struct pci_dev *, void *),
> void *data)
> {
> struct pci_dev *tmp;
> int ret;
>
> list_for_each_entry(tmp, &bus->devices, bus_list) {
> if (pci_get_requester_id(tmp) == pdev) {
> ret = fn(tmp, data);
What does the IOMMU driver supply for "bus"? Presumably it doesn't
supply "pdev->bus" because you're traversing downward
("tmp->subordinate" below).
> if (ret)
> ...
>
> if (tmp->subordinate) {
> ret = pci_for_each_requester_id(pdev,
> tmp->subordinate, fn);
> if (ret)
> ...
> }
> }
> }
>
> return ret;
> }
>
> Seems like that would be sufficient to leverage the intel-iommu changes
> you've drafted. Thanks,
>
> Alex
>
>> > > + if (pci_bridge_may_take_ownership(bridge)) {
>> > > + ret = fn(dev, PCI_BRIDGE_REQUESTER_ID(bridge), data);
>> > > + if (ret)
>> > > + return ret;
>> > > + }
>> > > + bridge = pci_bridge_to_dev(bridge->subordinate, dev);
>> > > + }
>> > > +
>> > > + if (pci_is_pcie(dev) || pci_is_pcix(dev))
>> > > + return fn(dev, PCI_REQUESTER_ID(dev), data);
>> > > +
>> > > + /* Conventional PCI has no requester ID */
>> > > + return 0;
>> > > +}
>> > > +
>> > > /*
>> > > * find the upstream PCIe-to-PCI bridge of a PCI device
>> > > * if the device is PCIE, return NULL
>> > > diff --git a/include/linux/pci.h b/include/linux/pci.h
>> > > index 3a24e4f..cbcc82f 100644
>> > > --- a/include/linux/pci.h
>> > > +++ b/include/linux/pci.h
>> > > @@ -794,6 +794,8 @@ static inline struct pci_dev
>> > > *pci_get_bus_and_slot(unsigned int bus,
>> > > }
>> > > struct pci_dev *pci_get_class(unsigned int class, struct pci_dev *from);
>> > > int pci_dev_present(const struct pci_device_id *ids);
>> > > +void pci_for_each_requester_id(struct pci_dev *bridge, struct pci_dev
>> > > *dev,
>> > > + int (*fn)(struct pci_dev *, void *), void
>> > > *data);
>> > >
>> > > int pci_bus_read_config_byte(struct pci_bus *bus, unsigned int devfn,
>> > > int where, u8 *val);
>> > >
>> > > commit 6e770d7bf4ef95207df4ac4c42ef4406ff7bc54e
>> > > Author: Bjorn Helgaas <[email protected]>
>> > > Date: Wed Jul 3 12:02:15 2013 -0600
>> > >
>> > > intel-iommu-upstream-pcie
>> >
>> > I assume this one would is also going to apply dma quirks to dma_ops,
>> > which I'm all for.
>> >
>> > > diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
>> > > index b91564d..bdb6e6a 100644
>> > > --- a/drivers/iommu/intel-iommu.c
>> > > +++ b/drivers/iommu/intel-iommu.c
>> > > @@ -1669,79 +1669,68 @@ static int domain_context_mapping_one(struct
>> > > dmar_domain *domain, int segment,
>> > > return 0;
>> > > }
>> > >
>> > > +struct domain_context_info {
>> > > + struct dmar_domain *domain;
>> > > + struct pci_dev *dev;
>> > > + int translation;
>> > > + struct intel_iommu *iomumu;
>> >
>> > s/iomumu/iommu/
>> >
>> > > + bool mapped;
>> > > +};
>> > > +
>> > > +static int context_mapping(struct pci_dev *dev, u16 requester_id, void
>> > > *data)
>> > > +{
>> > > + struct domain_context_info *dev_info = data;
>> > > + int segment = pci_domain_nr(data->dev);
>> >
>> > dev_info->dev
>> >
>> > > + u8 bus = data->requester_id >> 8;
>> >
>> > s/data->//
>> >
>> > > + u8 devfn = data->requester_id & 0xff;
>> >
>> > s/data->//
>>
>> I told you it wouldn't even compile :) I'm only trying to present a
>> possibility, not all the nitty-gritty details.
>>
>> > > +
>> > > + return domain_context_mapping_one(data->domain, segment,
>> > > + bus, devfn, data->translation);
>> > > +}
>> > > +
>> > > static int domain_context_mapping(struct dmar_domain *domain,
>> > > struct pci_dev *pdev, int translation)
>> > > {
>> > > - int ret;
>> > > - struct pci_dev *tmp, *parent;
>> > > + struct domain_context_info data;
>> > > + struct pci_dev *bridge;
>> > >
>> > > - ret = domain_context_mapping_one(domain, pci_domain_nr(pdev->bus),
>> > > - pdev->bus->number, pdev->devfn,
>> > > - translation);
>> > > - if (ret)
>> > > - return ret;
>> > > + data.domain = domain;
>> > > + data.dev = pdev;
>> > > + data.translation = translation;
>> > > + bridge = xx;
>> >
>> > Darn, I was hoping you were going to reveal where bridge comes from.
>> >
>> > > + return pci_for_each_requester_id(bridge, pdev, context_mapping, &data);
>> > > +}
>> > >
>> > > - /* dependent device mapping */
>> > > - tmp = pci_find_upstream_pcie_bridge(pdev);
>> > > - if (!tmp)
>> > > - return 0;
>> > > - /* Secondary interface's bus number and devfn 0 */
>> > > - parent = pdev->bus->self;
>> > > - while (parent != tmp) {
>> > > - ret = domain_context_mapping_one(domain,
>> > > - pci_domain_nr(parent->bus),
>> > > - parent->bus->number,
>> > > - parent->devfn, translation);
>> > > - if (ret)
>> > > - return ret;
>> > > - parent = parent->bus->self;
>> > > - }
>> > > - if (pci_is_pcie(tmp)) /* this is a PCIe-to-PCI bridge */
>> > > - return domain_context_mapping_one(domain,
>> > > - pci_domain_nr(tmp->subordinate),
>> > > - tmp->subordinate->number, 0,
>> > > - translation);
>> > > - else /* this is a legacy PCI bridge */
>> > > - return domain_context_mapping_one(domain,
>> > > - pci_domain_nr(tmp->bus),
>> > > - tmp->bus->number,
>> > > - tmp->devfn,
>> > > - translation);
>> > > +static int is_context_mapped(struct pci_dev *dev, u16 requester_id,
>> > > void *data)
>> > > +{
>> > > + struct domain_context_info *dev_info = data;
>> > > + u8 bus = data->requester_id >> 8;
>> > > + u8 devfn = data->requester_id & 0xff;
>> > > + bool ret;
>> > > +
>> > > + ret = device_context_mapped(dev_info->iommu, bus, devfn);
>> > > + if (!ret)
>> > > + dev_info->mapped = false;
>> > > +
>> > > + return 0;
>> > > }
>> > >
>> > > static bool domain_context_mapped(struct pci_dev *pdev)
>> > > {
>> > > - bool ret;
>> > > - struct pci_dev *tmp, *parent;
>> > > + struct domain_context_info data;
>> > > struct intel_iommu *iommu;
>> > > + struct pci_dev *bridge;
>> > >
>> > > iommu = device_to_iommu(pci_domain_nr(pdev->bus), pdev->bus->number,
>> > > pdev->devfn);
>> > > if (!iommu)
>> > > return false;
>> > >
>> > > - ret = device_context_mapped(iommu, pdev->bus->number, pdev->devfn);
>> > > - if (!ret)
>> > > - return false;
>> > > - /* dependent device mapping */
>> > > - tmp = pci_find_upstream_pcie_bridge(pdev);
>> > > - if (!tmp)
>> > > - return true;
>> > > - /* Secondary interface's bus number and devfn 0 */
>> > > - parent = pdev->bus->self;
>> > > - while (parent != tmp) {
>> > > - ret = device_context_mapped(iommu, parent->bus->number,
>> > > - parent->devfn);
>> > > - if (!ret)
>> > > - return false;
>> > > - parent = parent->bus->self;
>> > > - }
>> > > - if (pci_is_pcie(tmp))
>> > > - return device_context_mapped(iommu, tmp->subordinate->number,
>> > > - 0);
>> > > - else
>> > > - return device_context_mapped(iommu, tmp->bus->number,
>> > > - tmp->devfn);
>> > > + data.iommu = iommu;
>> > > + data.mapped = true;
>> > > + bridge = xx;
>> > > + pci_for_each_requester_id(bridge, pdev, is_context_mapped, &data);
>> > > + return data.mapped;
>> > > }
>> > >
>> > > /* Returns a number of VTD pages, but aligned to MM page size */
>> > > @@ -1961,6 +1950,27 @@ static struct dmar_domain *find_domain(struct
>> > > pci_dev *pdev)
>> > > return NULL;
>> > > }
>> > >
>> > > +static int find_domain2(struct pci_dev *dev, u16 requester_id, void
>> > > *data)
>> > > +{
>> > > + struct domain_context_info *dev_info = data;
>> > > + int segment = pci_domain_nr(data->dev);
>> > > + u8 bus = data->requester_id >> 8;
>> > > + u8 devfn = data->requester_id & 0xff;
>> > > + unsigned long flags;
>> > > + struct device_domain_info *info;
>> > > +
>> > > + spin_lock_irqsave(&device_domain_lock, flags);
>> > > + list_for_each_entry(info, &device_domain_list, global) {
>> > > + if (info->segment == segment &&
>> > > + info->bus == bus && info->devfn == devfn) {
>> > > + data->domain = info->domain;
>> > > + break;
>> > > + }
>> > > + }
>> > > + spin_unlock_irqrestore(&device_domain_lock, flags);
>> > > + return 0;
>> > > +}
>> > > +
>> > > /* domain is initialized */
>> > > static struct dmar_domain *get_domain_for_dev(struct pci_dev *pdev, int
>> > > gaw)
>> > > {
>> > > @@ -1968,11 +1978,11 @@ static struct dmar_domain
>> > > *get_domain_for_dev(struct pci_dev *pdev, int gaw)
>> > > struct intel_iommu *iommu;
>> > > struct dmar_drhd_unit *drhd;
>> > > struct device_domain_info *info, *tmp;
>> > > - struct pci_dev *dev_tmp;
>> > > unsigned long flags;
>> > > int bus = 0, devfn = 0;
>> > > int segment;
>> > > int ret;
>> > > + struct domain_context_info data;
>> > >
>> > > domain = find_domain(pdev);
>> > > if (domain)
>> > > @@ -1980,29 +1990,13 @@ static struct dmar_domain
>> > > *get_domain_for_dev(struct pci_dev *pdev, int gaw)
>> > >
>> > > segment = pci_domain_nr(pdev->bus);
>> > >
>> > > - dev_tmp = pci_find_upstream_pcie_bridge(pdev);
>> > > - if (dev_tmp) {
>> > > - if (pci_is_pcie(dev_tmp)) {
>> > > - bus = dev_tmp->subordinate->number;
>> > > - devfn = 0;
>> > > - } else {
>> > > - bus = dev_tmp->bus->number;
>> > > - devfn = dev_tmp->devfn;
>> > > - }
>> > > - spin_lock_irqsave(&device_domain_lock, flags);
>> > > - list_for_each_entry(info, &device_domain_list, global) {
>> > > - if (info->segment == segment &&
>> > > - info->bus == bus && info->devfn == devfn) {
>> > > - found = info->domain;
>> > > - break;
>> > > - }
>> > > - }
>> > > - spin_unlock_irqrestore(&device_domain_lock, flags);
>> > > - /* pcie-pci bridge already has a domain, uses it */
>> > > - if (found) {
>> > > - domain = found;
>> > > - goto found_domain;
>> > > - }
>> > > + data.domain = NULL;
>> > > + data.dev = pdev;
>> > > + bridge = xx;
>> > > + pci_for_each_requester_id(bridge, pdev, find_domain2, &data);
>> > > + if (data.domain) {
>> > > + domain = data.domain;
>> > > + goto found_domain;
>> > > }
>> > >
>> > > domain = alloc_domain();
>> > > @@ -3734,31 +3728,29 @@ int __init intel_iommu_init(void)
>> > > return 0;
>> > > }
>> > >
>> > > +static int detach_requester(struct pci_dev *dev, u16 requester_id, void
>> > > *data)
>> > > +{
>> > > + struct domain_context_info *dev_info = data;
>> > > + struct intel_iommu *iommu = data->iommu;
>> > > + u8 bus = data->requester_id >> 8;
>> > > + u8 devfn = data->requester_id & 0xff;
>> > > +
>> > > + iommu_detach_dev(iommu, bus, devfn);
>> > > + return 0;
>> > > +}
>> > > +
>> > > static void iommu_detach_dependent_devices(struct intel_iommu *iommu,
>> > > struct pci_dev *pdev)
>> > > {
>> > > - struct pci_dev *tmp, *parent;
>> > > + struct domain_context_info data;
>> > >
>> > > if (!iommu || !pdev)
>> > > return;
>> > >
>> > > /* dependent device detach */
>> > > - tmp = pci_find_upstream_pcie_bridge(pdev);
>> > > - /* Secondary interface's bus number and devfn 0 */
>> > > - if (tmp) {
>> > > - parent = pdev->bus->self;
>> > > - while (parent != tmp) {
>> > > - iommu_detach_dev(iommu, parent->bus->number,
>> > > - parent->devfn);
>> > > - parent = parent->bus->self;
>> > > - }
>> > > - if (pci_is_pcie(tmp)) /* this is a PCIe-to-PCI bridge */
>> > > - iommu_detach_dev(iommu,
>> > > - tmp->subordinate->number, 0);
>> > > - else /* this is a legacy PCI bridge */
>> > > - iommu_detach_dev(iommu, tmp->bus->number,
>> > > - tmp->devfn);
>> > > - }
>> > > + data.iommu = iommu;
>> > > + bridge = xx;
>> > > + pci_for_each_requester_id(bridge, pdev, detach_requester, &data);
>> > > }
>> > >
>> > > static void domain_remove_one_dev_info(struct dmar_domain *domain,
>> > > --
>> > > To unsubscribe from this list: send the line "unsubscribe linux-pci" in
>> > > the body of a message to [email protected]
>> > > More majordomo info at http://vger.kernel.org/majordomo-info.html
>> >
>> >
>> >
>
>
>
_______________________________________________
iommu mailing list
[email protected]
https://lists.linuxfoundation.org/mailman/listinfo/iommu