On Tue, 6 Sep 2016 21:14:11 +0300
Marcel Apfelbaum <mar...@redhat.com> wrote:

> On 09/06/2016 06:38 PM, Alex Williamson wrote:
> > On Thu,  1 Sep 2016 16:22:07 +0300
> > Marcel Apfelbaum <mar...@redhat.com> wrote:
> >  
> >> Proposes best practices on how to use PCIe/PCI device
> >> in PCIe based machines and explain the reasoning behind them.
> >>
> >> Signed-off-by: Marcel Apfelbaum <mar...@redhat.com>
> >> ---
> >>
> >> Hi,
> >>
> >> Please add your comments on what to add/remove/edit to make this doc usable.
> >>
> >> Thanks,
> >> Marcel
> >>
> >>  docs/pcie.txt | 145 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>  1 file changed, 145 insertions(+)
> >>  create mode 100644 docs/pcie.txt
> >>
> >> diff --git a/docs/pcie.txt b/docs/pcie.txt
> >> new file mode 100644
> >> index 0000000..52a8830
> >> --- /dev/null
> >> +++ b/docs/pcie.txt
> >> @@ -0,0 +1,145 @@
> >> +PCI EXPRESS GUIDELINES
> >> +======================
> >> +
> >> +1. Introduction
> >> +================
> >> +The doc proposes best practices on how to use PCIe/PCI device
> >> +in PCIe based machines and explains the reasoning behind them.
> >> +
> >> +
> >> +2. Device placement strategy
> >> +============================
> >> +QEMU does not have a clear socket-device matching mechanism
> >> +and allows any PCI/PCIe device to be plugged into any PCI/PCIe slot.
> >> +Plugging a PCI device into a PCIe device might not always work and
> >> +is weird anyway since it cannot be done for "bare metal".
> >> +Plugging a PCIe device into a PCI slot will hide the Extended
> >> +Configuration Space thus is also not recommended.
> >> +
> >> +The recommendation is to separate the PCIe and PCI hierarchies.
> >> +PCIe devices should be plugged only into PCIe Root Ports and
> >> +PCIe Downstream ports (let's call them PCIe ports).
> >> +
> >> +2.1 Root Bus (pcie.0)
> >> +=====================
> >> +Plug only legacy PCI devices as Root Complex Integrated Devices
> >> +even if the PCIe spec does not forbid PCIe devices. The existing  
> >  
> 
> Hi Alex,
> Thanks for the review.
> 
> 
> > Surely we can have PCIe device on the root complex??
> >  
> 
> Yes, we can; it is not forbidden. Even so, my understanding is that
> the main use for Integrated Devices is for legacy devices
> like sound cards or NICs that come with the motherboard.
> Because of that, my concern is that we might be missing some support
> for them in QEMU or even in the Linux kernel.
> 
> One example I got from Jason about an issue with Integrated Endpoints in the kernel:
> 
> commit d14053b3c714178525f22660e6aaf41263d00056
> Author: David Woodhouse <david.woodho...@intel.com>
> Date:   Thu Oct 15 09:28:06 2015 +0100
> 
>      iommu/vt-d: Fix ATSR handling for Root-Complex integrated endpoints
> 
>      The VT-d specification says that "Software must enable ATS on endpoint
>      devices behind a Root Port only if the Root Port is reported as
>      supporting ATS transactions."
>    ....
> 
> We can say it is a bug and it is solved, so what's the problem?
> But my point is, why do it in the first place?
> We are the hardware "vendors" and we can decide not to add PCIe
> devices as Integrated Devices.
> 
> 
> 
> >> +hardware uses mostly PCI devices as Integrated Endpoints. In this
> >> +way we may avoid some strange Guest OS-es behaviour.
> >> +Other than that plug only PCIe Root Ports, PCIe Switches (upstream ports)
> >> +or DMI-PCI bridges to start legacy PCI hierarchies.
> >> +
> >> +
> >> +   pcie.0 bus
> >> +   --------------------------------------------------------------------------
> >> +        |                |                    |                   |
> >> +   -----------   ------------------   ------------------  ------------------
> >> +   | PCI Dev |   | PCIe Root Port |   |  Upstream Port |  | DMI-PCI bridge |
> >> +   -----------   ------------------   ------------------  ------------------
> >
> > Do you have a spec reference for plugging an upstream port directly
> > into the root complex?  IMHO this is invalid, an upstream port can only
> > be attached behind a downstream port, ie. a root port or downstream
> > switch port.
> >  
> 
> Yes, it is a bug; both Laszlo and I spotted it, and the 2.2 figure shows it right.
> Thanks for finding it.
> 
> >> +
> >> +2.2 PCIe only hierarchy
> >> +=======================
> >> +Always use PCIe Root ports to start a PCIe hierarchy. Use PCIe switches (Upstream
> >> +Ports + several Downstream Ports) if out of PCIe Root Ports slots. PCIe switches
> >> +can be nested until a depth of 6-7. Plug only PCIe devices into PCIe Ports.
> >
> > This seems to contradict 2.1,  
> 
> Yes, please forgive the bug, it will not appear in v2
> 
>   but I agree more with this statement to
> > only start a PCIe sub-hierarchy with a root port, not an upstream port
> > connected to the root complex.  The 2nd sentence is confusing, I don't
> > know if you're referring to fan-out via PCIe switch downstream of a
> > root port or again suggesting to use upstream switch ports directly on
> > the root complex.
> >  
> 
> The PCIe hierarchy always starts with PCI Express Root Ports; the switch
> is to be plugged into the PCI Express ports. I will try to rephrase this to be
> clearer.
> 
> 
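To make the placement rules we seem to agree on concrete, here is a
minimal sketch of a topology check.  It is only an illustration: the
kind names ("root-port", "upstream-port", ...) are labels I made up for
this example, not QEMU device names, and the rule table is my reading of
this thread (upstream ports only behind a root port or a downstream
switch port, root ports only on pcie.0).

```python
# Hypothetical model of the PCIe placement rules discussed in this thread.
# The kind names are illustrative labels, not QEMU device names.
ALLOWED_PARENTS = {
    "root-port": {"pcie.0"},
    "upstream-port": {"root-port", "downstream-port"},
    "downstream-port": {"upstream-port"},
    "pcie-device": {"root-port", "downstream-port"},
}


def check_topology(devices):
    """devices maps name -> (kind, parent name); returns a list of violations."""
    errors = []
    for name, (kind, parent) in devices.items():
        # The root bus is the only parent that is not itself a device.
        parent_kind = "pcie.0" if parent == "pcie.0" else devices[parent][0]
        if parent_kind not in ALLOWED_PARENTS[kind]:
            errors.append(f"{name}: {kind} may not be plugged into {parent_kind}")
    return errors


# Same shape as the 2.2 figure: a device behind one root port, a switch
# behind another.
FIG_2_2 = {
    "rp1": ("root-port", "pcie.0"),
    "rp2": ("root-port", "pcie.0"),
    "dev1": ("pcie-device", "rp1"),
    "up1": ("upstream-port", "rp2"),
    "dp1": ("downstream-port", "up1"),
    "dp2": ("downstream-port", "up1"),
    "dev2": ("pcie-device", "dp1"),
}

# The layout from the 2.1 figure: an upstream port directly on the root bus.
BAD = {"up1": ("upstream-port", "pcie.0")}

print(check_topology(FIG_2_2))
print(check_topology(BAD))
```

The first call reports no violations; the second flags the upstream port
on pcie.0, which is the case objected to in the 2.1 figure.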
> >> +
> >> +
> >> +   pcie.0 bus
> >> +   ----------------------------------------------------
> >> +        |                |               |
> >> +   -------------   -------------   -------------
> >> +   | Root Port |   | Root Port |   | Root Port |
> >> +   -------------   -------------   -------------
> >> +         |                               |
> >> +    ------------                 -----------------
> >> +    | PCIe Dev |                 | Upstream Port |
> >> +    ------------                 -----------------
> >> +                                  |            |
> >> +                     -------------------    -------------------
> >> +                     | Downstream Port |    | Downstream Port |
> >> +                     -------------------    -------------------
> >> +                             |
> >> +                         ------------
> >> +                         | PCIe Dev |
> >> +                         ------------
> >> +
> >> +2.3 PCI only hierarchy
> >> +======================
> >> +Legacy PCI devices can be plugged into pcie.0 as Integrated Devices or
> >> +into DMI-PCI bridge. PCI-PCI bridges can be plugged into DMI-PCI bridges
> >> +and can be nested until a depth of 6-7. DMI-BRIDGES should be plugged
> >> +only into pcie.0 bus.
> >> +
> >> +   pcie.0 bus
> >> +   ----------------------------------------------
> >> +        |                            |
> >> +   -----------               ------------------
> >> +   | PCI Dev |               | DMI-PCI BRIDGE |
> >> +   -----------               ------------------
> >> +                               |            |
> >> +                        -----------    ------------------
> >> +                        | PCI Dev |    | PCI-PCI Bridge |
> >> +                        -----------    ------------------
> >> +                                         |           |
> >> +                                  -----------     -----------
> >> +                                  | PCI Dev |     | PCI Dev |
> >> +                                  -----------     -----------
> >> +  
> >
> > I really wish we had generic PCIe-to-PCI bridges rather than this DMI
> > bridge thing...
> >  
> 
> Thank you, that's a very good idea and I intend to implement it.
> 
> >> +
> >> +
> >> +3. IO space issues
> >> +===================
> >> +PCIe Ports are seen by Firmware/Guest OS as PCI bridges and  
> >
> > Yeah, I've lost the meaning of Ports here, this statement is true for
> > upstream ports as well.
> >  
> 
> Laszlo asked me to enumerate all the controllers instead of "PCIe",
> I am starting to see why...
> 
> >> +as required by PCI spec will reserve a 4K IO range for each.
> >> +The firmware used by QEMU (SeaBIOS/OVMF) will further optimize
> >> +it by allocation the IO space only if there is at least a device
> >> +with IO BARs plugged into the bridge.
> >> +Behind a PCIe PORT only one device may be plugged, resulting in  
> >
> > Here I think you're trying to specify root/downstream ports, but
> > upstream ports have the same i/o port allocation problems and do not
> > have this one device limitation.
> >  
> 
> I'll be more specific, sure.
> 
> >> +the allocation of a whole 4K range for each device.
> >> +The IO space is limited resulting in ~10 PCIe ports per system
> >> +if devices with IO BARs are plugged into IO ports.
> >> +
> >> +Using the proposed device placing strategy solves this issue
> >> +by using only PCIe devices with PCIe PORTS. The PCIe spec requires
> >> +PCIe devices to work without IO BARs.
> >> +The PCI hierarchy has no such limitations.  
> >
> > Actually it does, but it's mostly not an issue since we have 32 slots
> > available (minus QEMU/libvirt excluding 1 for no good reason)
> > downstream of each bridge.
> >  
> 
> This is what I meant, I'll make it more clear.
> 
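For what it's worth, the arithmetic behind the "~10 ports" figure can be
sketched as below.  The 64 KiB I/O port space and the 4K bridge window
come from the architecture and the text above; the amount already
claimed by legacy and chipset devices is an assumption I picked purely
to show how one lands near 10, not a measured value.

```python
# Back-of-the-envelope estimate of how many bridges can get an I/O window.
IO_SPACE_BYTES = 64 * 1024    # x86 I/O port space (16-bit port addresses)
BRIDGE_IO_WINDOW = 4 * 1024   # I/O window reserved per bridge (4K)


def max_io_windows(reserved_bytes=0):
    """Number of 4K bridge I/O windows that fit once reserved_bytes is taken."""
    return (IO_SPACE_BYTES - reserved_bytes) // BRIDGE_IO_WINDOW


# Theoretical upper bound with nothing else using I/O ports.
print(max_io_windows())
# ASSUMPTION: 24 KiB already consumed elsewhere; illustrative only.
print(max_io_windows(reserved_bytes=24 * 1024))
```

With nothing reserved the bound is 16 windows; the illustrative 24 KiB
reservation brings it down to 10, which is the order of magnitude the
text quotes.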
> >> +
> >> +
> >> +4. Hot Plug
> >> +============
> >> +The root bus pcie.0 does not support hot-plug, so Integrated Devices,
> >> +DMI-PCI bridges and Root Ports can't be hot-plugged/hot-unplugged.
> >> +
> >> +PCI devices can be hot-plugged into PCI-PCI bridges. (There is a bug
> >> +in QEMU preventing it to work, but it would be solved soon).  
> >
> > Probably want to give some sort of date/commit references to these
> > current state of affairs facts, a reader is not likely to lookup the
> > git commit for this verbiage and extrapolate it to a QEMU version.
> >  
> 
> I'll delete it (as Laszlo proposed) since this is not a "status" doc.
> It will be taken care of eventually.
> 
> >> +The PCI hotplug is ACPI based and can work side by side with the PCIe
> >> +native hotplug.
> >> +
> >> +PCIe devices can be natively hot-plugged/hot-unplugged into/from
> >> +PCIe Ports (Root Ports/Downstream Ports). Switches are hot-pluggable.  
> >
> > Why?  This seems like a QEMU bug.  Clearly we need the downstream ports
> > in place when the upstream switch is hot-added, but this should be
> > feasible.
> >  
> 
> I don't understand the question. I do think switches can be hot-plugged,
> but I am not sure if QEMU allows it. If not, this is something we should solve.

Sorry, I read too quickly and inserted a <not> in there; I thought the
statement was that switches are not hot-pluggable.  I think the issue
will be ordering: either the downstream switch ports have to be
hot-added prior to the upstream switch port, or we're going to need to
invent a switch with hot-pluggable downstream ports.  I expect the guest
is only going to scan for downstream ports once after the upstream port
is discovered.
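
A toy model of that expectation, just to show why the order matters.
Everything here is hypothetical (the Guest class, hot_add(), the port
names); it models only a guest that scans once when the upstream port
appears, and says nothing about how QEMU's device_add actually behaves.

```python
class Guest:
    """Toy guest that scans below an upstream port exactly once, on discovery."""

    def __init__(self):
        self.known_ports = set()

    def discover_upstream(self, switch):
        # One-time scan: only downstream ports present right now are seen.
        self.known_ports |= set(switch["downstream"])


def hot_add(order):
    """Replay a hot-add sequence; return the downstream ports the guest sees."""
    switch = {"downstream": []}
    guest = Guest()
    for dev in order:
        if dev == "upstream":
            guest.discover_upstream(switch)
        else:
            switch["downstream"].append(dev)
    return guest.known_ports


# Downstream ports in place before the upstream port shows up: both are seen.
print(sorted(hot_add(["dp1", "dp2", "upstream"])))
# Upstream port first: ports added afterwards are never scanned.
print(sorted(hot_add(["upstream", "dp1", "dp2"])))
```

Under this assumed guest behaviour the second ordering leaves the guest
with no downstream ports at all, which is why we'd need either the first
ordering or downstream ports that are themselves hot-pluggable.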
 
> >> +Keep in mind you always need to have at least one PCIe Port available
> >> +for hotplug, the PCIe Ports themselves are not hot-pluggable.  
> >
> > If a user cares about hotplug...
> >  
> 
> ...he should reserve enough empty PCI Express Root Ports/PCI Express Downstream Ports.
> Laszlo had some numbers and ideas on how this user can plan in advance for hotplug;
> maybe we should bring them together :)
> 
> >> +
> >> +
> >> +5. Device assignment
> >> +====================
> >> +Host devices are mostly PCIe and should be plugged only into PCIe ports.
> >> +PCI-PCI bridge slots can be used for legacy PCI host devices.  
> >
> > I don't think we have any evidence to suggest this as a best practice.
> > We have a lot of experience placing PCIe host devices into a
> > conventional PCI topology on 440FX.  We don't have nearly as much
> > experience placing them into downstream PCIe ports.  This seems like
> > how we would like for things to behave to look like real hardware
> > platforms, but it's just navel gazing whether it's actually the right
> > thing to do.  Thanks,
> >  
> 
> I had to look up the "navel gazing"...
> While I do agree with your statements, I prefer a cleaner PCI Express machine
> with as little legacy PCI as possible. I use this document as an opportunity
> to start gaining experience with device assignment into PCI Express Root Ports
> and Downstream Ports and solve the issues along the way.

That's exactly what I mean, there's an ulterior, personal motivation in
this suggestion that's not really backed by facts.  You'd like to make
the recommendation to place PCIe assigned devices into PCIe slots, but
that's not necessarily the configuration with the best track record
right now.  In fact there's really no advantage to a user to do this
unless they have a device that needs PCIe (radeon and tg3
potentially come to mind here).  So while I agree with you from an
ideological standpoint, I don't think that's sufficient to make the
recommendation you're proposing here.  Thanks,

Alex
