On Mon, Jan 23, 2017 at 1:44 AM, David Laight <david.lai...@aculab.com> wrote:
> Alexander Duyck
>> Sent: 19 January 2017 15:55
> ...
>> >> The Relaxed Ordering attribute doesn't get applied across the board.
>> >> It ends up being limited to a subset of the transactions if I recall
>> >> correctly.  In this case it is the Tx descriptor write back, and the
>> >> Rx data write back.  We don't apply the RO bit to any other
>> >> transactions.
>> >>
>> >> In the case of Tx descriptor there is no harm in allowing it to be
>> >> reordered because we only really read the DD bit so we don't care
>> >> about the ordering of the write back.  In the case of the Rx data the
>> >> Rx descriptor essentially acts as a flush since it is sent without the
>> >> RO bit set.  So all the writes before it must be completed before the
>> >> Rx descriptor write back.
>> >
>> > In which case why not set it unconditionally for all architectures?
>> >
>> > I'm surprised (I often am) that allowing those re-orderings makes
>> > any significant difference.
>> > Unfortunately you need a PCIe analyser to see what is really happening
>> > and they don't come cheap.
>> >
>> > What I do vaguely remember is that some hosts don't always implement
>> > the 'normal' re-ordering of reads and read completions.
>> > Re-ordering of reads allows descriptor reads to overtake transmit
>> > traffic which is likely to make a difference.
>>
>> I think part of the issue, at least in the case of SPARC, is that the
>> handling of the memory writes in the PCIe root complex is impacted by
>> the RO attribute.  On the bus itself it doesn't matter much, but at
>> the root complex it can become expensive to have to wait on a partial
>> write to complete while there are other writes pending.  This is why
>> the IOMMU for SPARC now has a WEAK_ORDERING attribute you can add so
>> that it can write the data in whatever order it wants in relation to
>> other writes in that region.
>
> I hope the IOMMU only ever reorders writes that have the RO bit set.

I'm assuming it only applies to DMA regions mapped with
DMA_ATTR_WEAK_ORDERING.  Since drivers have to specify that attribute
it likely is only going to apply to DMA regions that could have the RO
bit set.

> Has anyone tried cache invalidates on the rx buffers?
> Might make the writes less expensive.
> Or is the issue with NUMA rather than cache.

I don't know.  This is all very SPARC specific and I haven't done any
of the work on it.  You might try checking with those responsible for
introducing DMA_ATTR_WEAK_ORDERING for the SPARC architecture.

- Alex
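
For readers following the ordering argument quoted earlier in the thread
(Rx data written back with the RO bit set, Rx descriptor written back
without it), here is a toy user-space model of that rule.  It is only a
sketch and not driver or kernel code: the struct and function names are
hypothetical, and it assumes the simplified rule that a posted write may
complete ahead of an earlier posted write only if the later write has RO
set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of one posted PCIe memory write (hypothetical type). */
struct posted_write {
    const char *what; /* label, for illustration only */
    bool ro;          /* true if the Relaxed Ordering bit is set */
};

/*
 * w[] lists writes in issue order.  order[k] is the issue index of the
 * write that completes k-th.  Returns true if that completion order is
 * allowed under the simplified rule: a write may complete ahead of an
 * earlier-issued write only if the passing write has RO set.
 */
static bool completion_order_allowed(const struct posted_write *w, size_t n,
                                     const size_t *order)
{
    for (size_t a = 0; a < n; a++) {
        for (size_t b = a + 1; b < n; b++) {
            size_t first = order[a];  /* completes earlier */
            size_t second = order[b]; /* completes later */

            assert(first < n && second < n);

            /* 'first' was issued after 'second', so it passed it. */
            if (first > second && !w[first].ro)
                return false;
        }
    }
    return true;
}
```

With two Rx data writes (RO set) followed by one Rx descriptor write
(RO clear), the model accepts the data writes completing in either
order, but rejects any order in which the descriptor completes before a
data write.  That is the "acts as a flush" behaviour described above.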