On Tue, 23 Jun 2020 12:52:29 -0700 Saeed Mahameed wrote:
> From: Aya Levin <a...@mellanox.com>
>
> The concept of Relaxed Ordering in the PCI Express environment allows
> switches in the path between the Requester and Completer to reorder some
> transactions just received before others that were previously enqueued.
>
> In ETH driver, there is no question of write integrity since each memory
> segment is written only once per cycle. In addition, the driver doesn't
> access the memory shared with the hardware until the corresponding CQE
> arrives indicating all PCI transactions are done.

Assuming the device sets the RO bits appropriately, right? Otherwise
CQE write could theoretically surpass the data write, no?

> With relaxed ordering set, traffic on the remote-numa is at the same
> level as when on the local numa.

Same level of? Achievable bandwidth?

> Running TCP single stream over ConnectX-4 LX, ARM CPU on remote-numa
> has 300% improvement in the bandwidth.
> With relaxed ordering turned off: BW:10 [GB/s]
> With relaxed ordering turned on: BW:40 [GB/s]
>
> The driver turns relaxed ordering off by default. It exposes 2 boolean
> private-flags in ethtool: pci_ro_read and pci_ro_write for user
> control.
>
> $ ethtool --show-priv-flags eth2
> Private flags for eth2:
> ...
> pci_ro_read  : off
> pci_ro_write : off
>
> $ ethtool --set-priv-flags eth2 pci_ro_write on
> $ ethtool --set-priv-flags eth2 pci_ro_read on

I think Michal will rightly complain that this does not belong in
private flags any more. As (/if?) ARM deployments take a foothold
in DC this will become a common setting for most NICs.
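
For reference on the "RO bits" question above, a rough sketch (not taken
from the patch) of decoding the PCIe Device Control register, whose
"Enable Relaxed Ordering" bit is what lspci -vv reports as RlxdOrd+/-.
The bit values mirror PCI_EXP_DEVCTL_RELAX_EN and
PCI_EXP_DEVCTL_NOSNOOP_EN from the kernel's pci_regs.h; decode_devctl()
and the sample register values are hypothetical, for illustration only.
Note this bit only permits the device to set RO in the TLPs it
originates. It says nothing about which writes (data vs. CQE) actually
carry the attribute, which is the part that matters for ordering.

```python
# Bit positions in the PCI Express Device Control register
# (names mirror PCI_EXP_DEVCTL_* in Linux's pci_regs.h).
PCI_EXP_DEVCTL_RELAX_EN = 0x0010    # Enable Relaxed Ordering
PCI_EXP_DEVCTL_NOSNOOP_EN = 0x0800  # Enable No Snoop


def decode_devctl(devctl: int) -> dict:
    """Report the ordering-related enable bits of a 16-bit DevCtl value.

    Hypothetical helper for illustration; not part of any driver or tool.
    """
    if not 0 <= devctl <= 0xFFFF:
        raise ValueError("DevCtl is a 16-bit register")
    return {
        "relaxed_ordering": bool(devctl & PCI_EXP_DEVCTL_RELAX_EN),
        "no_snoop": bool(devctl & PCI_EXP_DEVCTL_NOSNOOP_EN),
    }


if __name__ == "__main__":
    # Made-up sample values, not readings from real hardware.
    for sample in (0x2810, 0x2800):
        print(f"{sample:#06x}: {decode_devctl(sample)}")
```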