On Thu, 2016-09-29 at 08:53 -0400, Tom Herbert wrote:
> On Thu, Sep 29, 2016 at 12:54 AM, Eric Dumazet <eric.duma...@gmail.com> wrote:
> > On Wed, 2016-09-28 at 20:54 -0700, Tom Herbert wrote:
> >> xps_flows maintains a per-device flow table that is indexed by the
> >> skbuff hash. The table is only consulted when there is no queue saved in
> >> a transmit socket for an skbuff.
> >>
> >> Each entry in the flow table contains a queue index and a queue
> >> pointer. The queue pointer is set when a queue is chosen using a
> >> flow table entry. This pointer is set to the head pointer in the
> >> transmit queue (which is maintained by BQL).
> >>
> >> The new function get_xps_flows_index looks up flows in the
> >> xps_flows table. The entry returned gives the last queue a matching flow
> >> used. The returned queue is compared against the normal XPS queue. If
> >> they are different, then we only switch if the tail pointer in the TX
> >> queue has advanced past the pointer saved in the entry. In this
> >> way OOO should be avoided when XPS wants to use a different queue.
> >
> > There is something I do not understand with this.
> >
> > If this OOO avoidance is tied to BQL, it means that packets sitting in a
> > qdisc won't be part of the detection.
> >
> > So packets of flow X could possibly be queued on multiple qdiscs.
> >
> Hi Eric,
>
> I'm not sure I understand your concern. If packets for flow X can be
> queued on multiple qdiscs, wouldn't that mean the flow will be subject
> to OOO transmission regardless of this patch? In other words, here
> we're trying to keep packets for the flow in order as they are emitted
> from the qdiscs, which we assume maintain ordering of packets in a
> flow.
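To make sure I read the mechanism right, here is roughly what the
description above amounts to. This is only a sketch in my own words,
assuming CONFIG_BQL; the names (xps_flow_entry, XPS_FLOWS_SIZE, the
table argument) are mine, not the patch's, and initialization and
validation of entries are omitted:

#include <linux/netdevice.h>

/* Illustrative size only, power of two so the hash can be masked. */
#define XPS_FLOWS_SIZE 256

struct xps_flow_entry {
	u16 queue_index;	/* last TXQ this flow hash used */
	unsigned int head;	/* BQL num_queued saved when the queue was picked */
};

static u16 get_xps_flows_index(struct net_device *dev,
			       struct xps_flow_entry *table,
			       struct sk_buff *skb, u16 xps_queue)
{
	struct xps_flow_entry *ent;
	struct netdev_queue *txq;

	ent = &table[skb_get_hash(skb) & (XPS_FLOWS_SIZE - 1)];

	if (ent->queue_index != xps_queue) {
		/*
		 * XPS prefers a different queue. Switch only if the old
		 * queue's completion count (the BQL "tail") has advanced
		 * past the head we saved, i.e. everything we previously
		 * put on that queue has been handed to the wire.
		 */
		txq = netdev_get_tx_queue(dev, ent->queue_index);
		if ((int)(txq->dql.num_completed - ent->head) >= 0)
			ent->queue_index = xps_queue;
	}

	/* Record the current head of the chosen queue for the next check. */
	txq = netdev_get_tx_queue(dev, ent->queue_index);
	ent->head = txq->dql.num_queued;

	return ent->queue_index;
}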
Well, then what is this patch series solving?

You have a producer of packets running on 8 vcpus in a VM. Packets are
exiting the VM and need to be queued on a mq NIC in the hypervisor.

Flow X can be scheduled on any of these 8 vcpus, so XPS currently
selects a different TXQ depending on where the flow runs.

Your patch aims to detect this and steer packets to one TXQ, not with a
static hash() % num_real_queues (as RSS would have done on a NIC at RX),
but trying to please XPS, i.e. selecting a queue close to the CPU
producing the packet. (A VM with one vcpu, and CPU bound, would really
be happy to not spread its packets all over the queues.)

Your mechanism relies on counters updated at the time packets are given
to the NIC, not at the time packets are given to the txq, where they may
sit for a while in the qdisc right before the BQL layer.

dev_queue_xmit() does not talk to the device directly...

All I am saying is that the producer/consumer counters you added are not
in the right place. You tried hard not to change the drivers by adding
something to the existing BQL counters, but packets can sit in a qdisc
for a while...

So you added two potential cache-line misses per outgoing packet, but
with no guarantee that OOO is really avoided.
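To see why, look at where BQL is actually fed. A minimal, made-up
driver sketch (foo_start_xmit is hypothetical, but
netdev_tx_sent_queue() is the real BQL hook):

#include <linux/netdevice.h>

static netdev_tx_t foo_start_xmit(struct sk_buff *skb, struct net_device *dev)
{
	struct netdev_queue *txq;

	txq = netdev_get_tx_queue(dev, skb_get_queue_mapping(skb));

	/* ... post the skb to the hardware TX ring here ... */

	/*
	 * Only now does the BQL head (dql.num_queued) advance, i.e.
	 * after the skb has already passed through the qdisc. Anything
	 * still sitting in the qdisc never touched these counters.
	 */
	netdev_tx_sent_queue(txq, skb->len);

	return NETDEV_TX_OK;
}

The matching tail, netdev_tx_completed_queue(), only advances in the
driver's TX completion handler, which is even further removed from
dev_queue_xmit() and the qdisc.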