On Thu, Aug 25, 2016 at 12:19 PM, Alexander Duyck <alexander.du...@gmail.com> wrote:
> On Wed, Aug 24, 2016 at 4:46 PM, Rick Jones <rick.jon...@hpe.com> wrote:
>> Also, while it doesn't seem to have the same massive effect on throughput, I
>> can also see out of order behaviour happening when the sending VM is on a
>> node with a ConnectX-3 Pro NIC. Its driver is also enabling XPS it would
>> seem. I'm not *certain* but looking at the traces it appears that with the
>> ConnectX-3 Pro there is more interleaving of the out-of-order traffic than
>> there is with the Skyhawk. The ConnectX-3 Pro happens to be in a newer
>> generation server with a newer processor than the other systems where I've
>> seen this.
>>
>> I do not see the out-of-order behaviour when the NIC at the sending end is a
>> BCM57840. It does not appear that the bnx2x driver in the 4.4 kernel is
>> enabling XPS.
>>
>> So, it would seem that there are three cases of enabling XPS resulting in
>> out-of-order traffic, two of which result in a non-trivial loss of
>> performance.
>>
>> happy benchmarking,
>>
>> rick jones
>
> The problem is that there is no socket associated with the guest from
> the host's perspective. This is resulting in the traffic bouncing
> between queues because there is no saved socket to lock the interface
> onto.
>
> I was looking into this recently as well and had considered a couple
> of options. The first is to fall back to just using skb_tx_hash()
> when skb->sk is null for a given buffer. I have a patch I have been
> toying around with but I haven't submitted it yet. If you would like
> I can submit it as an RFC to get your thoughts. The second option is
> to enforce the use of RPS for any interfaces that do not perform Rx in
> NAPI context. The correct solution for this is probably some
> combination of the two as you have to have all queueing done in order
> at every stage of the packet processing.
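Alexander's first option, falling back to a hash-based pick when skb->sk is NULL, can be illustrated with a small user-space simulation. This is a sketch under stated assumptions, not kernel code: struct pkt, the one-to-one CPU-to-queue XPS map, and all function names are hypothetical, and a plain modulo stands in for the way skb_tx_hash() scales the flow hash into the queue range. It counts how often one socketless flow changes TX queue when its packets are sent from varying CPUs.

```c
#include <stdint.h>

#define NUM_TX_QUEUES 4
#define NO_QUEUE (-1)

/* Hypothetical, stripped-down stand-in for an skb: only what queue selection needs. */
struct pkt {
    uint32_t flow_hash;   /* hash over the flow's addresses/ports */
    int      has_sk;      /* nonzero if a local socket owns the packet */
    int      sk_tx_queue; /* queue remembered on that socket, or NO_QUEUE */
    int      cpu;         /* CPU the packet is transmitted from */
};

/* Hypothetical XPS map: each CPU is bound to one TX queue. */
static int xps_queue_for_cpu(int cpu) { return cpu % NUM_TX_QUEUES; }

/* Behaviour being described: with no socket to remember a queue, XPS picks by current CPU. */
static int pick_queue_xps_only(const struct pkt *p)
{
    if (p->has_sk && p->sk_tx_queue != NO_QUEUE)
        return p->sk_tx_queue;
    return xps_queue_for_cpu(p->cpu);
}

/* Fallback: no socket means select by flow hash, so the flow sticks to one queue. */
static int pick_queue_with_hash_fallback(const struct pkt *p)
{
    if (p->has_sk && p->sk_tx_queue != NO_QUEUE)
        return p->sk_tx_queue;
    if (!p->has_sk)
        return (int)(p->flow_hash % NUM_TX_QUEUES); /* modulo for simplicity */
    return xps_queue_for_cpu(p->cpu);
}

/* Hypothetical sequence of CPUs from which one guest flow's packets are sent. */
static const int demo_cpus[] = { 0, 1, 0, 2, 3, 1 };

/* Count queue changes for a single socketless flow sent from the given CPUs. */
static int count_queue_switches(int (*pick)(const struct pkt *),
                                const int *cpus, int n)
{
    int switches = 0, prev = NO_QUEUE;
    for (int i = 0; i < n; i++) {
        struct pkt p = { .flow_hash = 0x9e3779b9u, .has_sk = 0,
                         .sk_tx_queue = NO_QUEUE, .cpu = cpus[i] };
        int q = pick(&p);
        if (prev != NO_QUEUE && q != prev)
            switches++;
        prev = q;
    }
    return switches;
}
```

With the CPU sequence above, the XPS-only pick moves the flow on five of six packets, each move being a chance for reordering, while the hash fallback keeps it on one queue. The trade-off is that the hash ignores CPU locality, which is what XPS was buying in the first place.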

I have thought several times about creating flow state for packets coming from VMs. This could be done similarly to how we do RFS: call the flow dissector to get a hash of the flow, then use that to index into a table that contains the last queue used, and only change the queue when criteria are met that prevent OOO. This would mean running the flow dissector on such packets, which seems a bit expensive; it would be nice if the VM could just give us the hash in a TX descriptor. There are other benefits to a more advanced mechanism; for instance, we might be able to cache routes or iptables results (state we might keep if there were a transport socket).
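The flow-state table described above could look roughly like the following. This is a minimal sketch, assuming the flow hash has already been obtained (from the flow dissector or, as suggested, from the guest in a TX descriptor) and that "preferred" is whatever queue XPS would pick right now; flow_entry, flow_select_queue() and flow_tx_complete() are hypothetical names. The "criteria" used here is the simplest OOO-safe one: a flow may move only when none of its packets are still outstanding, in the same spirit as RFS only moving a flow to a new CPU once the packets it enqueued on the old CPU have been processed.

```c
#include <stdint.h>

#define FLOW_TABLE_SIZE 256 /* must be a power of two */

/* Hypothetical per-flow state, indexed by flow hash. */
struct flow_entry {
    int      queue;    /* TX queue the flow last used */
    unsigned inflight; /* packets sent on that queue but not yet completed */
};

/* Zero-initialised: inflight == 0 means the entry is free to take any queue.
 * Colliding flows share an entry, which only makes queue changes rarer. */
static struct flow_entry flow_table[FLOW_TABLE_SIZE];

static struct flow_entry *flow_lookup(uint32_t hash)
{
    return &flow_table[hash & (FLOW_TABLE_SIZE - 1)];
}

/* Pick a TX queue: follow the preferred queue only when nothing from this
 * flow is still queued, otherwise stay put to avoid reordering. */
static int flow_select_queue(uint32_t hash, int preferred)
{
    struct flow_entry *e = flow_lookup(hash);
    if (e->inflight == 0)
        e->queue = preferred;
    e->inflight++;
    return e->queue;
}

/* Called on TX completion for a packet of this flow. */
static void flow_tx_complete(uint32_t hash)
{
    struct flow_entry *e = flow_lookup(hash);
    if (e->inflight > 0)
        e->inflight--;
}
```

A real version would need per-queue or atomic accounting and a way to tie completions back to flows; the point here is only that the per-flow lookup plus an in-flight count is enough to let XPS steer new flows while keeping active ones in order.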
Tom