On Wed, Oct 21, 2020 at 3:51 PM Harshitha Ramamurthy
<harshitha.ramamur...@intel.com> wrote:
>
> In XPS, the transmit queue selected for a packet is saved in the associated
> sock for the packet and is then used to avoid recalculating the queue
> on subsequent sends. The problem is that the corresponding device is not
> also recorded, so when the queue mapping is referenced it may
> correspond to a different device than the sending one, resulting in an
> incorrect queue being used for transmit. Particularly with xps_rxqs, this
> can lead to non-deterministic behaviour as illustrated below.
>
> Consider a case where xps_rxqs is configured and the numbers of Tx and
> Rx queues differ. Suppose we have two devices, A and B: Device A
> has queues 0-7 and Device B has queues 0-15. Packets are transmitted
> from Device A but received on Device B. For packets received on queues
> 0-7 of Device B, xps_rxqs will be applied to transmit reply packets on
> Device A's queues 0-7. However, for packets received on queues 8-15 of
> Device B, normal XPS is used for the replies transmitted
> from Device A. This leads to non-deterministic behaviour. The case where
> there are fewer receive queues is even more insidious. Consider
> Device A, the transmitting device, with queues 0-15 and Device B, the
> receiver, with queues 0-7. With xps_rxqs enabled, packets will be
> received only on queues 0-7 of Device B, but sent only on queues 0-7
> of Device A, thereby causing a load imbalance.

So the issue is limited to xps_rxqs with multiple NICs.

When do we need sk_tx_dev_and_queue_mapping (patch 3/3)? It is used in
netdev_pick_tx, but associations are reset on route change and
recomputed if queue_index would exceed the current device queue count.
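That recompute path works roughly like this (a simplified sketch, not the actual netdev_pick_tx body; the toy struct stands in for struct net_device's real_num_tx_queues):

```c
#include <assert.h>

/* Sketch of the cached-queue validation on the tx path: the cached
 * index is reused only while it is within the current device's queue
 * range; otherwise the queue is recomputed from scratch. */
struct toy_dev { int real_num_tx_queues; };

static int recompute_queue(const struct toy_dev *dev)
{
    (void)dev;
    return 0; /* stand-in for the XPS/flow-hash computation */
}

static int pick_tx(const struct toy_dev *dev, int cached)
{
    if (cached < 0 || cached >= dev->real_num_tx_queues)
        return recompute_queue(dev);
    return cached;
}
```

So a stale mapping whose index happens to be in range for the new device is the only case that slips through.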

> This patch set fixes the issue by recording both the device (via
> ifindex) and the queue in the sock mapping. The pair is set and
> retrieved atomically.

I guess this is the reason for the somewhat convoluted cast to u64
logic in patch 1/3. Is the assumption that 64-bit loads and stores are
atomic on all platforms? That is not correct.
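For reference, the idea being questioned is packing both values into one 64-bit word so a single store updates the pair (a sketch of the pattern, not the patch's code):

```c
#include <assert.h>
#include <stdint.h>

/* Sketch: ifindex and queue packed into one u64 so both are updated
 * by a single store. Note the caveat above: a plain 64-bit store is
 * NOT guaranteed atomic on 32-bit platforms -- it may be split into
 * two 32-bit stores, so a concurrent reader can still observe a torn
 * ifindex/queue pair. */
static uint64_t pack_dev_queue(uint32_t ifindex, uint32_t queue)
{
    return ((uint64_t)ifindex << 32) | queue;
}

static void unpack_dev_queue(uint64_t v, uint32_t *ifindex,
                             uint32_t *queue)
{
    *ifindex = (uint32_t)(v >> 32);
    *queue = (uint32_t)v;
}
```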

Is atomicity even needed? For the purpose of load balancing it isn't.
Just adding a sk->rx_ifindex would be a lot simpler.
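Roughly (field names here are hypothetical, not the patch's):

```c
#include <assert.h>

/* Sketch of the simpler alternative: record the rx ifindex in its
 * own field and check it before trusting the cached queue. No
 * atomicity across the pair is needed: a rare torn read just sends
 * one packet down the recompute path, which is harmless for load
 * balancing. */
struct toy_sock {
    int rx_ifindex;   /* device the cached mapping belongs to */
    int rx_queue;     /* cached rx queue on that device */
};

static int cached_queue_for_dev(const struct toy_sock *sk, int ifindex)
{
    if (sk->rx_ifindex != ifindex)
        return -1;    /* mapping is for another device: ignore it */
    return sk->rx_queue;
}
```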

sk->sk_napi_id already uniquely identifies the device. Unfortunately,
dev_get_by_napi_id is not cheap (traverses a hashtable bucket). Though
purely for the purpose of load balancing this validation could be
sample based.
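Sampled validation could be as simple as (sketch; the expensive dev_get_by_napi_id lookup would sit behind the check):

```c
#include <assert.h>

/* Sketch of sampled validation: perform the expensive
 * napi_id -> device lookup only once every SAMPLE_PERIOD uses of
 * the cached queue. Occasionally acting on a stale mapping costs a
 * suboptimal queue choice, not correctness. */
#define SAMPLE_PERIOD 64

static int should_validate(unsigned int *counter)
{
    return (++*counter % SAMPLE_PERIOD) == 0;
}
```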

The rx ifindex is also already recorded for inet sockets in
rx_dst_ifindex, and the sk_rx_queue_get functions are limited to
those, so could conceivably use that. But it is derived from skb_iif,
which is overwritten with every reentry of __netif_receive_skb_core.