On Fri, Feb 26, 2021 at 4:23 PM Daniel Borkmann <dan...@iogearbox.net> wrote:
>
> We noticed a GRO issue for UDP-based encaps such as vxlan/geneve when the
> csum for the UDP header itself is 0. In that case, GRO aggregation does
> not take place on the phys dev, but instead is deferred to the vxlan/geneve
> driver (see trace below).
>
> The reason is essentially that GRO aggregation bails out in udp_gro_receive()
> for such case when drivers marked the skb with CHECKSUM_UNNECESSARY (ice, i40e,
> others) where for non-zero csums 2abb7cdc0dc8 ("udp: Add support for doing
> checksum unnecessary conversion") promotes those skbs to CHECKSUM_COMPLETE
> and napi context has csum_valid set. This is however not the case for zero
> UDP csum (here: csum_cnt is still 0 and csum_valid continues to be false).
>
> At the same time 57c67ff4bd92 ("udp: additional GRO support") added matches
> on !uh->check ^ !uh2->check as part to determine candidates for aggregation,
> so it certainly is expected to handle zero csums in udp_gro_receive(). The
> purpose of the check added via 662880f44203 ("net: Allow GRO to use and set
> levels of checksum unnecessary") seems to catch bad csum and stop aggregation
> right away.
>
> One way to fix aggregation in the zero case is to only perform the !csum_valid
> check in udp_gro_receive() if uh->check is in fact non-zero.
>
> Before:
>
>   [...]
>   swapper     0 [008]   731.946506: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100400 len=1500   (1)
>   swapper     0 [008]   731.946507: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100200 len=1500
>   swapper     0 [008]   731.946507: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497101100 len=1500
>   swapper     0 [008]   731.946508: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497101700 len=1500
>   swapper     0 [008]   731.946508: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497101b00 len=1500
>   swapper     0 [008]   731.946508: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100600 len=1500
>   swapper     0 [008]   731.946508: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100f00 len=1500
>   swapper     0 [008]   731.946509: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100a00 len=1500
>   swapper     0 [008]   731.946516: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100500 len=1500
>   swapper     0 [008]   731.946516: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100700 len=1500
>   swapper     0 [008]   731.946516: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497101d00 len=1500   (2)
>   swapper     0 [008]   731.946517: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497101000 len=1500
>   swapper     0 [008]   731.946517: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497101c00 len=1500
>   swapper     0 [008]   731.946517: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497101400 len=1500
>   swapper     0 [008]   731.946518: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100e00 len=1500
>   swapper     0 [008]   731.946518: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497101600 len=1500
>   swapper     0 [008]   731.946521: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100800 len=774
>   swapper     0 [008]   731.946530: net:netif_receive_skb: dev=test_vxlan skbaddr=0xffff966497100400 len=14032 (1)
>   swapper     0 [008]   731.946530: net:netif_receive_skb: dev=test_vxlan skbaddr=0xffff966497101d00 len=9112  (2)
>   [...]
>
>   # netperf -H 10.55.10.4 -t TCP_STREAM -l 20
>   MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.55.10.4 () port 0 AF_INET : demo
>   Recv   Send    Send
>   Socket Socket  Message  Elapsed
>   Size   Size    Size     Time     Throughput
>   bytes  bytes   bytes    secs.    10^6bits/sec
>
>    87380  16384  16384    20.01    13129.24
>
> After:
>
>   [...]
>   swapper     0 [026]   521.862641: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff93ab0d479000 len=11286 (1)
>   swapper     0 [026]   521.862643: net:netif_receive_skb: dev=test_vxlan skbaddr=0xffff93ab0d479000 len=11236 (1)
>   swapper     0 [026]   521.862650: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff93ab0d478500 len=2898  (2)
>   swapper     0 [026]   521.862650: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff93ab0d479f00 len=8490  (3)
>   swapper     0 [026]   521.862653: net:netif_receive_skb: dev=test_vxlan skbaddr=0xffff93ab0d478500 len=2848  (2)
>   swapper     0 [026]   521.862653: net:netif_receive_skb: dev=test_vxlan skbaddr=0xffff93ab0d479f00 len=8440  (3)
>   [...]
>
>   # netperf -H 10.55.10.4 -t TCP_STREAM -l 20
>   MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.55.10.4 () port 0 AF_INET : demo
>   Recv   Send    Send
>   Socket Socket  Message  Elapsed
>   Size   Size    Size     Time     Throughput
>   bytes  bytes   bytes    secs.    10^6bits/sec
>
>    87380  16384  16384    20.01    24576.53
>
> Fixes: 57c67ff4bd92 ("udp: additional GRO support")
> Fixes: 662880f44203 ("net: Allow GRO to use and set levels of checksum unnecessary")
> Signed-off-by: Daniel Borkmann <dan...@iogearbox.net>
> Cc: Eric Dumazet <eduma...@google.com>
> Cc: Willem de Bruijn <will...@google.com>
> Cc: John Fastabend <john.fastab...@gmail.com>
> Cc: Jesse Brandeburg <jesse.brandeb...@intel.com>
> Cc: Tom Herbert <t...@herbertland.com>

Makes sense to me.

We cannot do checksum conversion with zero field, but that does not
have to limit coalescing.

CHECKSUM_COMPLETE with a checksum validated by
skb_gro_checksum_validate_zero_check implies csum_valid.

So the test

>             (skb->ip_summed != CHECKSUM_PARTIAL &&
>              NAPI_GRO_CB(skb)->csum_cnt == 0 &&
>              !NAPI_GRO_CB(skb)->csum_valid) ||

basically matches

- CHECKSUM_NONE
- CHECKSUM_UNNECESSARY which has already used up its valid state on a
prior header
- CHECKSUM_COMPLETE with bad checksum.

This change just refines the test so that the first two cases no longer
prevent coalescing when the checksum field is zero.
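
To make the case analysis above easier to check, here is a minimal
userspace sketch of the predicate before and after the change. It is a
model, not the kernel code: struct gro_model, the CSUM_* enum and the two
helper functions are hypothetical stand-ins for skb->ip_summed, the
NAPI_GRO_CB() fields and uh->check, and the "after" form assumes the fix
is the extra uh->check test described in the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's CHECKSUM_* states. */
enum csum_state { CSUM_NONE, CSUM_UNNECESSARY, CSUM_COMPLETE, CSUM_PARTIAL };

/* Only the fields the quoted test looks at, plus the UDP checksum field. */
struct gro_model {
	enum csum_state ip_summed;  /* models skb->ip_summed */
	unsigned int csum_cnt;      /* models NAPI_GRO_CB(skb)->csum_cnt */
	bool csum_valid;            /* models NAPI_GRO_CB(skb)->csum_valid */
	uint16_t udp_check;         /* models uh->check */
};

/* The quoted test as it stands: true means GRO gives up on this skb. */
static bool bails_before(const struct gro_model *m)
{
	return m->ip_summed != CSUM_PARTIAL &&
	       m->csum_cnt == 0 &&
	       !m->csum_valid;
}

/* Assumed shape of the refined test: only apply it if uh->check is non-zero. */
static bool bails_after(const struct gro_model *m)
{
	return m->udp_check != 0 && bails_before(m);
}
```

In this model, a CHECKSUM_UNNECESSARY skb with csum_cnt == 0, csum_valid
unset and a zero UDP checksum is rejected by bails_before() but accepted by
bails_after(), while a CHECKSUM_COMPLETE skb with a non-zero, unvalidated
checksum is rejected by both.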

Making this explicit in case anyone sees holes in the logic. Else,

Acked-by: Willem de Bruijn <will...@google.com>
