On 2019/1/31 12:33, Tom Herbert wrote: > On Wed, Jan 30, 2019 at 6:59 PM maowenan <maowe...@huawei.com> wrote: >> >> >> >> On 2019/1/31 10:43, Tom Herbert wrote: >>> On Wed, Jan 30, 2019 at 5:58 PM maowenan <maowe...@huawei.com> wrote: >>>> >>>> >>>> >>>> On 2019/1/30 4:24, Tom Herbert wrote: >>>>> On Tue, Jan 29, 2019 at 12:08 AM maowenan <maowe...@huawei.com> wrote: >>>>>> >>>>>> >>>>>> >>>>>> On 2019/1/29 14:24, Tom Herbert wrote: >>>>>>> On Mon, Jan 28, 2019 at 10:04 PM maowenan <maowe...@huawei.com> wrote: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On 2019/1/29 12:01, Tom Herbert wrote: >>>>>>>>> On Mon, Jan 28, 2019 at 7:00 PM maowenan <maowe...@huawei.com> wrote: >>>>>>>>>> >>>>>>>>>> Hi all, >>>>>>>>>> Do you have any comments about this change? >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On 2019/1/23 11:33, Mao Wenan wrote: >>>>>>>>>>> When udp4_gro_receive() get one packet that uh->check=0, >>>>>>>>>>> skb_gro_checksum_validate_zero_check() will set the >>>>>>>>>>> skb->ip_summed = CHECKSUM_UNNECESSARY; >>>>>>>>>>> skb->csum_level = 0; >>>>>>>>>>> Then udp_gro_receive() will flush the packet which is not >>>>>>>>>>> CHECKSUM_PARTIAL, >>>>>>>>>>> It is not our expect, because check=0 in udp header indicates this >>>>>>>>>>> packet is no need to caculate checksum, we should go further to do >>>>>>>>>>> GRO. >>>>>>>>>>> >>>>>>>>>>> This patch changes the value of csum_cnt according to >>>>>>>>>>> skb->csum_level. >>>>>>>>>>> --- >>>>>>>>>>> include/linux/netdevice.h | 1 + >>>>>>>>>>> 1 file changed, 1 insertion(+) >>>>>>>>>>> >>>>>>>>>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h >>>>>>>>>>> index 1377d08..9c819f1 100644 >>>>>>>>>>> --- a/include/linux/netdevice.h >>>>>>>>>>> +++ b/include/linux/netdevice.h >>>>>>>>>>> @@ -2764,6 +2764,7 @@ static inline void >>>>>>>>>>> skb_gro_incr_csum_unnecessary(struct sk_buff *skb) >>>>>>>>>>> * during GRO. This saves work if we fallback to >>>>>>>>>>> normal path. >>>>>>>>>>> */ >>>>>>>>>>> __skb_incr_checksum_unnecessary(skb); >>>>>>>>>>> + NAPI_GRO_CB(skb)->csum_cnt = skb->csum_level + 1; >>>>>>>>> >>>>>>>>> That doesn't look right. This would be reinitializing the GRO >>>>>>>>> checksums from the beginning. >>>>>>>>> >>>>>>>>>>> } >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> I assume the code is bailing on this conditional: >>>>>>>>> >>>>>>>>> if (NAPI_GRO_CB(skb)->encap_mark || >>>>>>>>> (skb->ip_summed != CHECKSUM_PARTIAL && >>>>>>>>> NAPI_GRO_CB(skb)->csum_cnt == 0 && >>>>>>>>> !NAPI_GRO_CB(skb)->csum_valid) || >>>>>>>>> !udp_sk(sk)->gro_receive) >>>>>>>>> goto out_unlock; >>>>>>>>> >>>>>>>>> I am trying to remember why this needs to check csum_cnt. If there was >>>>>>>>> a csum_cnt for the UDP csum being zero from checksum-unnecessary, it >>>>>>>>> was consumed by skb_gro_checksum_validate_zero_check in UDP4 GRO >>>>>>>>> received. >>>>>>>> >>>>>>>> We have met the scene about two VMs in different host with vxlan >>>>>>>> packets, when udp4_gro_receive receives >>>>>>>> one packet with ip_summed=CHECKSUM_NONE,csum_cnt=0,csum_valid=0,and >>>>>>>> udp->check=0, then skb_gro_checksum_validate_zero_check()-> >>>>>>>> skb_gro_incr_csum_unnecessary() validate it and set >>>>>>>> ip_summed=CHECKSUM_UNNECESSARY,csum_level=0, but csum_cnt and >>>>>>>> csum_valid >>>>>>>> keep zero value. Then it will be flushed in udp_gro_receive(), the >>>>>>>> codes as you have showed. >>>>>>>> >>>>>>>> so I think it forgets to modify csum_cnt since csum_level is changed >>>>>>>> in skb_gro_incr_csum_unnecessary()->__skb_incr_checksum_unnecessary(). >>>>>>>> >>>>>>> Yes, but the csum_level is changing since we've gone beyond the >>>>>>> checksums initially reported inc checksum-unnecessary. GRO csum_cnt is >>>>>>> initialized to skb->csum_level + 1 at the start of GRO processing. >>>>>>> >>>>>>> If I recall, the rule is that UDP GRO requires at least one non-zero >>>>>>> checksum to be verified. The idea is that if we end up computing >>>>>>> packet checksums on the host for inner checksums like TCP during GRO, >>>>>>> then that's negating the performance benefits of GRO. Had UDP check >>>>>>> not been zero then we would do checksum unnecessary conversion and so >>>>>>> csum_valid would be set for the remainded of GRO processing. The >>>>>>> existing code is following the rule I believe, so this may be working >>>>>>> as intended. >>>>>> >>>>>> Do you have any suggestion if I need do GRO as udp->check is zero? >>>>>> My previous modification which works fine as below: >>>>>> if (NAPI_GRO_CB(skb)->encap_mark || >>>>>> (skb->ip_summed != CHECKSUM_PARTIAL && >>>>>> + skb->ip_summed != CHECKSUM_UNNECESSARY && >>>>> >>>>> That's effectively disabling the rule that we need a real checksum >>>>> calculation to proceed with GRO. Besides that, the device returning >>>>> one checksum-unnecessary level because UDP csum is zero is pretty >>>>> pointelss; we can just as easily deduce get to same state just by >>>>> looking at the field with CHECKSUM_NONE. What we really want to see >>>>> for GRO is a real checksum computation being done on the packet. >>>>> >>>>> A few questions: >>>>> >>>>> What type of packets are being GROed? Are these TCP? What performance >>>>> difference do you see with our patch? Can you try enabling UDP >>>>> checksums, and even RCO with VXLAN? With UDP encapsulation we >>>>> generally see better performance with checksum enabled since UDP >>>>> checksum offload is ubiquitous and we can easily convert >>>>> checksum-unnecessary (with non-zero csum) to checksum-complete. >>>> >>>> We use the physical network card calculate the checksum of the inner >>>> packet with checksum offload. >>>> Set the udp checksum of the vxlan header is 0. >>>> >>> I see. It sounds like the device is really verifying two checksums in >>> the packet, the outer UDP checksum (which is zero for UDP) and an >>> inner checksum, but only reporting one checksum was verified. The >>> driver needs to set csum_level to 1 in this case (meaning two >>> checksums have been verified for checksum-unnecessary). What NIC are >>> you using? >> >> Currently it is 82599, whose driver can't recognize the vxlan packet. >> I guess so many NICs can't do this checking in it's driver, so I think this >> is a >> common case, will we fix it in stack? >> > Mao, > > The problem isn't in the stack, it seems to be in the driver. If the > device reports a verified checksum for an encpasulated packet then the > driver needs to set csum_level to 1. Otherwise, the stack can't just > assume that the inner checksum was verified. *How* a driver deduces > that the device is reporting about an encapsulated checksum is > specific to the device and its driver. I'm not sure which driver your > running, but if you search the code there should be something like > "skb->csum_level =1" that would be a clue about support. A good > example is ixgbe, if the device reports checksum verified and that > packet was VXLAN, it deduces that the inner checksum was verified and > so the driver sets CHECKSUM_UNNECESSARY and skb->csum_level. Of course > all this complexity goes away when devices just provide > checksum-complete. > > Tom Hi Tom, I have checked code in ixgbe_main.c ixgbe_rx_checksum(). UDP frames with a 0 checksum can be marked as checksum errors. If it returns here, skb->ip_summed will be set to CHECKSUM_NONE. Then the packet will be flush to stack in udp_gro_receive(). Do you think whether it is the fault of driver or not? ixgbe_rx_checksum(): if (ixgbe_test_staterr(rx_desc, IXGBE_RXDADV_ERR_TCPE)) { /* * 82599 errata, UDP frames with a 0 checksum can be marked as * checksum errors. */ if ((pkt_info & cpu_to_le16(IXGBE_RXDADV_PKTTYPE_UDP)) && test_bit(__IXGBE_RX_CSUM_UDP_ZERO_ERR, &ring->state)) return; ring->rx_stats.csum_err++; return; } udp_gro_receive(): if (NAPI_GRO_CB(skb)->encap_mark || (skb->ip_summed != CHECKSUM_PARTIAL && NAPI_GRO_CB(skb)->csum_cnt == 0 && !NAPI_GRO_CB(skb)->csum_valid) || !udp_sk(sk)->gro_receive) goto out_unlock; > >>> >>> Tom >>> >>> >>>> With this patch, the bandwidth of TCP between two VMs increase from >>>> 2Gbit/s to 6Gbit/s. >>>> >>>>> >>>>> Tom >>>>> >>>>> >>>>> >>>>> >>>>>> NAPI_GRO_CB(skb)->csum_cnt == 0 && >>>>>> !NAPI_GRO_CB(skb)->csum_valid) || >>>>>> !udp_sk(sk)->gro_receive) >>>>>> goto out_unlock; >>>>>> >>>>>> >>>>>>> >>>>>>> Tom >>>>>>> >>>>>>>>> >>>>>>>>> . >>>>>>>>> >>>>>>>> >>>>>>> >>>>>>> . >>>>>>> >>>>>> >>>>> >>>>> . >>>>> >>>> >>> >>> . >>> >> > > . >