On Thu, Nov 15, 2018 at 03:16:02PM -0800, Cong Wang wrote: > The following evidences indicate this check is likely wrong: > > 1. In the assignment "skb->csum_valid = !sum", sum==0 indicates a valid > checksum. > > 2. __skb_checksum_complete() always returns sum, and TCP packets are dropped > only when it returns non-zero. So non-zero indicates a failure. > > 3. In __skb_checksum_validate_complete(), we have a nearly same check, where > zero is considered as success. > > 4. csum_fold() already does the one’s complement, this indicates 0 should > be considered as a successful validation. > > 5. We have triggered this fault for many times, but InCsumErrors field in > /proc/net/snmp remains 0. > > Base on the above, non-zero should be used as a checksum mismatch. > > I tested this with mlx5 driver, no warning or InCsumErrors after 1 hour. > > Fixes: fb286bb2990a ("[NET]: Detect hardware rx checksum faults correctly") > Cc: Herbert Xu <herb...@gondor.apana.org.au> > Cc: Tom Herbert <t...@herbertland.com> > Cc: Eric Dumazet <eduma...@google.com> > Signed-off-by: Cong Wang <xiyou.wangc...@gmail.com> > --- > net/core/datagram.c | 4 ++-- > net/core/dev.c | 2 +- > 2 files changed, 3 insertions(+), 3 deletions(-) > > diff --git a/net/core/datagram.c b/net/core/datagram.c > index 57f3a6fcfc1e..e542a9a212a7 100644 > --- a/net/core/datagram.c > +++ b/net/core/datagram.c > @@ -733,7 +733,7 @@ __sum16 __skb_checksum_complete_head(struct sk_buff *skb, > int len) > __sum16 sum; > > sum = csum_fold(skb_checksum(skb, 0, len, skb->csum)); > - if (likely(!sum)) { > + if (unlikely(sum)) { > if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE) && > !skb->csum_complete_sw) > netdev_rx_csum_fault(skb->dev);
Normally if the hardware's partial checksum is valid then we just trust it and send the packet along. However, if the partial checksum is invalid we don't trust it and we will compute the whole checksum manually which is what ends up in sum. netdev_rx_csum_fault is meant to warn about the situation where a packet with a valid checksum (i.e., sum == 0) was given to us by the hardware with a partial checksum that was invalid. So changing it to sum here is wrong. Can you give more information as to how you got the warnings with mlx5? It sounds like there may be a real bug there because if you are getting the warning then it means that a packet with an invalid hardware-computed partial checksum passed the manual check and was actually valid. This implies that either the hardware or the driver is broken. Cheers, -- Email: Herbert Xu <herb...@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt