On Wed, Jan 27, 2021 at 9:53 PM Willem de Bruijn <willemdebruijn.ker...@gmail.com> wrote: > > On Tue, Jan 26, 2021 at 10:25 PM Willem de Bruijn > <willemdebruijn.ker...@gmail.com> wrote: > > > > On Tue, Jan 26, 2021 at 5:00 PM Willem de Bruijn > > <willemdebruijn.ker...@gmail.com> wrote: > > > > > > On Tue, Jan 26, 2021 at 4:54 PM Willem de Bruijn > > > <willemdebruijn.ker...@gmail.com> wrote: > > > > > > > > On Tue, Jan 26, 2021 at 9:58 AM Oliver Graute <oliver.gra...@gmail.com> > > > > wrote: > > > > > > > > > > Hello, > > > > > > > > > > we observe some unexpected behavior in the UDP implementation of the > > > > > linux kernel. > > > > > > > > > > Some UDP packets send via the loopback interface are dropped in the > > > > > kernel on the receive side when using sendto with the MSG_MORE flag. > > > > > Every drop increases the InCsumErrors in /proc/self/net/snmp. Some > > > > > example code to reproduce it is appended below. > > > > > > > > > > In the code we tracked it down to this code section. ( Even a little > > > > > further but its unclear to me wy the csum() is wrong in the bad case) > > > > > > > > > > udpv6_recvmsg() > > > > > ... > > > > > if (checksum_valid || udp_skb_csum_unnecessary(skb)) { > > > > > if (udp_skb_is_linear(skb)) > > > > > err = copy_linear_skb(skb, copied, off, > > > > > &msg->msg_iter); > > > > > else > > > > > err = skb_copy_datagram_msg(skb, off, msg, > > > > > copied); > > > > > } else { > > > > > err = skb_copy_and_csum_datagram_msg(skb, off, msg); > > > > > if (err == -EINVAL) { > > > > > goto csum_copy_err; > > > > > } > > > > > } > > > > > ... > > > > > > > > > > > > > Thanks for the report with a full reproducer. > > > > > > > > I don't have a full answer yet, but can reproduce this easily. > > > > > > > > The third program, without MSG_MORE, builds an skb with > > > > CHECKSUM_PARTIAL in __ip_append_data. When looped to the receive path > > > > that ip_summed means no additional validation is needed. As encoded in > > > > skb_csum_unnecessary. > > > > > > > > The first and second programs are essentially the same, bar for a > > > > slight difference in length. In both cases packet length is very short > > > > compared to the loopback device MTU. Because of MSG_MORE, these > > > > packets have CHECKSUM_NONE. > > > > > > > > On receive in > > > > > > > > __udp4_lib_rcv() > > > > udp4_csum_init() > > > > err = skb_checksum_init_zero_check() > > > > > > > > The second program validates and sets ip_summed = CHECKSUM_COMPLETE > > > > and csum_valid = 1. > > > > The first does not, though err == 0. > > > > > > > > This appears to succeed consistently for packets <= 68B of payload, > > > > fail consistently otherwise. It is not clear to me yet what causes > > > > this distinction. > > > > > > This is from > > > > > > " > > > /* For small packets <= CHECKSUM_BREAK perform checksum complete directly > > > * in checksum_init. > > > */ > > > #define CHECKSUM_BREAK 76 > > > " > > > > > > So the small packet gets checksummed immediately in > > > __skb_checksum_validate_complete, but the larger one does not. > > > > > > Question is why the copy_and_checksum you pointed to seems to fail > > > checksum. > > > > Manually calling __skb_checksum_complete(skb) in > > skb_copy_and_csum_datagram_msg succeeds, so it is the > > skb_copy_and_csum_datagram that returns an incorrect csum. > > > > Bisection shows that this is a regression in 5.0, between > > > > 65d69e2505bb datagram: introduce skb_copy_and_hash_datagram_iter helper > > (fail) > > d05f443554b3 iov_iter: introduce hash_and_copy_to_iter helper > > 950fcaecd5cc datagram: consolidate datagram copy to iter helpers > > cb002d074dab iov_iter: pass void csum pointer to csum_and_copy_to_iter > > (pass) > > > > That's a significant amount of code change. I'll take a closer look, > > but checkpointing state for now.. > > Key difference is the csum_block_add when handling frags, and the > removal of temporary csum2. > > In the reproducer, there is one 13B csum_and_copy_to_iter from > skb->data + offset, followed by a 73B csum_and_copy_to_iter from the > first frag. So the second one passes pos 13 to csum_block_add. > > The original implementation of skb_copy_and_csum_datagram similarly > fails the test, if we fail to account for the position > > - *csump = csum_block_add(*csump, csum2, pos); > + *csump = csum_block_add(*csump, csum2, 0);
One possible approach: diff --git a/net/core/datagram.c b/net/core/datagram.c index 81809fa735a7..56bd0c32fa65 100644 --- a/net/core/datagram.c +++ b/net/core/datagram.c @@ -721,8 +721,10 @@ static int skb_copy_and_csum_datagram(const struct sk_buff *skb, int offset, struct iov_iter *to, int len, __wsum *csump) { + struct csum_iter csdata = { .csump = csump }; + return __skb_datagram_iter(skb, offset, to, len, true, - csum_and_copy_to_iter, csump); + csum_and_copy_to_iter, &csdata); } diff --git a/include/linux/uio.h b/include/linux/uio.h index 72d88566694e..7ce4f403a03f 100644 --- a/include/linux/uio.h +++ b/include/linux/uio.h @@ -260,7 +260,12 @@ static inline void iov_iter_reexpand(struct iov_iter *i, size_t count) { i->count = count; } -size_t csum_and_copy_to_iter(const void *addr, size_t bytes, void *csump, struct iov_iter *i); + +struct csum_iter { + void *csump; + int off; +}; +size_t csum_and_copy_to_iter(const void *addr, size_t bytes, void *csdata, struct iov_iter *i); size_t csum_and_copy_from_iter(void *addr, size_t bytes, __wsum *csum, struct iov_iter *i); bool csum_and_copy_from_iter_full(void *addr, size_t bytes, __wsum *csum, struct iov_iter *i); size_t hash_and_copy_to_iter(const void *addr, size_t bytes, void *hashp, diff --git a/lib/iov_iter.c b/lib/iov_iter.c index a21e6a5792c5..2e6e24f7dfe1 100644 --- a/lib/iov_iter.c +++ b/lib/iov_iter.c @@ -1522,13 +1522,13 @@ bool csum_and_copy_from_iter_full(void *addr, size_t bytes, __wsum *csum, } EXPORT_SYMBOL(csum_and_copy_from_iter_full); -size_t csum_and_copy_to_iter(const void *addr, size_t bytes, void *csump, +size_t csum_and_copy_to_iter(const void *addr, size_t bytes, void *_csdata, struct iov_iter *i) { + struct csum_iter *csdata = _csdata; const char *from = addr; - __wsum *csum = csump; + __wsum *csum = csdata->csump; __wsum sum, next; - size_t off = 0; if (unlikely(iov_iter_is_pipe(i))) return csum_and_copy_to_pipe_iter(addr, bytes, csum, i); @@ -1543,22 +1543,22 @@ size_t csum_and_copy_to_iter(const void *addr, size_t bytes, void *csump, v.iov_base, v.iov_len); if (next) { - sum = csum_block_add(sum, next, off); - off += v.iov_len; + sum = csum_block_add(sum, next, csdata->off); + csdata->off += v.iov_len; } next ? 0 : v.iov_len; }), ({ char *p = kmap_atomic(v.bv_page); sum = csum_and_memcpy(p + v.bv_offset, (from += v.bv_len) - v.bv_len, - v.bv_len, sum, off); + v.bv_len, sum, csdata->off); kunmap_atomic(p); - off += v.bv_len; + csdata->off += v.bv_len; }),({ sum = csum_and_memcpy(v.iov_base, (from += v.iov_len) - v.iov_len, - v.iov_len, sum, off); - off += v.iov_len; + v.iov_len, sum, csdata->off); + csdata->off += v.iov_len; }) ) *csum = sum;