On 27/01/21, Willem de Bruijn wrote: > On Wed, Jan 27, 2021 at 9:53 PM Willem de Bruijn > <willemdebruijn.ker...@gmail.com> wrote: > > > > On Tue, Jan 26, 2021 at 10:25 PM Willem de Bruijn > > <willemdebruijn.ker...@gmail.com> wrote: > > > > > > On Tue, Jan 26, 2021 at 5:00 PM Willem de Bruijn > > > <willemdebruijn.ker...@gmail.com> wrote: > > > > > > > > On Tue, Jan 26, 2021 at 4:54 PM Willem de Bruijn > > > > <willemdebruijn.ker...@gmail.com> wrote: > > > > > > > > > > On Tue, Jan 26, 2021 at 9:58 AM Oliver Graute > > > > > <oliver.gra...@gmail.com> wrote: > > > > > > > > > > > > Hello, > > > > > > > > > > > > we observe some unexpected behavior in the UDP implementation of the > > > > > > linux kernel. > > > > > > > > > > > > Some UDP packets send via the loopback interface are dropped in the > > > > > > kernel on the receive side when using sendto with the MSG_MORE flag. > > > > > > Every drop increases the InCsumErrors in /proc/self/net/snmp. Some > > > > > > example code to reproduce it is appended below. > > > > > > > > > > > > In the code we tracked it down to this code section. ( Even a little > > > > > > further but its unclear to me wy the csum() is wrong in the bad > > > > > > case) > > > > > > > > > > > > udpv6_recvmsg() > > > > > > ... > > > > > > if (checksum_valid || udp_skb_csum_unnecessary(skb)) { > > > > > > if (udp_skb_is_linear(skb)) > > > > > > err = copy_linear_skb(skb, copied, off, > > > > > > &msg->msg_iter); > > > > > > else > > > > > > err = skb_copy_datagram_msg(skb, off, msg, > > > > > > copied); > > > > > > } else { > > > > > > err = skb_copy_and_csum_datagram_msg(skb, off, msg); > > > > > > if (err == -EINVAL) { > > > > > > goto csum_copy_err; > > > > > > } > > > > > > } > > > > > > ... > > > > > > > > > > > > > > > > Thanks for the report with a full reproducer. > > > > > > > > > > I don't have a full answer yet, but can reproduce this easily. > > > > > > > > > > The third program, without MSG_MORE, builds an skb with > > > > > CHECKSUM_PARTIAL in __ip_append_data. When looped to the receive path > > > > > that ip_summed means no additional validation is needed. As encoded in > > > > > skb_csum_unnecessary. > > > > > > > > > > The first and second programs are essentially the same, bar for a > > > > > slight difference in length. In both cases packet length is very short > > > > > compared to the loopback device MTU. Because of MSG_MORE, these > > > > > packets have CHECKSUM_NONE. > > > > > > > > > > On receive in > > > > > > > > > > __udp4_lib_rcv() > > > > > udp4_csum_init() > > > > > err = skb_checksum_init_zero_check() > > > > > > > > > > The second program validates and sets ip_summed = CHECKSUM_COMPLETE > > > > > and csum_valid = 1. > > > > > The first does not, though err == 0. > > > > > > > > > > This appears to succeed consistently for packets <= 68B of payload, > > > > > fail consistently otherwise. It is not clear to me yet what causes > > > > > this distinction. > > > > > > > > This is from > > > > > > > > " > > > > /* For small packets <= CHECKSUM_BREAK perform checksum complete > > > > directly > > > > * in checksum_init. > > > > */ > > > > #define CHECKSUM_BREAK 76 > > > > " > > > > > > > > So the small packet gets checksummed immediately in > > > > __skb_checksum_validate_complete, but the larger one does not. > > > > > > > > Question is why the copy_and_checksum you pointed to seems to fail > > > > checksum. > > > > > > Manually calling __skb_checksum_complete(skb) in > > > skb_copy_and_csum_datagram_msg succeeds, so it is the > > > skb_copy_and_csum_datagram that returns an incorrect csum. > > > > > > Bisection shows that this is a regression in 5.0, between > > > > > > 65d69e2505bb datagram: introduce skb_copy_and_hash_datagram_iter helper > > > (fail) > > > d05f443554b3 iov_iter: introduce hash_and_copy_to_iter helper > > > 950fcaecd5cc datagram: consolidate datagram copy to iter helpers > > > cb002d074dab iov_iter: pass void csum pointer to csum_and_copy_to_iter > > > (pass) > > > > > > That's a significant amount of code change. I'll take a closer look, > > > but checkpointing state for now.. > > > > Key difference is the csum_block_add when handling frags, and the > > removal of temporary csum2. > > > > In the reproducer, there is one 13B csum_and_copy_to_iter from > > skb->data + offset, followed by a 73B csum_and_copy_to_iter from the > > first frag. So the second one passes pos 13 to csum_block_add. > > > > The original implementation of skb_copy_and_csum_datagram similarly > > fails the test, if we fail to account for the position > > > > - *csump = csum_block_add(*csump, csum2, pos); > > + *csump = csum_block_add(*csump, csum2, 0); > > One possible approach:
very thx for your analysis and your patch proposal. After a first quick test this patch proposal solves the problem. Best regards, Oliver