On Tue, Jan 26, 2021 at 10:25 PM Willem de Bruijn <willemdebruijn.ker...@gmail.com> wrote: > > On Tue, Jan 26, 2021 at 5:00 PM Willem de Bruijn > <willemdebruijn.ker...@gmail.com> wrote: > > > > On Tue, Jan 26, 2021 at 4:54 PM Willem de Bruijn > > <willemdebruijn.ker...@gmail.com> wrote: > > > > > > On Tue, Jan 26, 2021 at 9:58 AM Oliver Graute <oliver.gra...@gmail.com> > > > wrote: > > > > > > > > Hello, > > > > > > > > we observe some unexpected behavior in the UDP implementation of the > > > > linux kernel. > > > > > > > > Some UDP packets send via the loopback interface are dropped in the > > > > kernel on the receive side when using sendto with the MSG_MORE flag. > > > > Every drop increases the InCsumErrors in /proc/self/net/snmp. Some > > > > example code to reproduce it is appended below. > > > > > > > > In the code we tracked it down to this code section. ( Even a little > > > > further but its unclear to me wy the csum() is wrong in the bad case) > > > > > > > > udpv6_recvmsg() > > > > ... > > > > if (checksum_valid || udp_skb_csum_unnecessary(skb)) { > > > > if (udp_skb_is_linear(skb)) > > > > err = copy_linear_skb(skb, copied, off, > > > > &msg->msg_iter); > > > > else > > > > err = skb_copy_datagram_msg(skb, off, msg, > > > > copied); > > > > } else { > > > > err = skb_copy_and_csum_datagram_msg(skb, off, msg); > > > > if (err == -EINVAL) { > > > > goto csum_copy_err; > > > > } > > > > } > > > > ... > > > > > > > > > > Thanks for the report with a full reproducer. > > > > > > I don't have a full answer yet, but can reproduce this easily. > > > > > > The third program, without MSG_MORE, builds an skb with > > > CHECKSUM_PARTIAL in __ip_append_data. When looped to the receive path > > > that ip_summed means no additional validation is needed. As encoded in > > > skb_csum_unnecessary. > > > > > > The first and second programs are essentially the same, bar for a > > > slight difference in length. In both cases packet length is very short > > > compared to the loopback device MTU. Because of MSG_MORE, these > > > packets have CHECKSUM_NONE. > > > > > > On receive in > > > > > > __udp4_lib_rcv() > > > udp4_csum_init() > > > err = skb_checksum_init_zero_check() > > > > > > The second program validates and sets ip_summed = CHECKSUM_COMPLETE > > > and csum_valid = 1. > > > The first does not, though err == 0. > > > > > > This appears to succeed consistently for packets <= 68B of payload, > > > fail consistently otherwise. It is not clear to me yet what causes > > > this distinction. > > > > This is from > > > > " > > /* For small packets <= CHECKSUM_BREAK perform checksum complete directly > > * in checksum_init. > > */ > > #define CHECKSUM_BREAK 76 > > " > > > > So the small packet gets checksummed immediately in > > __skb_checksum_validate_complete, but the larger one does not. > > > > Question is why the copy_and_checksum you pointed to seems to fail checksum. > > Manually calling __skb_checksum_complete(skb) in > skb_copy_and_csum_datagram_msg succeeds, so it is the > skb_copy_and_csum_datagram that returns an incorrect csum. > > Bisection shows that this is a regression in 5.0, between > > 65d69e2505bb datagram: introduce skb_copy_and_hash_datagram_iter helper (fail) > d05f443554b3 iov_iter: introduce hash_and_copy_to_iter helper > 950fcaecd5cc datagram: consolidate datagram copy to iter helpers > cb002d074dab iov_iter: pass void csum pointer to csum_and_copy_to_iter (pass) > > That's a significant amount of code change. I'll take a closer look, > but checkpointing state for now..
Key difference is the csum_block_add when handling frags, and the removal of temporary csum2. In the reproducer, there is one 13B csum_and_copy_to_iter from skb->data + offset, followed by a 73B csum_and_copy_to_iter from the first frag. So the second one passes pos 13 to csum_block_add. The original implementation of skb_copy_and_csum_datagram similarly fails the test, if we fail to account for the position - *csump = csum_block_add(*csump, csum2, pos); + *csump = csum_block_add(*csump, csum2, 0);