Evan and I have demonstrated this bug on Kubernetes as well, so it's not just a problem in Mesos. (See https://github.com/kubernetes/kubernetes/issues/18898)
Sorry about my email client, I've re-sent the patch in another thread from git-email as I should have initially. I'll read through the TX path again to see if we missed something, but I'd love input from anyone else! -- Vijay Pandurangan https://www.twitter.com/vijayp http://www.vijayp.ca On Fri, Dec 18, 2015 at 2:00 PM, Cong Wang <[email protected]> wrote: > (Cc'ing Eric B and Tom) > > On Fri, Dec 18, 2015 at 9:54 AM, Vijay Pandurangan <[email protected]> wrote: >> Packets that arrive from real hardware devices have ip_summed == >> CHECKSUM_UNNECESSARY if the hardware verified the checksums, or >> CHECKSUM_NONE if the packet is bad or it was unable to verify it. The >> current version of veth will replace CHECKSUM_NONE with >> CHECKSUM_UNNECESSARY, which causes corrupt packets routed from hardware to >> a veth device to be delivered to the application. This caused applications >> at Twitter to receive corrupt data when network hardware was corrupting >> packets. > > Yeah, https://reviews.apache.org/r/41158/. > > This is because normally packets to a veth device are _only_ from its pair > device, Mesos network isolator redirects packets from a hardware interface > to veth, which violates this expectation. This is also why no one else sees > this bug. ;) > >> >> We believe this was added as an optimization to skip computing and >> verifying checksums for communication between containers. However, locally >> generated packets have ip_summed == CHECKSUM_PARTIAL, so the code as >> written does nothing for them. As far as we can tell, after removing this >> code, these packets are transmitted from one stack to another unmodified >> (tcpdump shows invalid checksums on both sides, as expected), and they are >> delivered correctly to applications. We didn’t test every possible network >> configuration, but we tried a few common ones such as bridging containers, >> using NAT between the host and a container, and routing from hardware >> devices to containers. We have effectively deployed this in production at >> Twitter (by disabling RX checksum offloading on veth devices). > > > I am wondering if there is any other CHECKSUM_NONE case in the tx > path we could miss here. Mesos case is too special not only because > it redirects packets from hardware to veth, but also because it moves > packets from RX path to TX path. > > Eric? Tom? > >> >> This code dates back to the first version of the driver, commit >> <e314dbdc1c0dc6a548ecf> ("[NET]: Virtual ethernet device driver"), so I >> suspect this bug occurred mostly because the driver API has evolved >> significantly since then. Commit <0b7967503dc97864f283a> ("net/veth: Fix >> packet checksumming") (in December 2010) fixed this for packets that get >> created locally and sent to hardware devices, by not changing >> CHECKSUM_PARTIAL. However, the same issue still occurs for packets coming >> in from hardware devices. >> >> Co-authored-by: Evan Jones <[email protected]> >> Signed-off-by: Evan Jones <[email protected]> >> Cc: Nicolas Dichtel <[email protected]> >> Cc: Phil Sutter <[email protected]> >> Cc: Toshiaki Makita <[email protected]> >> Cc: [email protected] >> Cc: [email protected] >> Signed-off-by: Vijay Pandurangan <[email protected]> > > Your patch looks good to me but your email client corrupts your patch, > so please resend. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [email protected] More majordomo info at http://vger.kernel.org/majordomo-info.html
