Thanks Manish! I am building a test kernel now, and I will let you know once it is ready to test.
If we get good test results, I will submit the patch for SRU to the Ubuntu kernels once the patch has hit mainline.

** Description changed:

- With QL41xxx and Ubuntu DNS server DNS failures are seen when updated to
- the latest Ubuntu kernel 20.04.1 LTS version 5.4.0-52-generic. Issue was
- not observed with 4.5 ubuntu-linux.
+ BugLink: https://bugs.launchpad.net/bugs/1909062

- Problem Definition:
- OS Version: /etc/os-release shows Ubuntu 18.04.4 LTS, but Booted kernel is the latest Ubuntu 20.04.1 LTS version 5.4.0-52-generic
- NIC: 2 dual-port (4) QLogic Corp. FastLinQ QL41000 Series 10/25/40/50GbE Controller [1077:8070] (rev 02)
- Inbox driver qede v8.37.0.20
+ [Impact]

- Complete Detailed Problem Description:
- Anything that uses the internal Kubernetes DNS server fails. If an external DNS server is used resolution works for non-Kubernetes IPs.
+ For users with QLogic QL41xxx series NICs, such as the FastLinQ QL41000
+ Series 10/25/40/50GbE Controller, when they upgrade from the 4.15 kernel
+ to the 5.4 kernel, Kubernetes Internal DNS requests will fail, due to
+ these packets getting corrupted.

- Similar issue is described in this article.
- https://github.com/kubernetes/kubernetes/issues/95365
+ Kubernetes uses IPIP tunnelled packets for internal DNS resolution, and
+ this particular packet type is not supported for hardware tx checksum
+ offload, and the packets end up corrupted when the qede driver attempts
+ to checksum them.

- Below patch recently on upstream fixes this -
- [Note that issue was introduced by driver's tunnel offload support which was added in after 4.5 kernel]
+ This only affects internal Kubernetes DNS, as regular DNS lookups to
+ regular external domains will succeed, due to them not using IPIP packet
+ types.
+
+ [Fix]
+
+ Marvell has developed a fix for the qede driver, which checks the packet
+ type, and if it is IPPROTO_IPIP, then csum offloads are disabled for
+ socket buffers of type IPIP.

  commit 5d5647dad259bb416fd5d3d87012760386d97530
  Author: Manish Chopra <mani...@marvell.com>
- Date: Mon Dec 21 06:55:30 2020 -0800
+ Date: Mon Dec 21 06:55:30 2020 -0800
+ Subject: qede: fix offload for IPIP tunnel packets
+ Link: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=5d5647dad259bb416fd5d3d87012760386d97530

- qede: fix offload for IPIP tunnel packets
+ This commit is currently in the netdev tree, awaiting merge to mainline.
+ The commit is queued for upstream stable.

- IPIP tunnels packets are unknown to device,
- hence these packets are incorrectly parsed and
- caused the packet corruption, so disable offlods
- for such packets at run time.
+ [Testcase]

- Signed-off-by: Manish Chopra <mani...@marvell.com>
- Signed-off-by: Sudarsana Kalluru <skall...@marvell.com>
- Signed-off-by: Igor Russkikh <irussk...@marvell.com>
- Link: https://lore.kernel.org/r/20201221145530.7771-1-mani...@marvell.com
- Signed-off-by: Jakub Kicinski <k...@kernel.org>
+ The system must have a QLogic QL41xxx series NIC fitted, and needs to be
+ a part of a Kubernetes cluster.

- Thanks,
- Manish
+ Firstly, get a list of all devices in the system:
+
+ $ sudo ifconfig
+
+ Next, set all devices down with:
+
+ $ sudo ifconfig <device> down
+
+ Next, bring up the QLogic QL41xxx device:
+
+ $ sudo ifconfig <qlogic nic device> up
+
+ Then, attempt to lookup an internal Kubernetes domain:
+
+ $ nslookup <internal kubernetes domain address>
+
+ Without the patch, the connection will time out:
+
+ ;; connection timed out; no servers could be reached
+
+ If we look at packet traces with tcpdump, we see it leaves the source,
+ but never arrives at the destination.
+
+ There is a test kernel available in the following ppa:
+
+ https://launchpad.net/~mruffell/+archive/ubuntu/sf297772-test
+
+ If you install it, then Kubernetes internal DNS lookups will succeed.
+
+ [Where problems could occur]
+
+ If a regression were to occur, then users of the qede driver would be
+ affected. This is limited to those with QLogic QL41xxx series NICs. The
+ patch explicitly checks for IPIP type packets, so only those particular
+ packets would be affected.
+
+ Since IPIP type packets are uncommon, it would not cause a total outage
+ on regression, since most packets are not IPIP tunnelled. It could
+ potentially cause problems for users who frequently handle VPN or
+ Kubernetes internal DNS traffic.
+
+ A workaround would be to use ethtool to disable tx csum offload for all
+ packet types, or to revert to an older kernel.

** Tags added: sts

https://bugs.launchpad.net/bugs/1909062

Title:
  qede: Kubernetes Internal DNS Failure due to QL41xxx NIC not supporting IPIP tx csum offload
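
For reference, the ethtool workaround mentioned under [Where problems could occur] might look roughly like the sketch below. This is only an illustration and not part of the bug description: the function name disable_tx_csum and the APPLY variable are hypothetical names introduced here, and the device name is a placeholder. By default the function only prints the command, so it can be reviewed before anything is changed on the NIC.

```shell
#!/bin/sh
# Sketch only: print (or, with APPLY=1, run) the ethtool command that turns
# off tx checksum offload on one interface. This disables the offload for
# ALL packet types on that interface, not just IPIP.
# "disable_tx_csum" and "APPLY" are hypothetical names used for illustration.
disable_tx_csum() {
    iface="$1"
    if [ -z "$iface" ]; then
        echo "usage: disable_tx_csum <device>" >&2
        return 1
    fi
    if [ "${APPLY:-0}" = "1" ]; then
        # Actually change the setting (needs root and the ethtool package).
        sudo ethtool -K "$iface" tx off
    else
        # Dry run: only show what would be executed.
        echo "sudo ethtool -K $iface tx off"
    fi
}
```

Calling disable_tx_csum <qlogic nic device> prints the command, and APPLY=1 disable_tx_csum <qlogic nic device> runs it. Afterwards, the tx-checksumming line in the output of ethtool -k <qlogic nic device> should show whether the setting took effect. The setting is not expected to persist across reboots unless it is added to the network configuration.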