Thanks Manish! I am building a test kernel now, and I will let you know
once it is ready to test.

If we get good test results, I will submit the patch for SRU to the
Ubuntu kernels once the patch has hit mainline.

** Description changed:

- With QL41xxx and Ubuntu DNS server DNS failures are seen when updated to
- the latest Ubuntu kernel 20.04.1 LTS version 5.4.0-52-generic. Issue was
- not observed with 4.5 ubuntu-linux.
+ BugLink: https://bugs.launchpad.net/bugs/1909062
  
- Problem Definition:
- OS Version: /etc/os-release shows Ubuntu 18.04.4 LTS, but Booted kernel is
- the latest Ubuntu 20.04.1 LTS version 5.4.0-52-generic
- NIC: 2 dual-port (4) QLogic Corp. FastLinQ QL41000 Series 10/25/40/50GbE
- Controller [1077:8070] (rev 02)
- Inbox driver qede v8.37.0.20
+ [Impact]
  
- Complete Detailed Problem Description:
- Anything that uses the internal Kubernetes DNS server fails. If an external
- DNS server is used resolution works for non-Kubernetes IPs.
+ For users with QLogic QL41xxx series NICs, such as the FastLinQ QL41000
+ Series 10/25/40/50GbE Controller, when they upgrade from the 4.15 kernel
+ to the 5.4 kernel, Kubernetes Internal DNS requests will fail, due to
+ these packets getting corrupted.
  
- Similar issue is described in this article.
- https://github.com/kubernetes/kubernetes/issues/95365
+ Kubernetes uses IPIP tunnelled packets for internal DNS resolution.
+ This packet type is not supported for hardware tx checksum offload, so
+ the packets end up corrupted when the qede driver attempts to checksum
+ them.
  
- Below patch recently on upstream fixes this -
- [Note that issue was introduced by driver's tunnel offload support which was
- added in after 4.5 kernel]
+ This only affects internal Kubernetes DNS. Regular DNS lookups to
+ external domains succeed, because they do not use IPIP packets.
+ 
+ [Fix]
+ 
+ Marvell has developed a fix for the qede driver, which checks the packet
+ type and, if it is IPPROTO_IPIP, disables csum offloads for that socket
+ buffer.
  
  commit 5d5647dad259bb416fd5d3d87012760386d97530
  Author: Manish Chopra <mani...@marvell.com>
- Date:   Mon Dec 21 06:55:30 2020 -0800
+ Date: Mon Dec 21 06:55:30 2020 -0800
+ Subject: qede: fix offload for IPIP tunnel packets
+ Link: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=5d5647dad259bb416fd5d3d87012760386d97530
  
-     qede: fix offload for IPIP tunnel packets
+ This commit is currently in the netdev tree, awaiting merge to mainline.
+ The commit is queued for upstream stable.
  
-     IPIP tunnels packets are unknown to device,
-     hence these packets are incorrectly parsed and
-     caused the packet corruption, so disable offlods
-     for such packets at run time.
+ [Testcase]
  
-     Signed-off-by: Manish Chopra <mani...@marvell.com>
-     Signed-off-by: Sudarsana Kalluru <skall...@marvell.com>
-     Signed-off-by: Igor Russkikh <irussk...@marvell.com>
-     Link: https://lore.kernel.org/r/20201221145530.7771-1-mani...@marvell.com
-     Signed-off-by: Jakub Kicinski <k...@kernel.org>
+ The system must have a QLogic QL41xxx series NIC fitted, and needs to be
+ a part of a Kubernetes cluster.
  
- Thanks,
- Manish
+ Firstly, get a list of all devices in the system:
+ 
+ $ sudo ifconfig
+ 
+ Next, set all devices down with:
+ 
+ $ sudo ifconfig <device> down
+ 
+ Next, bring up the QLogic QL41xxx device:
+ 
+ $ sudo ifconfig <qlogic nic device> up
+ 
+ Then, attempt to look up an internal Kubernetes domain:
+ 
+ $ nslookup <internal kubernetes domain address>
+ 
+ Without the patch, the connection will time out:
+ 
+ ;; connection timed out; no servers could be reached
+ 
+ If we look at packet traces with tcpdump, we see the request leave the
+ source, but it never arrives at the destination.
+ 
+ There is a test kernel available in the following ppa:
+ 
+ https://launchpad.net/~mruffell/+archive/ubuntu/sf297772-test
+ 
+ If you install it, then Kubernetes internal DNS lookups will succeed.
+ 
+ [Where problems could occur]
+ 
+ If a regression were to occur, then users of the qede driver would be
+ affected. This is limited to those with QLogic QL41xxx series NICs. The
+ patch explicitly checks for IPIP type packets, so only those particular
+ packets would be affected.
+ 
+ Since most packets are not IPIP tunnelled, a regression would not cause
+ a total outage. It could potentially cause problems for users who
+ frequently handle VPN or Kubernetes internal DNS traffic.
+ 
+ A workaround would be to use ethtool to disable tx csum offload for all
+ packet types, or to revert to an older kernel.
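
The ethtool workaround mentioned above could look roughly like the
following sketch. The function name disable_tx_csum and the DRY_RUN
switch are illustrative assumptions, not part of the bug report; the
underlying command is the standard "ethtool -K <device> tx off". By
default the sketch only prints the command so it can be reviewed first.

```shell
#!/bin/sh
# Sketch: turn off TX checksum offload on one interface as a stopgap.
# disable_tx_csum and DRY_RUN are hypothetical names used for illustration.

disable_tx_csum() {
    dev="$1"
    if [ -z "$dev" ]; then
        echo "usage: disable_tx_csum <device>" >&2
        return 1
    fi
    # "ethtool -K <dev> tx off" disables TX checksumming for all packet
    # types on that device, not only for IPIP tunnelled packets.
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Default: only print the command, do not change any settings.
        echo "ethtool -K $dev tx off"
    else
        ethtool -K "$dev" tx off
    fi
}
```

To apply it for real, run "DRY_RUN=0 disable_tx_csum <qlogic nic device>"
as root. "ethtool -k <device>" lists the current offload settings. The
change does not persist across reboots, so it would have to be reapplied
at boot until a fixed kernel is installed.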

** Tags added: sts

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1909062

Title:
  qede: Kubernetes Internal DNS Failure due to QL41xxx NIC not
  supporting IPIP tx csum offload

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1909062/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
