This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-disco' to 'verification-done-disco'. If the problem still exists, change the tag 'verification-needed-disco' to 'verification-failed-disco'.
If verification is not done within 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

** Tags added: verification-needed-disco

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1832082

Title:
  bnx2x driver causes 100% CPU load

Status in linux package in Ubuntu:
  Fix Committed
Status in linux source package in Xenial:
  Fix Committed
Status in linux source package in Bionic:
  Fix Committed
Status in linux source package in Cosmic:
  Won't Fix
Status in linux source package in Disco:
  Fix Committed
Status in linux source package in Eoan:
  Fix Committed
Status in linux source package in FF-Series:
  Fix Committed

Bug description:
  [Impact]

  * The PTP feature in the bnx2x driver is implemented in such a way that if the NIC firmware takes some time to perform the timestamping (observed as a bad register read in bnx2x_ptp_task()), the ptp worker function reschedules itself indefinitely until the value read from the register is meaningful. With that behavior, if a userspace tool requests a badly configured RX filter from bnx2x (or if the NIC firmware has any other timestamping issue), bnx2x_ptp_task() is rescheduled forever and causes unbounded resource consumption. This manifests as a kworker thread consuming 100% of CPU.
  * The dmesg log will show the following message, indicating that other packets are skipped by the timestamping routine because a packet is stuck in the timestamping "pipeline":
    "bnx2x: [bnx2x_start_xmit:3862(eno4)]The device supports only a single outstanding packet to timestamp, this packet will not be timestamped"
    Also, by using ftrace, the user can see that bnx2x_ptp_task() is called repeatedly, and by enabling the bnx2x PTP debugging log (ethtool -s <iface> msglvl 16777216) it is possible to observe the following message flooding the kernel log:
    "bnx2x: [bnx2x_ptp_task:15242(eno4)]There is no valid Tx timestamp yet"
  * The patch proposed in this SRU request is accepted upstream and is currently (2019-07-03) available in David Miller's linux-net tree:
    git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=3c91f25c2f72
    Besides fixing the issue, it also adds an ethtool statistic for accounting the ptp errors and reduces message flooding in case of errors.

  [Test case]

  Reproducing the problem is not difficult; we've used chrony in Bionic to trigger the problem. The steps are:

  a) Install chrony on Bionic in a system with a working NIC managed by bnx2x;
  b) Edit the chrony configuration and add "hwtimestamp *" to the top of its conf file;
  c) Restart the chrony service.

  Check dmesg for the "[...]single outstanding packet" message, and check the overall CPU workload using a tool like "top" to observe a kthread consuming 100% of CPU.

  [Regression potential]

  The patch scope is restricted to the bnx2x ptp handler, and the patch was validated by the driver maintainer. If there is any possibility of regressions, we believe the worst case would be an issue affecting packet timestamping, without affecting the regular xmit path of the driver.
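
  To illustrate the behavior described under [Impact], here is a minimal user-space sketch in Python of a bounded poll: instead of retrying forever while the register read is not meaningful, it gives up after a fixed number of attempts with a growing delay. This is not the upstream patch or driver code. The names poll_timestamp, stuck_register and ready_after, the VALID_BIT value, the retry count and the backoff schedule are all assumptions made for illustration.

```python
import time
from typing import Callable, List, Optional, Tuple

# Hypothetical "timestamp is valid" flag, chosen for illustration only.
VALID_BIT = 0x10000


def poll_timestamp(
    read_reg: Callable[[], int],
    max_tries: int = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[Optional[int], int]:
    """Poll read_reg() at most max_tries times.

    Returns (value, tries) when a valid value is read, or (None, max_tries)
    when every read was invalid, so the caller can count an error and stop
    instead of rescheduling itself forever.
    """
    for attempt in range(max_tries):
        value = read_reg()
        if value & VALID_BIT:
            return value, attempt + 1
        # Back off after each invalid read: 1 ms, 2 ms, 4 ms, ...
        sleep((1 << attempt) / 1000.0)
    return None, max_tries


def stuck_register() -> int:
    """Models firmware that never produces a valid timestamp."""
    return 0


def ready_after(bad_reads: int) -> Callable[[], int]:
    """Models firmware that needs bad_reads invalid reads before it is ready."""
    remaining = [bad_reads]

    def read() -> int:
        if remaining[0] > 0:
            remaining[0] -= 1
            return 0
        return VALID_BIT | 0x1234

    return read


if __name__ == "__main__":
    delays: List[float] = []  # record requested delays instead of sleeping
    print(poll_timestamp(stuck_register, sleep=delays.append))
    print(poll_timestamp(ready_after(3), sleep=delays.append))
```

  With a stuck register the poll ends after max_tries reads, so the cost of a firmware problem is bounded; an unbounded retry would keep the worker busy indefinitely, which is the 100% CPU symptom reported above.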