Thanks @gpiccoli. Doing the same for Bionic, based on the same logic as comment #12.

** Tags removed: verification-needed-bionic
** Tags added: verification-done-bionic

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1855409

Title:
  qede driver causes 100% CPU load

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  Invalid
Status in linux source package in Bionic:
  Fix Committed
Status in linux source package in Disco:
  Fix Released
Status in linux source package in Eoan:
  Fix Released
Status in linux source package in Focal:
  Fix Released

Bug description:
  [Impact]

  * The PTP feature in the qede driver is implemented such that, if the NIC firmware takes some time to perform the timestamping, the PTP worker function reschedules itself indefinitely until the value read from a device register is meaningful. With that behavior, if a userspace tool requests a badly configured TX/RX filter (or if the NIC firmware has any other timestamping issue), the function qede_ptp_task() reschedules itself forever and causes unbounded resource consumption. This manifests as a kworker thread consuming 100% of CPU.

  * The dmesg log will show a message like this:

  "qede_ptp_tx_ts:533(eno3)]Timestamping in progress"

  Also, by using perf, the user can observe a stack like the following:

  - 44.76% 0.00% kworker/16:5 [kernel.kallsyms] ret_from_fork
     - kthread
        - 44.74% worker_thread
           - 44.57% process_one_work
              - 42.67% qede_ptp_task
                 - 38.86% qed_ptp_hw_read_tx_ts
                      qed_rd
                 - 3.03% queue_work_on
                    - 2.06% __queue_work
                       - 0.68% get_work_pool
                          - 0.61% radix_tree_lookup
                               __radix_tree_lookup
                0.50% set_work_pool_and_clear_pending

  * The patch proposed in this SRU request refactors the PTP worker in qede by adding a time limit, after which the task no longer reschedules itself and the timestamp procedure fails:

  9adebac37e7d ("qede: Handle infinite driver spinning for Tx timestamp.")
  http://git.kernel.org/linus/9adebac37e7d

  Besides fixing the issue, it also adds an ethtool statistic that accounts for PTP errors.

  [Test case]

  Using chrony in Bionic, the following steps reproduce the issue:

  a) Install chrony on Bionic on a system with a working NIC managed by qede;

  b) Edit the chrony configuration and add "hwtimestamp *" to the top of its conf file;

  c) Restart the chrony service.

  Check dmesg for the "[...]Timestamping in progress" message, and check the overall CPU load with a tool like "top" to observe a kthread consuming 100% of CPU.

  [Regression potential]

  The patch scope is restricted to the qede PTP handler, and the patch has been upstream for more than 7 months. If there is any possibility of regression, the worst case would be an issue affecting packet timestamping, without touching the regular xmit path of the driver.
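
The time-limit idea described under [Impact] can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the code from commit 9adebac37e7d: the names fake_nic, ptp_state, ptp_task_step and run_ptp_task are hypothetical, time is modelled as a simple tick counter, and the "hardware" is a stub whose timestamp becomes valid at a chosen tick (or never).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the NIC: the Tx timestamp register holds a
 * meaningful value only once `ready_at` ticks have passed (UINT64_MAX
 * models firmware that never produces a timestamp). */
struct fake_nic {
    uint64_t ready_at;
};

/* Hypothetical per-device PTP state. */
struct ptp_state {
    uint64_t tx_start;   /* tick at which the timestamp request started */
    uint64_t timeout;    /* maximum number of ticks to keep polling */
    unsigned tx_errors;  /* abandoned requests, akin to an error statistic */
};

enum poll_result { POLL_DONE, POLL_RESCHEDULE, POLL_TIMED_OUT };

/* Stub for the register read: true when the timestamp is valid. */
static bool fake_read_tx_ts(const struct fake_nic *nic, uint64_t now)
{
    return now >= nic->ready_at;
}

/* One invocation of the worker. Without the timeout check this would
 * return POLL_RESCHEDULE forever when the timestamp never arrives. */
static enum poll_result ptp_task_step(struct ptp_state *st,
                                      const struct fake_nic *nic,
                                      uint64_t now)
{
    if (fake_read_tx_ts(nic, now))
        return POLL_DONE;

    if (now - st->tx_start >= st->timeout) {
        st->tx_errors++;          /* give up and account for the failure */
        return POLL_TIMED_OUT;
    }

    return POLL_RESCHEDULE;       /* try again on the next tick */
}

/* Drive the worker one tick at a time, as a workqueue would by
 * rescheduling it. Returns how many times the worker ran. */
static unsigned run_ptp_task(struct ptp_state *st,
                             const struct fake_nic *nic,
                             uint64_t start)
{
    unsigned polls = 0;
    uint64_t now = start;

    st->tx_start = start;
    for (;;) {
        polls++;
        if (ptp_task_step(st, nic, now) != POLL_RESCHEDULE)
            break;
        now++;
    }
    return polls;
}
```

With a stub NIC that never becomes ready, run_ptp_task() stops once the timeout has elapsed and increments tx_errors, instead of looping indefinitely as the unpatched worker does.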