Hi Shixiong, thanks for the report. At the moment the bisect sits here:

4d60b13f267d workqueue: Don't call cpumask_test_cpu() with -1 CPU in wq_update_node_max_active()
adc1b642f72f workqueue: Implement system-wide nr_active enforcement for unbound workqueues
929b7fbecbcc workqueue: Introduce struct wq_node_nr_active
afd774d513f5 workqueue: RCU protect wq->dfl_pwq and implement accessors for it
31a8e16645d7 workqueue: Make wq_adjust_max_active() round-robin pwqs while activating
e4bbec8ce062 workqueue: Move nr_active handling into helpers
865f7641cf47 workqueue: Replace pwq_activate_inactive_work() with [__]pwq_activate_work()
a88074533304 workqueue: Factor out pwq_is_empty()
5d378b3d47e1 workqueue: Move pwq->max_active to wq->max_active
eb182ba1f6cb workqueue.c: Increase workqueue name length
a0fcae282d10 do_sys_name_to_handle(): use kzalloc() to fix kernel-infoleak
fa1cbadd64bc UBUNTU: [Packaging] add Real-time Linux Analysis tool (rtla) to linux-tools
888e7c48a1ff UBUNTU: SAUCE: rtla: fix deb build
51c8aee42179 UBUNTU: [Packaging] provide a wrapper module for python-perf
48357b9b6d27 UBUNTU: [Packaging] enable perf python module
759436dbdae1 drm/amdgpu: respect the abmlevel module parameter value if it is set
716ec855fa62 drm/amd/display: add panel_power_savings sysfs entry to eDP connectors
b4275c751289 UBUNTU: Start new release
782e3646d110 UBUNTU: [Packaging] update annotations scripts
b43457da65e4 UBUNTU: [Packaging] update variants
80ebe4152d65 UBUNTU: [Packaging] drop getabis data
7fdb45c9bbbc (tag: Ubuntu-6.8.0-31.31, refs/bisect/good-7fdb45c9bbbc95a3300b4d8de3f751f4c05c98e2) UBUNTU: Ubuntu-6.8.0-31.31

These workqueue patches are definitely suspicious, but they are actually
part of the v6.8.2 stable tree, hence I'm a bit puzzled. Before we move
forward, may I ask you to:

1) test the latest v6.8 Noble kernel from proposed:

   linux-generic | 6.8.0-58.60 | noble-proposed | amd64, arm64, armhf, ppc64el, s390x

2) test the Noble GA kernel again:

   linux-generic | 6.8.0-31.31 | noble | amd64, arm64, armhf, ppc64el, s390x

I want to double check we didn't overlook anything.

Thank you.

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2081685

Title:
  [Ubuntu 24.04-generic Kernel-6.8]Hard lockup on 8 Socket System,
  ThinkSystem SR950 V3.

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Noble:
  In Progress
Status in linux source package in Oracular:
  Confirmed

Bug description:
  A CPU hard lockup is detected under Ubuntu 24.04 LTS (kernel 6.8.0-38).
  See attachment "dmesg0723-Lockup-Ubuntu24.04.log".

  ubuntu@SR950V3:~$ cat /var/log/dmesg | grep -i lockup
  [   15.241164] kernel: watchdog: Watchdog detected hard LOCKUP on cpu 124
  [   15.241164] kernel:  ? watchdog_hardlockup_check+0x1cb/0x3b0

  The issue does not occur on upstream kernels 6.8, 6.9, 6.10 or
  6.11-rc*, so it appears to be an Ubuntu kernel issue only. See
  attachment "dmesg0923-No-Lockup-Kernel 6-10.log".

  According to the dmesg log, the "hard lockup" is not a real lockup.
  Many CPUs try to take the cache_disable_lock spinlock at the same time
  during kernel boot, so they contend for it. Each CPU flushes its TLB
  inside the critical section, and flushing the TLB is a time-consuming
  operation. With so many CPUs, the kernel ends up reporting a false
  "hard lockup".

  To avoid confusing customers, when will Canonical provide a fix?

  HW Config:
  ThinkSystem SR950 V3
  CPU: 8* Intel(R) Xeon(R) Platinum 8490H 60 Core 3.5GHz
  MEM: 2TB = SK Hynix 356GB DDR5 4800MHz 3DS (2015.1GB)
  Raid: ThinkSystem RAID 940-8i 4GB Flash PCIe Gen4 12Gb Adapter
  Storage: Micron_7450_MTFDKBA960TFR *1
           Samsung 30.7TB 24Gbps SAS 2.5" SSD
  NIC: ThinkSystem Intel X710-T4L 10GBASE-T 4-Port OCP Ethernet Adapter
  OS: ubuntu 24.04 LTS( kernel 6.8.0-38-generic)

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2081685/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp