I'd like to perform a bisect to identify the Ubuntu-specific commit that introduced this. First, can you test the Zesty and Artful kernels to see if the bug has already been fixed in the newer releases? If it has, we can focus on finding the fix with a "Reverse" bisect.

The kernels can be downloaded from:

Zesty:
https://launchpad.net/~canonical-kernel-team/+archive/ubuntu/ppa/+build/13563574

Artful:
https://launchpad.net/~canonical-kernel-security-team/+archive/ubuntu/ppa2/+build/13567624

To install the kernels, be sure to install both the linux-image and linux-image-extra .deb packages.

Thanks in advance!

** Changed in: linux-hwe (Ubuntu)
   Importance: Undecided => High

** Also affects: linux (Ubuntu)
   Importance: Undecided
       Status: New

** Changed in: linux (Ubuntu)
   Importance: Undecided => High

** Changed in: linux (Ubuntu)
       Status: New => In Progress

** Changed in: linux (Ubuntu)
     Assignee: (unassigned) => Joseph Salisbury (jsalisbury)

** Changed in: linux-hwe (Ubuntu)
     Assignee: (unassigned) => Joseph Salisbury (jsalisbury)

** Changed in: linux-hwe (Ubuntu)
       Status: New => In Progress

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1713751

Title:
  soft lockup / stall on CPU when shutting down with hwe 4.10 kernel

Status in linux package in Ubuntu:
  In Progress
Status in linux-hwe package in Ubuntu:
  In Progress

Bug description:
  Instead of normal complete shutdowns we're getting soft lockup
  failures. This started when 16.04 hwe packages switched to the 4.10
  kernel about a month ago. I help manage a few hundred machines
  spanning several different sites and several different hardware
  models, and they're all experiencing this intermittently:
  approximately 5% get stuck on shutdown each day.

  Here is an example of what is on the screen after it happens. The
  machine is unresponsive and requires a hard reset. I can't see
  anything in syslog or dmesg that differs when this happens; I think
  all logging has stopped at this point in the shutdown.

  [54566.220003] ? (t=6450529 jiffies g=141935 c=141934 q=1288)
  [54592.092003] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! (systemd:1)
  [54620.092003] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! (systemd:1)
  [54648.092003] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! (systemd:1)
  [54676.092003] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! (systemd:1)
  [54704.092003] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! (systemd:1)
  [54732.092003] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! (systemd:1)
  [54746.232003] INFO: rcu_sched self-detected stall on CPU
  [54746.232003] ?1-...: (6495431 ticks this GP) idle=5c7/140000000000001/0 softirq=218389/218389 fqs=3247712

  This repeats every ~ 22 seconds; sometimes it is stuck for 23s instead of 22:
  ... NMI watchdog: BUG: soft lockup - CPU#1 stuck for 23s!

  Reverting to 4.8.0-58 avoids the problem. I believe the problem has
  been present with every hwe 4.10 kernel package through the current
  linux-image-4.10.0-33-generic. This bug was filed with data right
  after it occurred with linux-image-4.10.0-33-generic.

  This only happens approximately 5% of the time with no discernible
  pattern. I am able to reproduce the issue on one particular machine
  by scheduling shutdowns 3 times per day and waiting up to a few days
  for the problem to occur. Shutting down and starting up more
  frequently, like every 5 minutes or even every hour, will not trigger
  the problem; it seems like the machine needs to be running for a
  while. It does not seem to depend on any user actions; it happens
  even if you never log in. It has happened on reboots as opposed to
  shutdowns as well.

  I found a few similar bug reports but nothing for these exact
  symptoms. I have tried blacklisting mei_me with no change in
  behavior. I'm not sure, but the majority of the affected machines are
  using Intel video chips. Next I am going to try a mainline 4.10
  kernel.
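
For anyone comparing console captures from several affected machines, here is a minimal sketch (not part of the original report) of how the watchdog lines quoted above could be extracted and the spacing between them measured. The regular expression, the function names, and the sample text are illustrative assumptions based only on the lines shown in this report.

```python
import re

# Matches kernel console lines such as:
# [54592.092003] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! (systemd:1)
LOCKUP_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+NMI watchdog: BUG: soft lockup - "
    r"CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s!"
)


def parse_lockups(text):
    """Return (timestamp, cpu, stuck_seconds) for each soft lockup report."""
    return [
        (float(m.group("ts")), int(m.group("cpu")), int(m.group("secs")))
        for m in LOCKUP_RE.finditer(text)
    ]


def gaps(events):
    """Seconds between consecutive reports, rounded to milliseconds."""
    times = [ts for ts, _cpu, _secs in events]
    return [round(later - earlier, 3) for earlier, later in zip(times, times[1:])]


# Sample text copied from the console output quoted in this report.
SAMPLE = """\
[54592.092003] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! (systemd:1)
[54620.092003] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! (systemd:1)
[54648.092003] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! (systemd:1)
"""

events = parse_lockups(SAMPLE)
print(events)
print(gaps(events))
```

The same two functions could be run over a full serial-console or netconsole capture, if one is available, to check whether the stuck CPU and the reporting interval stay constant across machines.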

  lsb_release -rd
  Description:    Ubuntu 16.04.3 LTS
  Release:        16.04

  apt-cache policy linux-image-4.10.0-33-generic
  linux-image-4.10.0-33-generic:
    Installed: 4.10.0-33.37~16.04.1
    Candidate: 4.10.0-33.37~16.04.1
    Version table:
   *** 4.10.0-33.37~16.04.1 500
          500 http://us.archive.ubuntu.com/ubuntu xenial-security/main amd64 Packages
          100 /var/lib/dpkg/status

  ProblemType: Bug
  DistroRelease: Ubuntu 16.04
  Package: linux-image-4.10.0-33-generic 4.10.0-33.37~16.04.1
  ProcVersionSignature: Ubuntu 4.10.0-33.37~16.04.1-generic 4.10.17
  Uname: Linux 4.10.0-33-generic x86_64
  ApportVersion: 2.20.1-0ubuntu2.10
  Architecture: amd64
  CurrentDesktop: XFCE
  Date: Tue Aug 29 08:57:26 2017
  SourcePackage: linux-hwe
  UpgradeStatus: No upgrade log present (probably fresh install)

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1713751/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp