The problem no longer seems to occur, even with the 4.10.0-33-generic kernel. Since 2017-09-11 all of the affected machines have been getting automatic package updates, and I'm guessing some non-kernel package has changed something that prevents the problem.
I would like to know what fixed it, and I could start reverting those package updates or start over with an old install, but for now, as long as it is fixed, it might not be worth investigating further.

-- 
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1713751

Title:
  soft lockup / stall on CPU when shutting down with hwe 4.10 kernel

Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Zesty:
  Incomplete
Status in linux source package in Artful:
  In Progress
Status in linux source package in Bionic:
  In Progress

Bug description:
  Instead of normal complete shutdowns we're getting soft lockup failures. This started when the 16.04 hwe packages switched to the 4.10 kernel about a month ago. I help manage a few hundred machines spanning several different sites and several different hardware models, and they're all experiencing this intermittently; approximately 5% get stuck on shutdown each day.

  Here is an example of what is on the screen after it happens; the machine is unresponsive and requires a hard reset. I can't see anything in syslog or dmesg that differs when this happens. I think all logging has stopped at this point in the shutdown.

  [54566.220003] ? (t=6450529 jiffies g=141935 c=141934 q=1288)
  [54592.092003] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! (systemd:1)
  [54620.092003] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! (systemd:1)
  [54648.092003] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! (systemd:1)
  [54676.092003] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! (systemd:1)
  [54704.092003] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! (systemd:1)
  [54732.092003] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! (systemd:1)
  [54746.232003] INFO: rcu_sched self-detected stall on CPU
  [54746.232003] ?1-...: (6495431 ticks this GP) idle=5c7/140000000000001/0 softirq=218389/218389 fqs=3247712

  This repeats every ~22 seconds; sometimes it is stuck for 23s instead of 22:

  ... NMI watchdog: BUG: soft lockup - CPU#1 stuck for 23s!

  Reverting to 4.8.0-58 avoids the problem. I believe the problem has been present with every hwe 4.10 kernel package through the current linux-image-4.10.0-33-generic. This bug was filed with data right after it occurred with linux-image-4.10.0-33-generic.

  This only happens approximately 5% of the time, with no discernible pattern. I am able to reproduce the issue on one particular machine by scheduling shutdowns 3 times per day and waiting up to a few days for the problem to occur. Shutting down and starting up more frequently, like every 5 minutes or even every hour, will not trigger the problem; it seems like the machine needs to be running for a while. It does not seem to depend on any user actions; it happens even if you never log in. It has happened on reboots as well as shutdowns.

  I found a few similar bug reports but nothing for these exact symptoms. I have tried blacklisting mei_me with no change in behavior. I'm not sure, but the majority of the affected machines are using Intel video chips.

  Next I am going to try a mainline 4.10 kernel.
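
  For anyone comparing console captures from several affected machines, the soft lockup lines can be pulled out and timed with a short script. This is only a minimal sketch, not something used in this report: the function names `parse_lockups` and `intervals` are made up for illustration, and the pattern assumes the message layout shown in the excerpt above (it accepts the task in either parentheses or square brackets).

```python
import re

# Matches lines such as:
# [54592.092003] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! (systemd:1)
# The task name may be wrapped in () as transcribed here, or in [].
LOCKUP_RE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+NMI watchdog: BUG: soft lockup - "
    r"CPU#(\d+) stuck for (\d+)s! [\[(]([^\])]*)[\])]"
)


def parse_lockups(text):
    """Return (timestamp, cpu, stuck_seconds, task) for each soft lockup line."""
    return [
        (float(m.group(1)), int(m.group(2)), int(m.group(3)), m.group(4))
        for m in LOCKUP_RE.finditer(text)
    ]


def intervals(events):
    """Return the seconds elapsed between consecutive soft lockup reports."""
    stamps = [event[0] for event in events]
    return [round(later - earlier, 3) for earlier, later in zip(stamps, stamps[1:])]


if __name__ == "__main__":
    sample = (
        "[54592.092003] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! (systemd:1)\n"
        "[54620.092003] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! (systemd:1)\n"
        "[54648.092003] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! (systemd:1)\n"
    )
    events = parse_lockups(sample)
    for event in events:
        print(event)
    print(intervals(events))
```

  Feeding it a transcribed console capture gives the CPU, the reported stall length, and the spacing between reports, which makes it easier to check whether different machines show the same pattern.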

  lsb_release -rd
  Description: Ubuntu 16.04.3 LTS
  Release: 16.04

  apt-cache policy linux-image-4.10.0-33-generic
  linux-image-4.10.0-33-generic:
    Installed: 4.10.0-33.37~16.04.1
    Candidate: 4.10.0-33.37~16.04.1
    Version table:
   *** 4.10.0-33.37~16.04.1 500
          500 http://us.archive.ubuntu.com/ubuntu xenial-security/main amd64 Packages
          100 /var/lib/dpkg/status

  ProblemType: Bug
  DistroRelease: Ubuntu 16.04
  Package: linux-image-4.10.0-33-generic 4.10.0-33.37~16.04.1
  ProcVersionSignature: Ubuntu 4.10.0-33.37~16.04.1-generic 4.10.17
  Uname: Linux 4.10.0-33-generic x86_64
  ApportVersion: 2.20.1-0ubuntu2.10
  Architecture: amd64
  CurrentDesktop: XFCE
  Date: Tue Aug 29 08:57:26 2017
  SourcePackage: linux-hwe
  UpgradeStatus: No upgrade log present (probably fresh install)

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1713751/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp