The problem no longer seems to occur, even with the
4.10.0-33-generic kernel.  Since 2017-09-11 all of the affected machines
have been getting automatic package updates, and I'm guessing some
non-kernel package has changed something that prevents the problem.

I would like to know what fixed it, and I could start reverting those
package updates or start over with an old install. For now, though, as
long as it is fixed it might not be worth investigating further.
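
If someone does want to narrow down which update made the difference, one starting point is to list what was upgraded on an affected machine since 2017-09-11. The following is only a minimal sketch: it assumes the usual dpkg log layout (date, time, action, package, old version, new version), the helper name `upgrades_since` is hypothetical, and the sample lines are made up for illustration rather than taken from an affected machine.

```python
from datetime import datetime


def upgrades_since(log_lines, cutoff):
    """Return (timestamp, package, old_version, new_version) for every
    'upgrade' entry dated at or after `cutoff`."""
    found = []
    for line in log_lines:
        fields = line.split()
        # Assumed layout of an upgrade entry:
        #   YYYY-MM-DD HH:MM:SS upgrade <package> <old-version> <new-version>
        if len(fields) != 6 or fields[2] != "upgrade":
            continue
        try:
            stamp = datetime.strptime(
                fields[0] + " " + fields[1], "%Y-%m-%d %H:%M:%S"
            )
        except ValueError:
            continue  # not a timestamped entry, skip it
        if stamp >= cutoff:
            found.append((stamp, fields[3], fields[4], fields[5]))
    return found


# Made-up sample entries; package names and versions are illustrative only.
SAMPLE_LOG = [
    "2017-09-08 06:31:12 upgrade examplepkg-a:amd64 1.0-1 1.0-2",
    "2017-09-11 06:40:03 status installed examplepkg-a:amd64 1.0-2",
    "2017-09-12 06:42:57 upgrade examplepkg-b:amd64 2.3-0ubuntu1 2.3-0ubuntu2",
]

if __name__ == "__main__":
    cutoff = datetime(2017, 9, 11)
    for stamp, package, old, new in upgrades_since(SAMPLE_LOG, cutoff):
        print(stamp, package, old, "->", new)
```

On a real machine the same function could be fed the lines of /var/log/dpkg.log (older entries may have moved into rotated copies of that file), which would give a candidate list of packages to revert one at a time.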

-- 
https://bugs.launchpad.net/bugs/1713751

Title:
  soft lockup / stall on CPU when shutting down with hwe 4.10 kernel

Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Zesty:
  Incomplete
Status in linux source package in Artful:
  In Progress
Status in linux source package in Bionic:
  In Progress

Bug description:
  Instead of normal complete shutdowns we're getting soft lockup
  failures. This started when 16.04 hwe packages switched to the 4.10
  kernel about a month ago. I help manage a few hundred machines
  spanning several different sites and several different hardware models
  and they're all experiencing this intermittently: approximately 5% get
  stuck on shutdown each day.

  Here is an example of what is on the screen after it happens. The
  machine is unresponsive and requires a hard reset.  I can't see
  anything in syslog or dmesg that differs when this happens; I think
  all logging has stopped at this point in the shutdown.

  [54566.220003] ? (t=6450529 jiffies g=141935 c=141934 q=1288)
  [54592.092003] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! (systemd:1)
  [54620.092003] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! (systemd:1)
  [54648.092003] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! (systemd:1)
  [54676.092003] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! (systemd:1)
  [54704.092003] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! (systemd:1)
  [54732.092003] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! (systemd:1)
  [54746.232003] INFO: rcu_sched self-detected stall on CPU
  [54746.232003] ?1-...: (6495431 ticks this GP) idle=5c7/140000000000001/0 softirq=218389/218389 fqs=3247712

  The soft lockup message keeps repeating (about every 28 seconds by the
  timestamps above), usually reporting 22s and sometimes 23s:
  ... NMI watchdog: BUG: soft lockup - CPU#1 stuck for 23s!
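
To separate the two numbers in these messages (the "stuck for" duration and the spacing between reports), the timestamps can be compared directly. This is a small sketch, not part of the original report: the regular expression and the function names are assumptions written for this excerpt, and the sample text is the six watchdog lines from above with the wrapped "(systemd:1)" rejoined.

```python
import re

# The six soft-lockup lines from the excerpt in this report.
EXCERPT = """\
[54592.092003] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! (systemd:1)
[54620.092003] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! (systemd:1)
[54648.092003] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! (systemd:1)
[54676.092003] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! (systemd:1)
[54704.092003] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! (systemd:1)
[54732.092003] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! (systemd:1)
"""

# Pattern written for the message format shown above (an assumption,
# not a general kernel-log parser).
PATTERN = re.compile(
    r"\[\s*(\d+\.\d+)\] NMI watchdog: BUG: soft lockup - "
    r"CPU#(\d+) stuck for (\d+)s!"
)


def watchdog_reports(text):
    """Return (timestamp_seconds, cpu, stuck_seconds) for each report."""
    return [
        (float(m.group(1)), int(m.group(2)), int(m.group(3)))
        for m in PATTERN.finditer(text)
    ]


def intervals(reports):
    """Seconds between consecutive reports, rounded to milliseconds."""
    stamps = [stamp for stamp, _cpu, _stuck in reports]
    return [round(later - earlier, 3) for earlier, later in zip(stamps, stamps[1:])]


if __name__ == "__main__":
    reports = watchdog_reports(EXCERPT)
    print("stuck durations:", sorted({stuck for _, _, stuck in reports}))
    print("gaps between reports:", intervals(reports))
```

For this excerpt every report says 22s, while consecutive reports are 28 seconds apart and all come from CPU#1.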

  
  Reverting to 4.8.0-58 avoids the problem. I believe the problem has been
  present with every hwe 4.10 kernel package through the current
  linux-image-4.10.0-33-generic.  This bug was filed with data right after
  it occurred with linux-image-4.10.0-33-generic.

  This only happens approximately 5% of the time with no discernible
  pattern.  I am able to reproduce the issue on one particular machine
  by scheduling shutdowns 3 times per day and waiting up to a few days
  for the problem to occur. Shutting down and starting up more
  frequently, like every 5 minutes or even every hour, will not trigger
  the problem; it seems like the machine needs to be running for a while.
  It does not seem to depend on any user actions; it happens even if you
  never log in.  It has happened on reboots as well as on shutdowns.
  I found a few similar bug reports but nothing for these exact
  symptoms.
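
As a rough sanity check on "waiting up to a few days", the wait can be estimated from the 5% figure. The sketch below assumes each shutdown hangs independently with probability 0.05, which is a simplification (the report itself suggests uptime matters), and the function names are made up for this example.

```python
def chance_of_at_least_one_hang(per_shutdown_rate, shutdowns):
    """Probability that at least one of `shutdowns` attempts hangs,
    assuming each attempt hangs independently at the same rate."""
    return 1.0 - (1.0 - per_shutdown_rate) ** shutdowns


def expected_shutdowns_until_hang(per_shutdown_rate):
    """Mean number of attempts up to and including the first hang
    (mean of a geometric distribution)."""
    return 1.0 / per_shutdown_rate


if __name__ == "__main__":
    rate = 0.05          # "approximately 5% of the time"
    per_day = 3          # scheduled shutdowns per day on the test machine
    for days in (1, 3, 7):
        p = chance_of_at_least_one_hang(rate, days * per_day)
        print(f"{days} day(s): {p:.0%} chance of at least one hang")
    mean_days = expected_shutdowns_until_hang(rate) / per_day
    print(f"mean wait: about {mean_days:.1f} days")
```

Under that assumption the mean wait is 20 shutdowns, a little under a week at 3 per day, so several clean days on one machine do not by themselves show that a change fixed anything.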

  I have tried blacklisting mei_me with no change in behavior.  I'm not
  sure it matters, but the majority of the affected machines use Intel
  video chips.  Next I am going to try a mainline 4.10 kernel.

  
  lsb_release -rd
  Description:  Ubuntu 16.04.3 LTS
  Release:      16.04

  
  apt-cache policy linux-image-4.10.0-33-generic
  linux-image-4.10.0-33-generic:
    Installed: 4.10.0-33.37~16.04.1
    Candidate: 4.10.0-33.37~16.04.1
    Version table:
   *** 4.10.0-33.37~16.04.1 500
          500 http://us.archive.ubuntu.com/ubuntu xenial-security/main amd64 Packages
          100 /var/lib/dpkg/status

  ProblemType: Bug
  DistroRelease: Ubuntu 16.04
  Package: linux-image-4.10.0-33-generic 4.10.0-33.37~16.04.1
  ProcVersionSignature: Ubuntu 4.10.0-33.37~16.04.1-generic 4.10.17
  Uname: Linux 4.10.0-33-generic x86_64
  ApportVersion: 2.20.1-0ubuntu2.10
  Architecture: amd64
  CurrentDesktop: XFCE
  Date: Tue Aug 29 08:57:26 2017
  SourcePackage: linux-hwe
  UpgradeStatus: No upgrade log present (probably fresh install)

