I'd like to perform a bisect to identify the Ubuntu-specific commit that
introduced this.  First, can you test the Zesty and Artful kernels to see
if the bug has already been fixed in the newer releases?  If it has, we
can focus on finding the fix with a "Reverse" bisect.

The kernels can be downloaded from:
Zesty: 
https://launchpad.net/~canonical-kernel-team/+archive/ubuntu/ppa/+build/13563574
Artful: 
https://launchpad.net/~canonical-kernel-security-team/+archive/ubuntu/ppa2/+build/13567624

To install the kernels, just be sure to install both the linux-image and
linux-image-extra .deb packages.

Thanks in advance!

** Changed in: linux-hwe (Ubuntu)
   Importance: Undecided => High

** Also affects: linux (Ubuntu)
   Importance: Undecided
       Status: New

** Changed in: linux (Ubuntu)
   Importance: Undecided => High

** Changed in: linux (Ubuntu)
       Status: New => In Progress

** Changed in: linux (Ubuntu)
     Assignee: (unassigned) => Joseph Salisbury (jsalisbury)

** Changed in: linux-hwe (Ubuntu)
     Assignee: (unassigned) => Joseph Salisbury (jsalisbury)

** Changed in: linux-hwe (Ubuntu)
       Status: New => In Progress

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1713751

Title:
  soft lockup / stall on CPU when shutting down with hwe 4.10 kernel

Status in linux package in Ubuntu:
  In Progress
Status in linux-hwe package in Ubuntu:
  In Progress

Bug description:
  Instead of normal complete shutdowns we're getting soft lockup
  failures. This started when 16.04 hwe packages switched to the 4.10
  kernel about a month ago. I help manage a few hundred machines
  spanning several different sites and several different hardware models
  and they're all experiencing this intermittently; approximately 5% get
  stuck on shutdown each day.

  Here is an example of what is on the screen after it happens; the
  machine is unresponsive and requires a hard reset.  I can't see
  anything in syslog or dmesg that differs when this happens; I think
  all logging has stopped at this point in the shutdown.

  [54566.220003] ? (t=6450529 jiffies g=141935 c=141934 q=1288)
  [54592.092003] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! (systemd:1)
  [54620.092003] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! (systemd:1)
  [54648.092003] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! (systemd:1)
  [54676.092003] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! (systemd:1)
  [54704.092003] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! (systemd:1)
  [54732.092003] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! (systemd:1)
  [54746.232003] INFO: rcu_sched self-detected stall on CPU
  [54746.232003] ?1-...: (6495431 ticks this GP) idle=5c7/140000000000001/0 softirq=218389/218389 fqs=3247712

  This repeats every ~22 seconds; sometimes it is stuck for 23s instead of 22:
  ... NMI watchdog: BUG: soft lockup - CPU#1 stuck for 23s!
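
To compare captures from different machines, the watchdog lines can be parsed mechanically. The following is a minimal sketch, not part of the original report: the regular expression and the function names are assumptions made for illustration, and the sample lines are copied from the excerpt above with the wrapped "(systemd:1)" rejoined.

```python
import re

# Sample lines taken from the excerpt in this report.
LOG = """\
[54592.092003] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! (systemd:1)
[54620.092003] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! (systemd:1)
[54648.092003] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! (systemd:1)
"""

# Hypothetical pattern matching the message format shown above.
PATTERN = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\] NMI watchdog: BUG: soft lockup - "
    r"CPU#(?P<cpu>\d+) stuck for (?P<stuck>\d+)s!"
)


def parse_lockups(text):
    """Return (timestamp, cpu, stuck_seconds) for each soft lockup line."""
    return [
        (float(m["ts"]), int(m["cpu"]), int(m["stuck"]))
        for m in PATTERN.finditer(text)
    ]


def report_intervals(events):
    """Seconds between consecutive reports, rounded to milliseconds."""
    return [round(b[0] - a[0], 3) for a, b in zip(events, events[1:])]


events = parse_lockups(LOG)
intervals = report_intervals(events)
print(intervals)
```

For the sample lines, the timestamps are 28 seconds apart while each message reports a 22 second stall, so the reported stall length and the spacing between messages are worth tracking separately.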

  
  Reverting to 4.8.0-58 avoids the problem. I believe the problem has been
  present with every hwe 4.10 kernel package through the current
  linux-image-4.10.0-33-generic.  This bug was filed with data right after
  it occurred with linux-image-4.10.0-33-generic.

  This only happens approximately 5% of the time with no discernible
  pattern.  I am able to reproduce the issue on one particular machine
  by scheduling shutdowns 3 times per day and waiting up to a few days
  for the problem to occur. Shutting down and starting up more
  frequently, like every 5 minutes or even every hour, will not trigger
  the problem; it seems like the machine needs to be running for a while.
  It does not seem to depend on any user actions; it happens even if you
  never log in.  It has happened on reboots as well as on shutdowns.
  I found a few similar bug reports but nothing for these exact
  symptoms.
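
Because the failure is intermittent, a bisect step needs enough clean shutdowns before a kernel can be called good. The sketch below estimates that number under a loud assumption that is not in the report: each shutdown hangs independently with a fixed probability of 5%. The report's 5% figure is per day across many machines, so the real per-shutdown rate on a single test machine may differ. The function names are hypothetical.

```python
import math


def p_at_least_one_hang(p_hang, shutdowns):
    """Chance of seeing one or more hangs in `shutdowns` independent tries."""
    return 1 - (1 - p_hang) ** shutdowns


def clean_runs_needed(p_hang, confidence):
    """Smallest n such that a bad kernel would hang at least once in n
    shutdowns with probability >= confidence."""
    if not 0 < p_hang < 1 or not 0 < confidence < 1:
        raise ValueError("p_hang and confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1 - confidence) / math.log(1 - p_hang))


P_HANG = 0.05          # assumed per-shutdown hang probability
SHUTDOWNS_PER_DAY = 3  # schedule described in the report

after_three_days = p_at_least_one_hang(P_HANG, 3 * SHUTDOWNS_PER_DAY)
runs = clean_runs_needed(P_HANG, 0.95)
days = math.ceil(runs / SHUTDOWNS_PER_DAY)

print(f"chance of a hang within 3 days: {after_three_days:.1%}")
print(f"clean shutdowns for 95% confidence: {runs} (about {days} days)")
```

Under these assumptions, three days of testing catches a bad kernel only a little over a third of the time, so each bisect step may need considerably longer than "a few days" before a kernel is marked good.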

  I have tried blacklisting mei_me with no change in behavior.  I'm not
  sure, but the majority of the affected machines are using Intel video
  chips.  Next I am going to try a mainline 4.10 kernel.

  
  lsb_release -rd
  Description:  Ubuntu 16.04.3 LTS
  Release:      16.04

  
  apt-cache policy linux-image-4.10.0-33-generic
  linux-image-4.10.0-33-generic:
    Installed: 4.10.0-33.37~16.04.1
    Candidate: 4.10.0-33.37~16.04.1
    Version table:
   *** 4.10.0-33.37~16.04.1 500
          500 http://us.archive.ubuntu.com/ubuntu xenial-security/main amd64 Packages
          100 /var/lib/dpkg/status

  ProblemType: Bug
  DistroRelease: Ubuntu 16.04
  Package: linux-image-4.10.0-33-generic 4.10.0-33.37~16.04.1
  ProcVersionSignature: Ubuntu 4.10.0-33.37~16.04.1-generic 4.10.17
  Uname: Linux 4.10.0-33-generic x86_64
  ApportVersion: 2.20.1-0ubuntu2.10
  Architecture: amd64
  CurrentDesktop: XFCE
  Date: Tue Aug 29 08:57:26 2017
  SourcePackage: linux-hwe
  UpgradeStatus: No upgrade log present (probably fresh install)

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1713751/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
