Public bug reported:

Discovered on bionic, arm64 (Moonshot, verified on multiple swirlix
cartridges), 4.15.0-22-generic.

After deploying the nova-compute Juju charm, on subsequent reboots,
everything freezes within a few seconds of boot completing, and the
serial console eventually shows the following (just these lines, no traces):

[  188.010510] watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [juju-log:2272]
[  216.010292] watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [juju-log:2272]

(From here on, "lock up" refers to that sequence: boot a kernel, it
completes boot to login prompt, then everything freezes a few seconds
later, then BUGs.)

The stuck process is usually, but not always, juju-log; sometimes it is
relation-ids or similar.  I was able to briefly observe that the charm was
in its startup config-changed hook at the time.
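
To check how often each process name shows up across many captured boots, the watchdog lines can be tallied mechanically. This is only a minimal sketch, not something used in the investigation: the function names are made up for illustration, and it assumes the serial console output has been saved as text in exactly the format shown above.

```python
import re
from collections import Counter

# Matches serial console lines in the format quoted in this report, e.g.
# [  188.010510] watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [juju-log:2272]
SOFT_LOCKUP_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+watchdog: BUG: soft lockup - "
    r"CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^:\]]+):(?P<pid>\d+)\]"
)


def parse_soft_lockups(console_text):
    """Return one dict per soft-lockup line found in the console text."""
    events = []
    for match in SOFT_LOCKUP_RE.finditer(console_text):
        events.append({
            "timestamp": float(match.group("ts")),
            "cpu": int(match.group("cpu")),
            "stuck_seconds": int(match.group("secs")),
            "comm": match.group("comm"),
            "pid": int(match.group("pid")),
        })
    return events


def count_by_process(events):
    """Tally how many soft-lockup reports name each process."""
    return Counter(event["comm"] for event in events)


# The two lines quoted in this report, used as sample input.
SAMPLE = """\
[  188.010510] watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [juju-log:2272]
[  216.010292] watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [juju-log:2272]
"""

if __name__ == "__main__":
    print(count_by_process(parse_soft_lockups(SAMPLE)))
```

Feeding it the concatenated console logs from several reboots would show whether juju-log really dominates or whether other hook tools appear about as often.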

I've separated out and tested nearly everything the charm does during its
startup config-changed hook (sets up bridging, writes some config files,
restarts libvirtd/nova-compute/etc) without being able to trigger the
bug, but I suspect proximity to boot is a factor.  If I disable
jujud-unit-nova-compute startup, boot, log in, then re-enable and start it
(by which time a minute or more has elapsed since boot finished), it does
not lock up.  Similarly, if I wrap the jujud startup in a `strace -Ff -o
/var/log/strace.log` (which slows it down massively), it does not lock
up.  Watched pot syndrome.

I've tried the mainline kernels at
http://kernel.ubuntu.com/~kernel-ppa/mainline/ and noticed that most of
the recent arm64 builds had failed.  I notified the kernel team channel,
and apw fixed the issue and started some rebuilds.

What I've discovered (after many dead ends and a futile bisection) is
that mainline builds from before the rebuilds lock up, but the fixed
mainline builds initiated by apw DO NOT lock up.  For example,
4.16.3-041603.201804190730 locks up, but 4.16.6-041606.201806042022 does
not.  (4.16.4 and 4.16.5 appear never to have been rebuilt and don't have
arm64 debs; that range is what I tried to bisect, figuring a fix must be
in there.)

But when I compile any of these recent kernels myself, they lock up when
booted.  I used the same kernel configs, built on both bionic and in a
cosmic chroot, and tried both a native arm64 compile and a cross-compile
from amd64.  For example, 4.16.6-041606.201806042022 from k.u.c does not
lock up, but when I build it myself, it does.
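
One way to double-check the "same kernel configs" claim is to diff the config shipped with the k.u.c build against the .config of the local build. The following is a rough sketch only; the function names are hypothetical, and it assumes the usual kernel config text format (`CONFIG_FOO=value` and `# CONFIG_FOO is not set` lines).

```python
def read_kernel_config(text):
    """Map option name -> value; '# CONFIG_X is not set' is recorded as 'n'."""
    suffix = " is not set"
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(suffix):
            # Drop the leading "# " and the trailing " is not set".
            options[line[2:-len(suffix)]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def diff_kernel_configs(a_text, b_text):
    """Return {option: (value_in_a, value_in_b)} for options that differ.

    An option missing from one side is reported with None on that side.
    """
    a = read_kernel_config(a_text)
    b = read_kernel_config(b_text)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }
```

An empty result would support the idea that the difference lies outside the config, for instance in the toolchain or build environment, though that remains a guess at this point.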

To be clear, I've verified lock ups on the following kernels (all built
with the kernel configs from their respective Ubuntu or k.u.c mainline
builds):

- 4.15.0-22-generic from bionic (both Ubuntu-provided and my own recompile)
- v4.16 (and all point releases)
- v4.17

As I write this, my compiled v4.10 DOES NOT appear to lock up.  I will
attempt to bisect at a macro level from 4.10..4.15 and dig deeper.
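
The macro-level bisection could be driven along these lines. This is a sketch under stated assumptions, not the procedure actually used: `locks_up` is a hypothetical callback that boots the given release once and reports whether it locked up, the first release in the list is assumed good and the last bad, and every release after the first bad one is assumed bad as well. Because the lock up appears timing-sensitive, each release is booted several times before being called good.

```python
def first_bad_release(releases, locks_up, boots_per_release=3):
    """Binary-search an ordered list of releases for the first that locks up.

    Assumes releases[0] is known good, releases[-1] is known bad, and that
    every release after the first bad one is also bad.
    """
    def is_bad(release):
        # One clean boot is weak evidence for a timing-sensitive bug, so a
        # release counts as bad if any of several boots locks up.
        return any(locks_up(release) for _ in range(boots_per_release))

    good, bad = 0, len(releases) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(releases[mid]):
            bad = mid
        else:
            good = mid
    return releases[bad]


# Release tags spanning the range named in this report.
RELEASES = ["v4.10", "v4.11", "v4.12", "v4.13", "v4.14", "v4.15"]
```

With six tags this needs at most three test releases, after which a commit-level bisect between the last good and first bad tag can take over.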

ProblemType: Bug
DistroRelease: Ubuntu 18.04
Package: linux-image-4.15.0-22-generic 4.15.0-22.24
ProcVersionSignature: Ubuntu 4.15.0-22.24-generic 4.15.17
Uname: Linux 4.15.0-22-generic aarch64
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116,  1 Jun  2 04:22 seq
 crw-rw---- 1 root audio 116, 33 Jun  2 04:22 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay'
ApportVersion: 2.20.9-0ubuntu7.2
Architecture: arm64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
Date: Fri Jun  8 00:13:05 2018
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig'
Lsusb: Error: command ['lsusb'] failed with exit code 1:
PciMultimedia:
 
ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 LANG=C.UTF-8
 SHELL=/bin/bash
ProcFB:
 
ProcKernelCmdLine: console=ttyS0,9600n8r ro
RelatedPackageVersions:
 linux-restricted-modules-4.15.0-22-generic N/A
 linux-backports-modules-4.15.0-22-generic  N/A
 linux-firmware                             1.173.1
RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill'
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: Confirmed


** Tags: apport-bug arm64 bionic uec-images

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1775732

Title:
  arm64 soft lock crashes on nova-compute charm running

Status in linux package in Ubuntu:
  Confirmed

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1775732/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
