Here at Mozilla, we have 200 servers running on an HP Moonshot system. All
have the same hardware configuration and run Ubuntu 16.04.2. The OS is not
up to date; we use it as it was released. We use a program to test the
Firefox source code, and after each test we reboot the servers using
/sbin/reboot. After a while (24-48h, a period during which ~6 reboots/h
are made), all 200 servers randomly get stuck at reboot (see the ILO
capture), and to bring them back we have to power cycle each of them.

On one of the beta servers, we made the changes below, enabled debug
logging, and set cron to reboot the server after 5-10 min. However, the
reboot freeze is still present:
- upgraded the OS to Ubuntu 16.04.5 with the latest packages;
- used GRUB_CMDLINE_LINUX_DEFAULT="reboot=bios";
- used GRUB_CMDLINE_LINUX_DEFAULT="acpi=off";
- used GRUB_CMDLINE_LINUX_DEFAULT="reboot=force";
- upgraded the kernel to v4.15 (the main one from Ubuntu's repo);
- upgraded the kernel to v4.20 from https://kernel.ubuntu.com/~kernel-ppa/mainline/;
- we are now testing the reboot with 4.20.3 from the above repo and working
  to update systemd.
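
For anyone who wants to repeat the kernel command-line tests above, here is a minimal sketch of how one such value could be applied. The helper name set_grub_cmdline is hypothetical (it is not part of GRUB or Ubuntu), it assumes GNU sed, and it assumes the value contains no "|" or "&" characters.

```shell
#!/bin/sh
# Hypothetical helper: set GRUB_CMDLINE_LINUX_DEFAULT in a GRUB defaults file.
#   $1 = path to the file (normally /etc/default/grub)
#   $2 = new value, e.g. "reboot=bios"
set_grub_cmdline() {
    file=$1
    value=$2
    if grep -q '^GRUB_CMDLINE_LINUX_DEFAULT=' "$file"; then
        # Replace the existing assignment in place (GNU sed -i).
        sed -i "s|^GRUB_CMDLINE_LINUX_DEFAULT=.*|GRUB_CMDLINE_LINUX_DEFAULT=\"$value\"|" "$file"
    else
        # No assignment yet: append one.
        printf 'GRUB_CMDLINE_LINUX_DEFAULT="%s"\n' "$value" >> "$file"
    fi
}

# Typical use, as root:
#   set_grub_cmdline /etc/default/grub "reboot=bios"
#   update-grub          # regenerate grub.cfg so the change applies at next boot
#   cat /proc/cmdline    # after the reboot, confirm the parameter is active
```

Checking /proc/cmdline after the reboot is worthwhile, since a parameter that never reached the kernel would make a test of reboot=bios or acpi=off meaningless.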

Attached you can find the debug logs and capture:
- kernel 4.4.0-66-generic #87-Ubuntu - shutdown-debuglogkernel-4.4.txt
- kernel 4.15 - shutdown-log-kernel4-15.txt
- kernel 4.20 - shutdown-log-kernel420.txt
- ILO capture with the freeze - ILO-reboot-freeze.PNG

Please check all these logs and the capture, and let us know of a solution. Thanks.

** Attachment added: "UbuntuBug.zip"
   https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1783499/+attachment/5230309/+files/UbuntuBug.zip

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to dbus in Ubuntu.
https://bugs.launchpad.net/bugs/1783499

Title:
  systemd: Failed to send signal

Status in dbus package in Ubuntu:
  Confirmed
Status in systemd package in Ubuntu:
  Confirmed

Bug description:
  systemd: Failed to send signal.

  [    3.137257] systemd[1]: Failed to send job remove signal for 109: Connection reset by peer
  [    3.138119] systemd[1]: run-rpc_pipefs.mount: Failed to send unit change signal for run-rpc_pipefs.mount: Transport endpoint is not connected
  [    3.138185] systemd[1]: dev-mapper-ubuntu\x2d\x2dvg\x2droot.device: Failed to send unit change signal for dev-mapper-ubuntu\x2d\x2dvg\x2droot.device: Transport endpoint is not connected
  [    3.138512] systemd[1]: run-rpc_pipefs.mount: Failed to send unit change signal for run-rpc_pipefs.mount: Transport endpoint is not connected
  [    3.142719] systemd[1]: Failed to send job remove signal for 134: Transport endpoint is not connected
  [    3.142958] systemd[1]: auth-rpcgss-module.service: Failed to send unit change signal for auth-rpcgss-module.service: Transport endpoint is not connected
  [    3.165359] systemd[1]: Failed to send job remove signal for 133: Transport endpoint is not connected
  [    3.165505] systemd[1]: proc-fs-nfsd.mount: Failed to send unit change signal for proc-fs-nfsd.mount: Transport endpoint is not connected
  [    3.165541] systemd[1]: dev-mapper-ubuntu\x2d\x2dvg\x2droot.device: Failed to send unit change signal for dev-mapper-ubuntu\x2d\x2dvg\x2droot.device: Transport endpoint is not connected
  [    3.166854] systemd[1]: Failed to send job remove signal for 66: Transport endpoint is not connected
  [    3.167072] systemd[1]: proc-fs-nfsd.mount: Failed to send unit change signal for proc-fs-nfsd.mount: Transport endpoint is not connected
  [    3.167130] systemd[1]: systemd-modules-load.service: Failed to send unit change signal for systemd-modules-load.service: Transport endpoint is not connected
  [    2.929018] systemd[1]: Failed to send job remove signal for 53: Transport endpoint is not connected
  [    2.929220] systemd[1]: systemd-random-seed.service: Failed to send unit change signal for systemd-random-seed.service: Transport endpoint is not connected
  [    3.024320] systemd[1]: sys-devices-platform-serial8250-tty-ttyS12.device: Failed to send unit change signal for sys-devices-platform-serial8250-tty-ttyS12.device: Transport endpoint is not connected
  [    3.024421] systemd[1]: dev-ttyS12.device: Failed to send unit change signal for dev-ttyS12.device: Transport endpoint is not connected
  [    3.547019] systemd[1]: proc-sys-fs-binfmt_misc.automount: Failed to send unit change signal for proc-sys-fs-binfmt_misc.automount: Connection reset by peer
  [    3.547144] systemd[1]: Failed to send job change signal for 207: Transport endpoint is not connected
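
The messages above end in one of two error strings. To see at a glance which one dominates on a given machine, the lines could be grouped by that trailing string. This is only a sketch: summarize_send_failures is a hypothetical helper name, and it assumes each journal message arrives on a single line.

```shell
#!/bin/sh
# Hypothetical helper: read log lines on stdin and count the
# "Failed to send" messages by their trailing error string.
summarize_send_failures() {
    grep 'Failed to send' |   # keep only the failing-signal messages
        sed 's/.*: //' |      # drop everything up to the last ": "
        sort | uniq -c |      # count identical error strings
        sort -rn              # most frequent first
}

# Typical use on the current boot:
#   journalctl -b | summarize_send_failures
```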

  
  How to reproduce:
  1. Enable debug-level logging: set
  LogLevel=debug in /etc/systemd/system.conf
  2. Reboot the system
  3. journalctl | grep "Failed to send"
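
The reproduction steps above could be scripted roughly as follows. This is a sketch under assumptions: enable_debug_loglevel is a hypothetical helper name, GNU sed is assumed, and the file is assumed to contain only the [Manager] section (as the stock /etc/systemd/system.conf does), so appending a missing LogLevel line at the end is safe.

```shell
#!/bin/sh
# Hypothetical helper: switch systemd's manager log level to debug.
#   $1 = path to the config file (normally /etc/systemd/system.conf)
enable_debug_loglevel() {
    conf=$1
    if grep -Eq '^#?LogLevel=' "$conf"; then
        # Uncomment and/or overwrite the existing LogLevel line.
        sed -i -E 's|^#?LogLevel=.*|LogLevel=debug|' "$conf"
    else
        # No LogLevel line at all: append one.
        printf 'LogLevel=debug\n' >> "$conf"
    fi
}

# Typical use, as root:
#   enable_debug_loglevel /etc/systemd/system.conf
#   reboot
#   journalctl -b | grep "Failed to send"    # after the reboot
```

Using journalctl -b limits the search to the current boot, so messages from earlier boots do not inflate the result.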

  
  sliu@vmlxhi-094:~$ lsb_release -rd
  Description:  Ubuntu 16.04.4 LTS
  Release:      16.04

  sliu@vmlxhi-094:~$ systemctl --version
  systemd 229
  +PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ -LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN

  sliu@vmlxhi-094:~$ dbus-daemon --version
  D-Bus Message Bus Daemon 1.10.6
  Copyright (C) 2002, 2003 Red Hat, Inc., CodeFactory AB, and others
  This is free software; see the source for copying conditions.
  There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.


