I think this problem comes down to the IPMI-style watchdog being armed
when the machine boots, while the Linux system does not start the
refresh action until much later. For example, on my home PC (Ubuntu
12.04 64-bit desktop), after installing the watchdog package and using
"touch /forcefsck" to force a boot-time fsck, my syslog has this
relevant part:

Jan 19 21:30:28 paul-ubuntu kernel: [    9.492544] [drm] Initialized radeon 2.36.0 20080528 for 0000:01:05.0 on minor 0
Jan 19 21:30:28 paul-ubuntu kernel: [    9.617760] HDMI ATI/AMD: no speaker allocation for ELD
Jan 19 21:30:28 paul-ubuntu kernel: [    9.917460] HDMI ATI/AMD: no speaker allocation for ELD
Jan 19 21:30:28 paul-ubuntu kernel: [   10.217249] HDMI ATI/AMD: no speaker allocation for ELD
Jan 19 21:30:28 paul-ubuntu kernel: [   10.516992] HDMI ATI/AMD: no speaker allocation for ELD
Jan 19 21:30:28 paul-ubuntu kernel: [   10.816578] HDMI ATI/AMD: no speaker allocation for ELD
Jan 19 21:30:28 paul-ubuntu kernel: [   11.116403] HDMI ATI/AMD: no speaker allocation for ELD
Jan 19 21:30:28 paul-ubuntu kernel: [   11.416101] HDMI ATI/AMD: no speaker allocation for ELD
Jan 19 21:30:28 paul-ubuntu kernel: [   11.715686] HDMI ATI/AMD: no speaker allocation for ELD
Jan 19 21:30:28 paul-ubuntu kernel: [   91.544151] EXT4-fs (md1): re-mounted. Opts: errors=remount-ro
Jan 19 21:30:28 paul-ubuntu kernel: [   91.896020] EXT4-fs (md2): mounted filesystem with ordered data mode. Opts: (null)
Jan 19 21:30:28 paul-ubuntu kernel: [   92.431631] EXT4-fs (md0): mounted filesystem with ordered data mode. Opts: (null)
Jan 19 21:30:28 paul-ubuntu kernel: [   96.927417] EXT4-fs (md3): mounted filesystem with ordered data mode. Opts: (null)
Jan 19 21:30:28 paul-ubuntu kernel: [   97.037128] RPC: Registered named UNIX socket transport module.
Jan 19 21:30:28 paul-ubuntu kernel: [   97.037132] RPC: Registered udp transport module.
Jan 19 21:30:28 paul-ubuntu kernel: [   97.037133] RPC: Registered tcp transport module.
Jan 19 21:30:28 paul-ubuntu kernel: [   97.037136] RPC: Registered tcp NFSv4.1 backchannel transport module.
Jan 19 21:30:28 paul-ubuntu kernel: [   97.084401] FS-Cache: Loaded
Jan 19 21:30:28 paul-ubuntu kernel: [   97.274125] FS-Cache: Netfs 'nfs' registered for caching
Jan 19 21:30:28 paul-ubuntu kernel: [   97.316611] init: failsafe main process (1320) killed by TERM signal
Jan 19 21:30:28 paul-ubuntu kernel: [   97.382416] audit_printk_skb: 12 callbacks suppressed
Jan 19 21:30:28 paul-ubuntu kernel: [   97.382419] type=1400 audit(1453239028.288:16): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/sbin/rsyslogd" pid=1352 comm="apparmor_parser"
Jan 19 21:30:28 paul-ubuntu kernel: [   97.442406] Installing knfsd (copyright (C) 1996 o...@monad.swb.de).
Jan 19 21:30:28 paul-ubuntu kernel: [   97.945825] type=1400 audit(1453239028.852:17): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/sbin/dhclient" pid=1415 comm="apparmor_parser"
Jan 19 21:30:28 paul-ubuntu kernel: [   97.945833] type=1400 audit(1453239028.852:18): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/usr/lib/NetworkManager/nm-dhcp-client.action" pid=1415 comm="apparmor_parser"
Jan 19 21:30:28 paul-ubuntu kernel: [   97.945837] type=1400 audit(1453239028.852:19): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/usr/lib/connman/scripts/dhclient-script" pid=1415 comm="apparmor_parser"
Jan 19 21:30:28 paul-ubuntu kernel: [   97.946190] type=1400 audit(1453239028.852:20): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/usr/lib/NetworkManager/nm-dhcp-client.action" pid=1415 comm="apparmor_parser"
Jan 19 21:30:28 paul-ubuntu kernel: [   97.946194] type=1400 audit(1453239028.852:21): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/usr/lib/connman/scripts/dhclient-script" pid=1415 comm="apparmor_parser"
Jan 19 21:30:28 paul-ubuntu kernel: [   97.946379] type=1400 audit(1453239028.852:22): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/usr/lib/connman/scripts/dhclient-script" pid=1415 comm="apparmor_parser"
Jan 19 21:30:28 paul-ubuntu kernel: [   97.984247] type=1400 audit(1453239028.892:23): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/sbin/klogd" pid=1416 comm="apparmor_parser"
Jan 19 21:30:28 paul-ubuntu kernel: [   97.984684] type=1400 audit(1453239028.892:24): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/bin/ping" pid=1413 comm="apparmor_parser"
Jan 19 21:30:28 paul-ubuntu kernel: [   98.000167] type=1400 audit(1453239028.908:25): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/sbin/syslog-ng" pid=1417 comm="apparmor_parser"
Jan 19 21:30:29 paul-ubuntu wd_keepalive[1502]: starting watchdog keepalive daemon (5.14):
Jan 19 21:30:29 paul-ubuntu wd_keepalive[1502]:  int=1 alive=/dev/watchdog realtime=yes
Jan 19 21:30:29 paul-ubuntu wd_keepalive[1502]: watchdog now set to 60 seconds
Jan 19 21:30:29 paul-ubuntu wd_keepalive[1502]: hardware watchdog identity: Software Watchdog
Jan 19 21:30:29 paul-ubuntu anacron[1516]: Anacron 2.3 started on 2016-01-19

The date/time in the first column is the time that rsyslog received and
wrote the message. What is important here is the high-resolution kernel
timer (the value in square brackets) that kept running while the boot
messages were being stacked up. You can see my system took just under 80
seconds to run the fsck (from 11.715686 to 91.544151), and it was at
least another 6 seconds (some time after 98.000167) before it started
wd_keepalive, which is the first daemon entry in the /etc/rc1.d
locations, etc.
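
The arithmetic above can be reproduced with a short script. This is only
a sketch: the function names are made up for illustration, the sample
holds three lines copied from the log above, and the 60-second figure is
the timeout wd_keepalive reported on my PC, not necessarily what the
BIOS/IPMI watchdog on an affected server uses.

```python
import re

# Matches the kernel uptime stamp, e.g. "kernel: [   91.544151]".
KERNEL_STAMP = re.compile(r"kernel: \[\s*(\d+\.\d+)\]")

def kernel_seconds(line):
    """Return the kernel uptime stamp of a syslog line, or None if absent."""
    match = KERNEL_STAMP.search(line)
    return float(match.group(1)) if match else None

def largest_gap(lines):
    """Return (gap, start, end) for the longest silence between kernel stamps."""
    stamps = [s for s in map(kernel_seconds, lines) if s is not None]
    gaps = [(later - earlier, earlier, later)
            for earlier, later in zip(stamps, stamps[1:])]
    return max(gaps)

# Three lines taken from the syslog excerpt above.
sample = [
    'Jan 19 21:30:28 paul-ubuntu kernel: [   11.715686] HDMI ATI/AMD: no speaker allocation for ELD',
    'Jan 19 21:30:28 paul-ubuntu kernel: [   91.544151] EXT4-fs (md1): re-mounted. Opts: errors=remount-ro',
    'Jan 19 21:30:28 paul-ubuntu kernel: [   98.000167] type=1400 audit(1453239028.908:25): '
    'apparmor="STATUS" operation="profile_load" profile="unconfined" '
    'name="/sbin/syslog-ng" pid=1417 comm="apparmor_parser"',
]

# Assumption: 60 s, as reported by wd_keepalive in the log; the timeout
# of a hardware watchdog armed before boot may well differ.
TIMEOUT_S = 60.0

gap, start, end = largest_gap(sample)
print(f"longest gap: {gap:.3f} s (from {start} to {end})")
print("longer than the watchdog timeout" if gap > TIMEOUT_S
      else "within the watchdog timeout")
```

Because the stamp always sits at the start of a message, the same
functions also work on the full syslog file, where lines without a kernel
stamp are simply skipped.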

So yes, somehow the daemon needs to start before fsck to prevent this
sort of reboot loop. But working out how to do that when the daemon
itself lives on one of the file systems to be checked (and also uses
/run/lock and similar) is not an easy task.
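
For reference, the refresh action itself is tiny. Below is a minimal
sketch, assuming the standard Linux watchdog device behaviour where any
write to the device resets the timer and a "V" written just before close
asks the driver to disarm (the "magic close", which only works if the
driver supports it and nowayout is not set). The function name and its
parameters are hypothetical, and this is not how wd_keepalive is actually
implemented; it also does nothing about the ordering problem described
above.

```python
import time

def pet_watchdog(device_path="/dev/watchdog", interval_s=1.0, beats=10,
                 disarm=True):
    """Write one keepalive byte every interval_s seconds, beats times.

    Opening the device arms the watchdog; each write resets its timer.
    """
    # Unbuffered, so every write reaches the driver immediately.
    with open(device_path, "wb", buffering=0) as wd:
        for _ in range(beats):
            wd.write(b"\0")      # any byte counts as a keepalive
            time.sleep(interval_s)
        if disarm:
            wd.write(b"V")       # magic close: request disarm on close
```

The defaults mirror the "int=1 alive=/dev/watchdog" settings in the log
above. Something this small could in principle run from an early boot
stage without touching /run/lock, but whether that is acceptable for the
package is a separate question.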

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1093870

Title:
  A long startup fsck causes cyclic watchdog reboot