Hi,

We're encountering a systemd hang on reboot which is proving hard to debug, on the OLPC XO platform (systemd-44 on Fedora 17). It doesn't happen every time, but it is frequent: when running a system that reboots once every 2-3 minutes, it reproduces within an hour (usually much quicker).

Can anyone suggest debugging techniques for the following situation, or are there similar-sounding bug reports already that might provide clues?

- /sbin/reboot is run, and exits with code 0, without producing any output on stderr or stdout.
- the reboot process is definitely initiated, because plymouth's shutdown screen comes up, and the serial console getty is stopped
- the hang happens with the plymouth shutdown splash on-screen, and the system continues responding to keypresses (showing/hiding the plymouth splash)
- disabling the plymouth shutdown splash doesn't solve the hang, and no interesting messages appear on the console either
- the system no longer responds to sysrq over serial (even when the kernel sysrq_always_enabled parameter is used)
- the shutdown scripts in /usr/lib/systemd/system-shutdown are not called
- enabling systemd debugging via kernel parameters "systemd.log_level=debug systemd.log_target=kmsg" causes the hang not to happen (left a system reboot-looping with this configuration for 24 hours without hitting the issue)

Any tips appreciated. This is perhaps unlikely to be a systemd issue, because when we reboot from a "normal" session, we don't hit this issue (but I think systemd could help us find the problem?).

We hit this issue when rebooting after running our manufacturing tests, which aim to hammer the system very hard and activate as many components as possible (microphone, camera, screen, disk, RAM check, ...). These tests are activated as follows:

1. During boot, runin-check.service (runs early) notes that the laptop's manufacturing data says that the system should run manufacturing tests rather than starting a real session. The runin-check program then calls "systemctl isolate runin.target"
2. runin.target starts the runin-main program which opens an X session and kicks off all kinds of tests

Here are the debug logs from a successful boot-to-reboot cycle (when things work OK):
http://dev.laptop.org/~dsd/20120704/runin-verbose.txt

At 15.991475, runin-check runs "systemctl isolate runin.target"
At 18.969571, runin tests start
At 30.505676, runin tests fail and the reboot process is initiated. (I deliberately triggered the fail so that I don't have to wait a long time for the reboot to happen)
At 36.818280, "/sbin/reboot" is called by runin
At 46.956082, the scripts in /usr/lib/systemd/system-shutdown are called

Here are the relevant service/target files:

runin-check.service:
[Unit]
Description=Check whether to run OLPC run-in tests
DefaultDependencies=no
Requires=olpc-configure.service
After=olpc-configure.service
Before=basic.target

[Service]
Type=oneshot
ExecStart=/runin/runin-check

[Install]
WantedBy=basic.target

runin.target:
[Unit]
Description=OLPC run-in tests
AllowIsolate=true
DefaultDependencies=no
Requires=runin.service
After=olpc-configure.service
Wants=plymouth-quit.service plymouth-quit-wait.service

runin.service:
[Unit]
Description=OLPC run-in tests
DefaultDependencies=no
Wants=udev-settle.service
After=udev-settle.service plymouth-quit.service plymouth-quit-wait.service

[Service]
ExecStart=/runin/runin-main

Any help appreciated; this is currently the last blocking bug we have preventing our latest software image (our first systemd-based release!) from entering mass-production in the factory.

Thanks!
Daniel
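
P.S. In case it helps anyone compare a good cycle against a bad one, here is a minimal sketch of how the timeline above could be pulled out of a kmsg-style log and turned into per-step gaps. It assumes each log line starts with a dmesg-style "[ seconds.micros]" prefix. The function names and the sample messages are placeholders for illustration; only the timestamps come from the log linked above.

```python
import re

# Assumption: log lines carry a dmesg-style prefix such as "[   15.991475] message".
STAMP = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")


def parse_events(lines):
    """Return (seconds, message) pairs for every line that has a timestamp prefix."""
    events = []
    for line in lines:
        match = STAMP.match(line)
        if match:
            events.append((float(match.group(1)), match.group(2)))
    return events


def gaps(events):
    """Return (elapsed_seconds, earlier_message, later_message) for consecutive events."""
    return [
        (later[0] - earlier[0], earlier[1], later[1])
        for earlier, later in zip(events, events[1:])
    ]


# Placeholder messages; the timestamps are the ones quoted in the timeline.
SAMPLE = [
    "[   15.991475] runin-check: systemctl isolate runin.target",
    "[   18.969571] runin: tests start",
    "[   30.505676] runin: tests fail, reboot initiated",
    "[   36.818280] runin: /sbin/reboot called",
    "[   46.956082] system-shutdown scripts called",
]

if __name__ == "__main__":
    for elapsed, earlier, later in gaps(parse_events(SAMPLE)):
        print(f"{elapsed:10.6f}s  {earlier}  ->  {later}")
```

On a hung cycle the last event would simply be missing, so the final gap printed would show where the log stops.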
