On Tuesday, 24 October 2017 17:10:39 CEST, Jan Kundrát wrote:
Hi,
is it possible to change systemd's global RuntimeWatchdogSec setting at runtime? I would like the early boot to be "guarded" by the HW watchdog that my platform code starts, and for systemd to take over only after a certain target has been reached. I was thinking about an extra unit which simply writes an appropriate config file, but the docs for `systemctl daemon-reload` and `daemon-reexec` do not mention these top-level settings. How do I tell systemd to pick up a new value?

Context: I'm using systemd on an embedded ARM box with reliable network connectivity. The system has two fully separate rootfs/kernel/devicetree instances, A and B. The bootloader starts a HW watchdog timer and keeps a counter of how many times a particular A/B "boot slot" has attempted to boot. The kernel ignores the watchdog; once systemd is launched and reads its system.conf file, it starts pinging (re-arming) the WD timer periodically. Finally, a unit which is pulled in by my default target updates the bootloader's environment, resetting the boot counter.

My goal is to be able to boot a possibly broken image (but not a malicious one, of course) without fearing that it will lock me out of my device. If the new image "fails" for some reason, I expect the HW watchdog to reset the system, the boot attempt counter to eventually reach zero, and the whole system to eventually roll back to the previous image. In my scenario, rebooting automatically is preferable to waiting for a human to solve the actual problem. The once-failed slot can be re-flashed very cheaply, and an updated version can be retried during the next update attempt.
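The A/B boot-counting scheme described above can be modelled in a few lines. This is a hypothetical simulation, not the author's bootloader code: the mail names neither the bootloader nor its environment variables, so `BootEnv`, `MAX_ATTEMPTS`, `select_slot()`, `mark_good()` and `simulate()` are illustrative names, and the budget of 3 attempts is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable

MAX_ATTEMPTS = 3  # hypothetical per-slot boot attempt budget


@dataclass
class BootEnv:
    """Hypothetical model of the bootloader environment for an A/B scheme."""
    active: str = "A"
    fallback: str = "B"
    attempts_left: int = MAX_ATTEMPTS

    def select_slot(self) -> str:
        """Run by the bootloader on every boot, before it arms the HW watchdog."""
        if self.attempts_left <= 0:
            # The active slot used up its budget: roll back to the other slot.
            self.active, self.fallback = self.fallback, self.active
            self.attempts_left = MAX_ATTEMPTS
        self.attempts_left -= 1
        return self.active

    def mark_good(self) -> None:
        """Run by a unit once the default target is reached: reset the counter."""
        self.attempts_left = MAX_ATTEMPTS


def simulate(env: BootEnv, boot_succeeds: Callable[[str], bool],
             max_boots: int = 10) -> list[str]:
    """Return the slots booted, in order, until one boot succeeds."""
    history = []
    for _ in range(max_boots):
        slot = env.select_slot()
        history.append(slot)
        if boot_succeeds(slot):
            env.mark_good()
            break
        # Otherwise nobody pings the HW watchdog, it fires and the board resets.
    return history
```

With a freshly flashed but broken image in slot B (`BootEnv(active="B", fallback="A")` and `boot_succeeds=lambda s: s == "A"`), three watchdog resets exhaust B's budget and the fourth boot lands on A, giving the history `["B", "B", "B", "A"]`. The scheme only works if nothing keeps feeding the watchdog while the system is stuck, which is exactly the problem described next.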

During my testing, I was able to unplug the system's SD card at a "wrong" moment, which resulted in systemd trying to boot into emergency.target and ultimately failing due to a missing rootfs. I ended up with an unusable system which did not reboot automatically because systemd kept periodically pinging the HW watchdog timer. [1]

I got a suggestion to adjust the important units so that they specify a FailureAction. I do not like that solution because it is additional work (identifying which units might fail, coming up with various possible failure scenarios, being hard to test and to get "right" in the face of future systemd updates, etc.). It also feels like I am attacking the wrong problem. I already *have* a watchdog which will shoot the system in the head if something goes wrong. Wouldn't it make more sense to rely on this piece of infrastructure and start telling the watchdog "hey, I'm OK" only after the system has fully booted and my ultimate target has been *reached*?

Suggestions which offer additional possibilities are welcome. I like systemd's feature set, and I won't pretend that I know all of it :).

With kind regards,
Jan

[1] https://github.com/systemd/systemd/issues/7063

I more or less solved this by *not* configuring systemd to start pinging the watchdog on its own. Then I added another unit, which depends on and is wanted by multi-user.target, that checks whether everything is OK so far and only then tells systemd to start pinging the watchdog:

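 [Unit]
 Description=Pinging the HW watchdog
 Requires=multi-user.target
 After=multi-user.target

 [Service]
 Type=oneshot
 # Refuse to hand the watchdog over to systemd if any unit has failed so far
 ExecStartPre=/bin/sh -c '[ "$(/bin/systemctl list-units --failed --all --no-legend --no-pager)" = "" ]'
 # Set the HW watchdog timeout to 30 s (value in microseconds); PID 1 then keeps pinging it
 ExecStart=/bin/busctl set-property org.freedesktop.systemd1 /org/freedesktop/systemd1 org.freedesktop.systemd1.Manager RuntimeWatchdogUSec t 30000000

 [Install]
 WantedBy=multi-user.target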
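If the shell one-liner feels too fragile, the same two steps can be sketched in Python. This is a hypothetical helper, not something from the original thread: it assumes Python 3 with `systemctl` and `busctl` on `PATH`, and it only wraps the two commands used in the unit above. `no_failed_units()`, `watchdog_argv()` and `main()` are illustrative names.

```python
from __future__ import annotations

import subprocess


def no_failed_units(list_units_output: str) -> bool:
    """Roughly mirrors the ExecStartPre check: True when systemctl printed nothing."""
    return list_units_output.strip() == ""


def watchdog_argv(seconds: int) -> list[str]:
    """Build the busctl call that sets Manager.RuntimeWatchdogUSec (in microseconds)."""
    usec = seconds * 1_000_000
    return [
        "busctl", "set-property",
        "org.freedesktop.systemd1",
        "/org/freedesktop/systemd1",
        "org.freedesktop.systemd1.Manager",
        "RuntimeWatchdogUSec", "t", str(usec),
    ]


def main(seconds: int = 30) -> int:
    """Hand the HW watchdog over to PID 1 only if no unit has failed so far."""
    out = subprocess.run(
        ["systemctl", "list-units", "--failed", "--all", "--no-legend", "--no-pager"],
        capture_output=True, text=True, check=True,
    ).stdout
    if not no_failed_units(out):
        # Leave the watchdog unfed so that it eventually resets the board.
        print("failed units present; not enabling the runtime watchdog")
        return 1
    subprocess.run(watchdog_argv(seconds), check=True)
    return 0
```

`watchdog_argv(30)` yields the same arguments as the `ExecStart=` line above, ending in `RuntimeWatchdogUSec t 30000000`. A oneshot unit would call `main()` in place of the two shell commands; a non-zero return leaves the watchdog armed but unfed, which is the behaviour the thread is after.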

For more details, see the original bug report at https://github.com/systemd/systemd/issues/7063 .

Cheers,
Jan
_______________________________________________
systemd-devel mailing list
https://lists.freedesktop.org/mailman/listinfo/systemd-devel