[bouncing this message to the BTS for posterity]

Adam D. Barratt via RT writes ("[rt.debian.org #9931] /var/run/reboot-lock 
ineffective on tag2upload-builder-01"):
> The script in question is 
> https://salsa.debian.org/dsa-team/mirror/dsa-puppet/-/blob/production/modules/debian_org/files/molly-guard/15-acquire-reboot-lock?ref_type=heads
> 
> It does seem to be designed to hold the lock so long as systemd indicates 
> that a reboot is scheduled.

I'm no expert on systemd, but, AFAICT, what is supposed to happen here
is:

 1. System administrator (or some process) invokes reboot
    (or one of the other similar commands).

 2. reboot is diverted by molly-guard.  So we run molly-guard instead.

 3. molly-guard runs its hook scripts including 15-acquire-reboot-lock.

 4. 15-acquire-reboot-lock takes the reboot lock.

 5. 15-acquire-reboot-lock spawns a daemonic child which
    (a) has a dup of the open-file which owns the lock
    (b) polls until /run/systemd/shutdown/scheduled doesn't exist.

 6. molly-guard carries on with the rest of the hook scripts
    and eventually runs systemd's reboot command.

 7. systemd's reboot command, and systemd itself
    (i) possibly do some prep work
    (ii) create /run/systemd/shutdown/scheduled
    (iii) shut down services, send SIGTERM, etc.
    (iv) reboot the host.
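For concreteness, here is a rough Python model of steps 4 and 5. It is a hypothetical sketch, not the real hook script (which I haven't reproduced here). It assumes the reboot lock is a flock(2) lock on the lock file. The function name acquire_and_hold and its parameters are made up for illustration. The point it shows is 5(a): the forked child's copy of the fd refers to the same open file description, so the lock survives the hook script exiting.

```python
import fcntl
import os
import time


def acquire_and_hold(lock_path, scheduled_path, poll_interval=1.0):
    """Hypothetical model of steps 4 and 5 (not the real hook script).

    Assumes the reboot lock is a flock(2) lock on lock_path.  Returns the
    PID of the child that keeps the lock while scheduled_path exists.
    """
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Step 4: take the lock; raises BlockingIOError if it is held.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        raise
    pid = os.fork()
    if pid == 0:
        # Step 5(a): this copy of fd shares the open file description,
        # so the lock stays held for as long as the child is running.
        # (A real daemon would also setsid() and detach stdio; omitted.)
        while os.path.exists(scheduled_path):  # step 5(b)
            time.sleep(poll_interval)
        os._exit(0)  # closes the last copy of fd, releasing the lock
    # Parent (the hook script) drops its copy and returns to molly-guard.
    os.close(fd)
    return pid
```

Note that nothing in this loop waits for scheduled_path to appear: it only waits for it to go away.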

But!

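What if 5(b)'s first poll happens before 7(ii)?  Then the daemonic
child sees no /run/systemd/shutdown/scheduled, exits straight away,
and the lock is released before the shutdown has even started.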

I think nothing prevents this.  And indeed it's probably the usual
case.  This theory predicts that the reboot lock will *usually* not be
held while a reboot is in progress - but that state would not usually
be observable because it would usually be brief.  Unless, say, the
shutdown took a noticeable length of time.  Which it might well do if
we have tag2upload-oracled in the middle of its critical section!
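A small, deterministic-ish simulation of that race, as a sketch under stated assumptions: the "systemd" here is a hypothetical stand-in thread that does some prep work before creating the scheduled file, and the names hold_while_scheduled and simulate are invented for illustration. The poller implements 5(b) exactly as described (poll until the file doesn't exist), and starts at t=0.

```python
import os
import tempfile
import threading
import time


def hold_while_scheduled(scheduled_path, poll_interval):
    """Step 5(b) as described: poll until scheduled_path doesn't exist.

    Returns how many polls saw the file, i.e. roughly how many
    intervals the reboot lock would have been kept.
    """
    polls = 0
    while os.path.exists(scheduled_path):
        polls += 1
        time.sleep(poll_interval)
    return polls


def simulate(prep_time, shutdown_time, poll_interval=0.01):
    """Hypothetical timeline: the child polls from t=0, but a stand-in
    for systemd only creates the file after prep_time seconds."""
    with tempfile.TemporaryDirectory() as d:
        scheduled = os.path.join(d, "scheduled")

        def fake_systemd():
            time.sleep(prep_time)            # 7(i): prep work
            open(scheduled, "w").close()     # 7(ii): now "scheduled"
            time.sleep(shutdown_time)        # 7(iii): slow shutdown
            os.remove(scheduled)             # stand-in for 7(iv)

        t = threading.Thread(target=fake_systemd)
        t.start()
        polls = hold_while_scheduled(scheduled, poll_interval)  # 5(b)
        still_shutting_down = t.is_alive()
        t.join()
        return polls, still_shutting_down
```

With any nonzero prep time, e.g. simulate(0.1, 0.3), the poller should see zero polls and return while the fake shutdown is still in progress, i.e. the lock is dropped for the whole shutdown.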

I'm not sure exactly how to fix this.

I don't think this can be made completely reliable without features
that, as far as I know, neither molly-guard nor systemd has.

A naive approach would be to just add a sleep at the start of the
daemonic child.  That sleep would have to be long enough for any
other molly-guard hook scripts to finish and for systemd to create
/run/systemd/shutdown/scheduled (we don't know what those other
scripts might be, but we could probably guess).
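A sketch of that naive fix, assuming the child otherwise behaves as in 5(b). The function name hold_with_grace and the grace parameter are hypothetical. It returns how long the lock would have been held, which makes the trade-off visible: if no reboot ever gets scheduled, the lock is still kept for the full grace period.

```python
import os
import time


def hold_with_grace(scheduled_path, grace, poll_interval):
    """Naive fix (hypothetical sketch): sleep for `grace` seconds before
    the first poll, so the remaining hook scripts and systemd's prep
    work have a chance to create scheduled_path.

    Returns the number of seconds the lock would have been held.
    """
    start = time.monotonic()
    time.sleep(grace)  # give the reboot time to become "scheduled"
    while os.path.exists(scheduled_path):  # then behave as in 5(b)
        time.sleep(poll_interval)
    return time.monotonic() - start
```

If the scheduled file appears within the grace period, the lock is held until it disappears. If it never appears (say the reboot was cancelled), the lock is held for exactly the grace period and then released.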

A more sophisticated approach might be to record the process group of
the molly-guard hook script.  That is presumably also the process
group of molly-guard itself, and of whatever is trying to shut down.
The daemonic child could then check whether that process group's
leader still exists, and keep holding the lock while it does.  That
seems liable to false positives (meaning the reboot lock would be held
for too long) and also to trouble if the process hierarchy isn't what
we imagine.
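A sketch of the process-group idea, assuming the hook records os.getpgrp() before the child detaches (a setsid() in the child would otherwise change it). The helper names are hypothetical. It relies on two real facts: a process group leader's PID equals the group ID, and kill(2) with signal 0 only checks that the target exists. One caveat in the false-positive direction: an unreaped zombie leader still counts as existing.

```python
import os
import time


def leader_alive(pgid):
    """True if a process whose PID == pgid exists.  A process group's
    leader has PID equal to the group ID, so this checks the leader."""
    try:
        os.kill(pgid, 0)  # signal 0: existence/permission check only
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but belongs to another user
    return True


def hold_while_group_leader_alive(pgid, scheduled_path, poll_interval):
    """Keep the lock while the leader lives or a shutdown is scheduled.

    The hook would capture pgid = os.getpgrp() before daemonizing.
    """
    while leader_alive(pgid) or os.path.exists(scheduled_path):
        time.sleep(poll_interval)
```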

It might be possible to do something terrible with systemd's logfiles
^W journal.

Or a combination of these approaches.
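One possible combination, as a hypothetical sketch only: a short grace sleep, then hold while either a shutdown is scheduled or the recorded group leader still exists. The max_hold cap is my own assumption, added to bound the false positives mentioned above. All names here are invented for illustration.

```python
import os
import time


def leader_alive(pid):
    """True if a process with this PID exists (signal-0 probe)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True
    return True


def hold_combined(scheduled_path, leader_pid, grace, poll_interval, max_hold):
    """Hypothetical combination: grace sleep, then hold while a shutdown
    is scheduled or the group leader exists, but never longer than
    max_hold seconds in total (an assumed cap on false positives)."""
    start = time.monotonic()
    time.sleep(grace)
    while time.monotonic() - start < max_hold:
        if not (os.path.exists(scheduled_path) or leader_alive(leader_pid)):
            return "released"
        time.sleep(poll_interval)
    return "timed out"
```

Returning "timed out" here stands for "give up and release the lock anyway", so a stuck or misidentified leader can't hold the lock forever.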

For now I suggest adding "sleep 30" at line 15.  That means a
cancelled reboot couldn't be restarted within 30 seconds.  But it will
give a reboot that has been requested 30s to become properly scheduled
and visible in /run/systemd/shutdown/scheduled.

HTH.

Regards,
Ian.

-- 
Ian Jackson <[email protected]>   These opinions are my own.  

Pronouns: they/he.  If I emailed you from @fyvzl.net or @evade.org.uk,
that is a private address which bypasses my fierce spamfilter.
