2014/1/20 Colin Guthrie <[email protected]>:
> CC'ing Zbigniew as he's working on the Fedora bug AFAIK.
>
> 'Twas brillig, and Lennart Poettering at 20/01/14 12:37 did gyre and gimble:
>> On Thu, 16.01.14 12:28, Colin Guthrie ([email protected]) wrote:
>>
>>>
>>> 'Twas brillig, and Colin Guthrie at 14/01/14 13:28 did gyre and gimble:
>>>> 3. Some sort of kernel trigger for me today led it to run two reexecs
>>>> quite quickly and triggered this problem randomly during runtime. This
>>>> *might* have come in via "telinit u" instead. It doesn't appear that the
>>>> kernel actually execs telinit directly but perhaps userspace can react
>>>> on it in some way?
>>>
>>> OK, this, it turns out is a result of running prelink via cron.
>>>
>>> The prelink package we (Mageia) have is basically the same as the Fedora
>>> one. It has a cronjob which calls "telinit u" but the prelink binary
>>> itself calls "/sbin/init U" which does the same thing, thus two
>>> daemon-reexecs in rapid succession which triggers this bug.
>>>
>>> For now I've disabled the "telinit u" call in prelink, but the real
>>> trick would be fixing the bug/race in serialisation :)
>>
>> Hmm, so, normally PID 1 should not accept new requests after the
>> deserialization of the first reexec is complete.
>>
>> Let me sumarize this a bit:
>>
>> Is this about reexec or reload? Or both?
>
> I was confused at first, but it seems "both" in the end. See here for a
> reproduction case involving either (tho' reload requires a --no-block
> param to trigger):
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1043212#c20
>
>> This is supposed to trigger the issue? "systemctl daemon-reexec ;
>> systemctl daemon-reexec"? What precisely goes bad afterwards? Does this
>> always trigger the issue or only sometimes?
>
> On my system it's pretty reliable and will trigger it every time. It
> might need a setup where loading the serialised state triggers a few
> jobs to make it take longer. e.g. on my setup the Type=oneshot units
> were all rerun when reloading the state (which actually seems wrong to
> me - e.g. my alsa-restore.service job kicked in again which made an
> in-progress VoIP call weird by suddenly changing my Headphones port back
> to Speakers!! - I've since started using the alsa-state daemon instead
> which mitigates things, but re-running oneshot's seems wrong no?)
>
>> What version are you using? Can you reproduce the issue on git?
>
> It's almost identical to the fedora 20 version - 208 + lots of patches.
>
> Not tried latest git yet I'm afraid, but as it also apparently affects
> fedora 20 (see above bug) I'm guessing you'll need something backported
> anyway and I'm not sure if there is any specific fix (although the sdbus
> port might have fixed it indirectly if it doesn't occur any more)
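
For anyone wanting to try this on their own setup, the reproduction quoted above can be sketched as a small script. This is only an illustration: the DRY_RUN switch and the function names are made up here, and putting --no-block on both reload calls is an assumption based on the description above, not something copied from the bug report.

```shell
#!/bin/sh
# Sketch only: send two manager requests back to back, as discussed above.
# DRY_RUN, run(), repro_reexec() and repro_reload() are illustrative names,
# not part of systemd.

# Print the command when DRY_RUN=1 (the default); execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Two daemon-reexec requests in rapid succession (the case quoted above).
repro_reexec() {
    run systemctl daemon-reexec
    run systemctl daemon-reexec
}

# Reload variant. Using --no-block on both calls is an assumption.
repro_reload() {
    run systemctl --no-block daemon-reload
    run systemctl --no-block daemon-reload
}

repro_reexec
```

With the default DRY_RUN=1 this only prints the commands. Running with DRY_RUN=0 as root actually re-executes PID 1 twice, so it is best tried on a test machine.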

fwiw: I just tested with a (quite recent) git version and I can't reproduce it. Note that this is on Arch Linux, without any sysv compat stuff.

_______________________________________________
systemd-devel mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/systemd-devel
