On 2022-02-04 18:01:34 +0100, Vincent Lefevre wrote:
> So it seems that there is some randomness in the reproducibility,
> but with the patched "at" and many others tests to try to make
> the new job appear at the end (which I couldn't succeed, but this
> seems to be useless anyway), I couldn't reproduce the bug at all
> (but why???).

After adding some logging to the unpatched "at", i.e. with the
remaining "now = time(NULL);":

Feb 04 18:42:36 zira atd[72429]: now = 1643996556
Feb 04 18:42:36 zira atd[72429]: run_time = 1643996580
Feb 04 18:42:36 zira atd[72429]: run_time = 1644876000
Feb 04 18:42:36 zira atd[72429]: run_time = 1644912000
Feb 04 18:42:36 zira atd[72429]: next_invocation = 1643996580
Feb 04 18:43:00 zira atd[72429]: now = 1643996579
Feb 04 18:43:00 zira atd[72429]: run_time = 1643996580
Feb 04 18:43:00 zira atd[72429]: run_time = 1644876000
Feb 04 18:43:00 zira atd[72429]: run_time = 1644912000
Feb 04 18:43:00 zira atd[72429]: next_invocation = 1643996580

There are 2 causes:

1. time(NULL) and clock_gettime(CLOCK_REALTIME,...) may have a
   one-second shift. So, at 18:43:00, "now" was still seen at
   18:42:59; thus the job that should have run at 18:43:00 was
   still seen as being in the future. Not a real issue, except
   inefficiency, thanks to the loop, which should take care of
   that.

2. Due to the above minor issue, the timer was set at the present
   time 18:43:00 (1643996580). But for some reason, the pause()
   that follows it is not interrupted. This is the real issue.
   This also makes the patched "at" unreliable as this yields a
   race condition: the timer may still expire before pause() is
   called, though this is normally rather unlikely, I think.

-- 
Vincent Lefèvre <vinc...@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)

Reply via email to