On Wed, Sep 24, 2025 at 08:49:41AM +0200, Samuel Thibault wrote:
> Michael Banck via Bug reports for the GNU Hurd, on Wed, Sep 24, 2025 at
> 08:41:51 +0200, wrote:
> > On Sun, Sep 21, 2025 at 09:14:04AM +0000, Damien Zammit wrote:
> > > Between reading mtime and reading hpclock_read_counter,
> > > there may be an interrupt that updates mtime, therefore
> > > we need a check to perform the clock read process again
> > > in this case.
> > >
> > > TESTED: on UP using:
> >
> > There is a PostgreSQL isolation test that seems to be triggered by
> > the clock not moving forward (while not moving backwards either),
> > i.e. reporting the same timestamp twice in a row on subsequent
> > clock_gettime() calls.
> >
> > If I run the test case from [1] like this, I still get the following
> > after a few thousand iterations, even with this patch applied:
> >
> > |$ for i in {1..10000}; do printf "ITERATION $i "; ./tt 100 || break; done
> > [...]
> > | ITERATION 3029 t1: -2073074700, t2: -2073069580, t2 - t1: 5120 (r: 4950)
> > | ITERATION 3030 t1: -2070257921, t2: -2070257921, t2 - t1: 0 (r: 4950)
>
> Yes, that can still happen with the current implementation: we really
> advance the time on clock tick, and use the hpet only to interpolate
> the time between the ticks. If for whatever reason the clock and the
> hpet are not perfectly synchronized, we clamp the advance so that the
> reported time stays monotonic. So two consecutive calls may report the
> same value. Trying to make sure that time always progresses at least by
> 1ns (or 1µs if the application is using gettimeofday...) would be quite
> involved.
>
> I'd tend to say this is an issue in postgresql: it shouldn't assume that
> clocks have infinite precision.

I guess there is a spectrum here - certainly infinite precision is
unrealistic, but the question is what minimum timer precision
applications can reasonably require (I've asked what the current
requirements from Postgres are).
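
To make sure I understand the mechanism described above, here is how I
read it, as a simplified sketch (all names are made up, this is not the
actual gnumach code, and hpet_read_counter() merely stands in for
hpclock_read_counter()):

#include <stdint.h>

#define TICK_NS 10000000ULL          /* pretend a 10 ms clock tick */
#define HPET_NS_PER_COUNT 70ULL      /* made-up conversion factor */

/* Updated by the (imaginary) tick interrupt handler. */
static volatile uint64_t mtime_ns;           /* time, advanced on every tick */
static volatile uint64_t hpet_at_last_tick;  /* counter latched at that tick */

/* Stub standing in for reading the free-running HPET main counter. */
static uint64_t hpet_read_counter(void) { return hpet_at_last_tick; }

static uint64_t last_reported;               /* for the monotonicity clamp */

uint64_t sketch_clock_gettime_ns(void)
{
    uint64_t base, hpet_base, hpet_now, delta, now;

    do {
        /* Snapshot the tick-driven time and the counter value latched
         * at that tick... */
        base = mtime_ns;
        hpet_base = hpet_at_last_tick;
        /* ...then read the free-running counter used for interpolation. */
        hpet_now = hpet_read_counter();
        /* If a tick interrupt updated mtime in between, the snapshots
         * are inconsistent and the whole read has to be redone - the
         * check Damien's patch adds. */
    } while (base != mtime_ns);

    /* Interpolate within the current tick, clamping the advance so we
     * never run ahead of the next tick... */
    delta = (hpet_now - hpet_base) * HPET_NS_PER_COUNT;
    if (delta > TICK_NS)
        delta = TICK_NS;
    now = base + delta;

    /* ...and never report less than the previously returned value, so
     * the clock stays monotonic.  This also means two consecutive calls
     * can legitimately return the exact same value. */
    if (now < last_reported)
        now = last_reported;
    last_reported = now;
    return now;
}

If that reading is right, a 0 ns difference between two back-to-back
calls is simply the monotonicity clamp doing its job rather than the
retry logic misfiring.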

Postgres has a pg_test_timing utility; the version in master/HEAD has
been enhanced to use ns resolution, and with it I get the following on
my qemu/kvm VM:
$ LANG=C ./pg_test_timing
Testing timing overhead for 3 seconds.
Average loop time including overhead: 13866,64 ns
Histogram of timing durations:
<= ns % of total running % count
0 0,0510 0,0510 122
1 0,0000 0,0510 0
3 0,0000 0,0510 0
7 0,0000 0,0510 0
15 0,0000 0,0510 0
31 0,0000 0,0510 0
63 0,0000 0,0510 0
127 0,0000 0,0510 0
255 0,0000 0,0510 0
511 0,0000 0,0510 0
1023 0,0004 0,0514 1
2047 0,0000 0,0514 0
4095 98,9320 98,9834 236681
8191 0,8845 99,8679 2116
16383 0,0393 99,9072 94
32767 0,0343 99,9415 82
[...]
536870911 0,0004 99,9996 1
1073741823 0,0004 100,0000 1
Observed timing durations up to 99,9900%:
ns % of total running % count
0 0,0510 0,0510 122
729 0,0004 0,0514 1
3519 0,0004 0,0518 1
3630 0,0130 0,0648 31
3640 0,1651 0,2299 395
3650 0,7449 0,9748 1782
3660 2,3395 3,3143 5597
[...]
9980 0,0004 99,8892 1
9990 0,0004 99,8896 1
...
782724560 0,0004 100,0000 1

Whereas on my Linux Thinkpad host, I see this:
$ LANG=C ./pg_test_timing
Testing timing overhead for 3 seconds.
Average loop time including overhead: 13.84 ns
Histogram of timing durations:
<= ns % of total running % count
0 0.0000 0.0000 0
1 0.0000 0.0000 0
3 0.0000 0.0000 0
7 0.0000 0.0000 0
15 97.3170 97.3170 210936922
31 2.6288 99.9458 5697932
63 0.0505 99.9963 109441
127 0.0008 99.9971 1782
255 0.0022 99.9992 4674
511 0.0004 99.9996 789
1023 0.0002 99.9998 448
2047 0.0001 99.9999 260
4095 0.0000 99.9999 29
8191 0.0000 100.0000 28
16383 0.0000 100.0000 9
32767 0.0000 100.0000 20
65535 0.0000 100.0000 71
131071 0.0000 100.0000 1
Observed timing durations up to 99.9900%:
ns % of total running % count
12 0.6013 0.6013 1303308
13 30.4999 31.1012 66109219
14 61.5536 92.6548 133418976
15 4.6622 97.3170 10105419
16 1.5844 98.9014 3434318
17 0.6401 99.5415 1387333
18 0.1352 99.6766 292948
19 0.1831 99.8597 396853
20 0.0499 99.9097 108239
21 0.0094 99.9191 20358
22 0.0056 99.9247 12191
23 0.0065 99.9312 14037
24 0.0072 99.9384 15645
25 0.0023 99.9407 4988
26 0.0008 99.9415 1819
27 0.0004 99.9419 863
28 0.0008 99.9427 1706
29 0.0007 99.9434 1486
30 0.0006 99.9440 1296
31 0.0018 99.9458 3852
32 0.0018 99.9476 3884
33 0.0003 99.9479 675
34 0.0001 99.9480 180
35 0.0001 99.9480 120
36 0.0000 99.9480 42
37 0.0000 99.9480 38
38 0.0000 99.9481 38
39 0.0000 99.9481 30
40 0.0009 99.9489 1885
41 0.0046 99.9536 10039
42 0.0137 99.9673 29692
43 0.0157 99.9830 34041
44 0.0089 99.9918 19219
...
95775 0.0000 100.0000 1

(I should get a Linux VM running on qemu/kvm and compare timings there.)
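
For reference, the measurement loop behind those numbers is essentially
the following (a simplified sketch, not the actual pg_test_timing
source): read the clock back-to-back for the test duration and
histogram the difference between consecutive readings, so entries in
the 0 bucket mean two reads returned the identical timestamp.

#include <stdio.h>
#include <stdint.h>
#include <time.h>

static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + ts.tv_nsec;
}

int main(void)
{
    uint64_t histogram[64] = { 0 };   /* bucket i: diffs of up to 2^i - 1 ns */
    uint64_t zero_diffs = 0, loops = 0;
    uint64_t start = now_ns(), prev = start, cur;

    /* Read the clock back-to-back for ~3 seconds and histogram the
     * difference between consecutive readings. */
    while ((cur = now_ns()) - start < 3ULL * 1000000000ULL) {
        uint64_t diff = cur - prev;
        int bucket = 0;

        if (diff == 0)
            zero_diffs++;             /* the case the isolation test trips over */
        while (diff >> bucket)
            bucket++;
        histogram[bucket]++;
        prev = cur;
        loops++;
    }

    printf("Average loop time including overhead: %.2f ns\n",
           (double)(cur - start) / loops);
    printf("Zero-ns differences: %llu\n", (unsigned long long)zero_diffs);
    for (int bucket = 0; bucket < 64; bucket++)
        if (histogram[bucket])
            printf("<= %llu ns: %llu\n",
                   (unsigned long long)((1ULL << bucket) - 1),
                   (unsigned long long)histogram[bucket]);
    return 0;
}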

Those 0 ns differences on qemu are what the (probably artificial) stats
isolation test trips over, but the Postgres hackers are also very
unhappy about the proposal of adding random sleep delays to their test
suite as a workaround.

Michael