> From: David Hill <dh...@mindcry.org>
> Date: Thu, 6 Jun 2019 09:55:00 -0400
> 
> On 6/6/19 8:49 AM, Mark Kettenis wrote:
> >> From: David Hill <dh...@mindcry.org>
> >> Date: Thu, 6 Jun 2019 08:22:06 -0400
> >>
> >> On 6/6/19 6:56 AM, Scott Cheloha wrote:
> >>> On Wed, Jun 05, 2019 at 08:33:56PM -0400, David Hill wrote:
> >>>> Hello -
> >>>>
> >>>> I noticed some negative roundtrip times when running traceroute, which 
> >>>> uses
> >>>> the monotonic clock to calculate the RTT.
> >>>>
> >>>> If I run the following code, it eventually bombs.?? It bombs quick if I
> >>>> launch Firefox.
> >>>>
> >>>> timespeccmp failed
> >>>> tp1 s:103780 n:63101042
> >>>> tp2 s:103779 n:761117849
> >>> please share your dmesg and the output of
> >>>
> >>>   sysctl kern.clockrate kern.timecounter
> >>>
> >>> Could you also share the output when you run the following code?
> >>> With and without firefox running?  Curious about how frequently
> >>> this happens.
> >>>
> >>> #include <sys/time.h>
> >>>
> >>> #include <err.h>
> >>> #include <stdio.h>
> >>> #include <time.h>
> >>>
> >>> int
> >>> main(void)
> >>> {
> >>>   struct timespec t0, t1, timeout;
> >>>
> >>>   timeout.tv_sec = 0;
> >>>   timeout.tv_nsec = 100000;
> >>>
> >>>   for (;;) {
> >>>           clock_gettime(CLOCK_MONOTONIC, &t0);
> >>>           if (nanosleep(&timeout, NULL) == -1)
> >>>                   err(1, "nanosleep");
> >>>           clock_gettime(CLOCK_MONOTONIC, &t1);
> >>>           if (timespeccmp(&t0, &t1, >)) {
> >>>                   printf("%lld.%09ld precedes %lld.%09ld\n",
> >>>                       t1.tv_sec, t1.tv_nsec, t0.tv_sec, t0.tv_nsec);
> >>>           }
> >>>   }
> >>>
> >>>   return 0;
> >>> }
> >>>
> >> $ sysctl kern.clockrate kern.timecounter
> >> kern.clockrate=tick = 10000, tickadj = 40, hz = 100, profhz = 100,
> >> stathz = 100
> >> kern.timecounter.tick=1
> >> kern.timecounter.timestepwarnings=0
> >> kern.timecounter.hardware=tsc
> >> kern.timecounter.choice=i8254(0) acpihpet0(1000) tsc(2000) acpitimer0(1000)
> >>
> >> 146259.148543164 precedes 146259.450584680
> >> 146259.488592813 precedes 146259.790556528
> >> 146259.808566236 precedes 146260.110528202
> >> 146261.148476141 precedes 146261.450398822
> >> 146262.888279505 precedes 146263.190240715
> >> 146263.887431808 precedes 146264.189435801
> >> 146274.487159451 precedes 146274.789148187
> >> 146278.186820816 precedes 146278.488803756
> >> 146279.546680181 precedes 146279.848674138
> >> 146284.326305557 precedes 146284.628224696
> >> 146285.426154861 precedes 146285.728066901
> >> 146287.885921929 precedes 146288.187826688
> >> 146289.985763711 precedes 146290.287731233
> >> 146290.985659041 precedes 146291.287640302
> >> 146293.985348615 precedes 146294.287327519
> >> 146295.465186742 precedes 146295.767218516
> >> 146295.885156426 precedes 146296.187140717
> >> 146298.544907956 precedes 146298.846882759
> >> 146298.644894308 precedes 146298.946889142
> > Does this also happen with
> >
> > kern.timecounter.hardware=acpihpet
> >
> > ?
> 
> Ran the code with kern.timecounter.hardware=acpihpet0 for 10 minutes 
> without any output.
> 
> With the code still running, I switched to tsc and starting seeing output:
> 
> 151944.345840447 precedes 151944.647753895
> 151944.525774690 precedes 151944.827681380
> 151946.405553296 precedes 151946.707533012
> 
> The output stops as soon as I switch back to acpihpet0.
> 
> I will add that to my /etc/sysctl.conf.
> 
> Thank you.

So the TSCs on the cores of your brand new Intel Kaby Lake R processor
are out of sync.  That is the final drop.  I disabled the code that
promotes the TSC as a high-quality timecounter on MULTIPROCESSOR
kernels.  It simply doesn't work on a significant fraction of machines
and causes all kinds of weird problems.

Scott, any change to turn that bit of code into a regress test?

Reply via email to