On Sun, Aug 14, 2022 at 11:24:37PM -0500, Scott Cheloha wrote:
> In the future when the LAPIC timer is run in oneshot mode there will
> be no lapic_delay().
>
> [...]
>
> This is *very* bad for older amd64 machines, because you are left with
> i8254_delay().
>
> I would like to offer a less awful delay(9) implementation for this
> class of hardware. Otherwise we may trip over bizarre phantom bugs on
> MP kernels because only one CPU can read the i8254 at a time.
>
> [...]
>
> Real i386 hardware should be fine. Later models with an ACPI PM timer
> will be fine using acpitimer_delay() instead of i8254_delay().
>
> [...]
>
> Here are the sample measurements from my 2017 laptop (kaby lake
> refresh) running the attached patch. It takes longer than a
> microsecond to read either of the ACPI timers. The PM timer is better
> than the HPET. The HPET is a bit better than the i8254. I hope the
> numbers are a little better on older hardware.
>
> acpitimer_test_delay: expected 0.000001000 actual 0.000010638 error 0.000009638
> acpitimer_test_delay: expected 0.000010000 actual 0.000015464 error 0.000005464
> acpitimer_test_delay: expected 0.000100000 actual 0.000107619 error 0.000007619
> acpitimer_test_delay: expected 0.001000000 actual 0.001007275 error 0.000007275
> acpitimer_test_delay: expected 0.010000000 actual 0.010007891 error 0.000007891
>
> acpihpet_test_delay: expected 0.000001000 actual 0.000022208 error 0.000021208
> acpihpet_test_delay: expected 0.000010000 actual 0.000031690 error 0.000021690
> acpihpet_test_delay: expected 0.000100000 actual 0.000112647 error 0.000012647
> acpihpet_test_delay: expected 0.001000000 actual 0.001021480 error 0.000021480
> acpihpet_test_delay: expected 0.010000000 actual 0.010013736 error 0.000013736
>
> i8254_test_delay: expected 0.000001000 actual 0.000040110 error 0.000039110
> i8254_test_delay: expected 0.000010000 actual 0.000039471 error 0.000029471
> i8254_test_delay: expected 0.000100000 actual 0.000128031 error 0.000028031
> i8254_test_delay: expected 0.001000000 actual 0.001024586 error 0.000024586
> i8254_test_delay: expected 0.010000000 actual 0.010021859 error 0.000021859

Attached is an updated patch. I left the test measurement code in
place because I would like to see a test on a real i386 machine, just
to make sure it works as expected. I can't imagine why it wouldn't
work, but we should never assume anything.

Changes from v1:

- Actually set delay_func from acpitimerattach() and
  acpihpet_attach().

  I think it's safe to assume, on real hardware, that the ACPI PMT is
  preferable to the i8254 and the HPET is preferable to both of them.
  This is not *always* true, but it is true on the older machines that
  can't use tsc_delay(), so the assumption works in practice.

  Outside of those three timers, the hierarchy gets murky. There are
  other timers that are better than the HPET, but they aren't always
  available. If those timers are already providing delay_func this
  code does not usurp them.

- Duplicate test measurement code from amd64/lapic.c into
  i386/lapic.c. Will be removed in the committed version.

- Use bus_space_read_8() in acpihpet.c if it's available. The HPET is
  a 64-bit counter and the spec permits 32-bit or 64-bit aligned
  access. As one might predict, this cuts the overhead in half
  because we're doing half as many reads.

  This part can go into a separate commit, but I thought it was neat
  so I'm including it here.

One remaining question I have:

Is there a nice way to test whether ACPI PMT support is compiled into
the kernel? We can assume the existence of i8254_delay() because
clock.c is required on amd64 and i386. However, acpitimer.c is
optional, so acpitimer_delay() isn't necessarily there. I would
rather not introduce a hard requirement on acpitimer.c into acpihpet.c
if there's an easy way to check whether it is compiled in.

Any ideas?

Anyone have i386 hardware results? If I'm reading the timeline right,
most P6 machines and beyond (NetBurst, etc.) will have an ACPI PMT. I
don't know if any real x86 motherboards shipped with an HPET, but it's
possible.

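For what it's worth, one way to sidestep the question above might be to
stop comparing delay_func against specific function pointers and have
each driver register its delay routine with a rank instead. What
follows is only a userland sketch of that idea, not what the patch
does: delay_register_sketch(), the DELAY_Q_* values, and the *_stub
functions are all hypothetical names made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ranks: a higher value means a more preferable timer. */
enum delay_quality {
	DELAY_Q_NONE = 0,
	DELAY_Q_I8254 = 1000,
	DELAY_Q_ACPITIMER = 2000,
	DELAY_Q_ACPIHPET = 3000,
	DELAY_Q_TSC = 5000
};

typedef void (*delay_fn)(int);

/* Stand-ins for the kernel's delay_func and its current rank. */
static delay_fn delay_func_sketch;
static int delay_quality_sketch = DELAY_Q_NONE;

/*
 * Install fn only if it outranks whatever is installed already.
 * The caller never needs to name another driver's delay routine,
 * so acpihpet.c would not need the acpitimer_delay() symbol.
 */
void
delay_register_sketch(delay_fn fn, int quality)
{
	if (fn != NULL && quality > delay_quality_sketch) {
		delay_func_sketch = fn;
		delay_quality_sketch = quality;
	}
}

/* Empty stubs standing in for the real delay implementations. */
void i8254_delay_stub(int usecs) { (void)usecs; }
void acpitimer_delay_stub(int usecs) { (void)usecs; }
void acpihpet_delay_stub(int usecs) { (void)usecs; }
void tsc_delay_stub(int usecs) { (void)usecs; }
```

With a scheme like this the attach order would stop mattering: a PMT
that attaches after the HPET would not displace it, and a better timer
that is already installed would be left alone.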
Here are my updated results with the bus_space_read_8 change:

acpitimer_test_delay: expected 0.000001000 actual 0.000010607 error 0.000009607
acpitimer_test_delay: expected 0.000010000 actual 0.000015491 error 0.000005491
acpitimer_test_delay: expected 0.000100000 actual 0.000107734 error 0.000007734
acpitimer_test_delay: expected 0.001000000 actual 0.001008006 error 0.000008006
acpitimer_test_delay: expected 0.010000000 actual 0.010007042 error 0.000007042

acpihpet_test_delay: expected 0.000001000 actual 0.000013282 error 0.000012282
acpihpet_test_delay: expected 0.000010000 actual 0.000022743 error 0.000012743
acpihpet_test_delay: expected 0.000100000 actual 0.000109826 error 0.000009826
acpihpet_test_delay: expected 0.001000000 actual 0.001012149 error 0.000012149
acpihpet_test_delay: expected 0.010000000 actual 0.010010841 error 0.000010841

i8254_test_delay: expected 0.000001000 actual 0.000039767 error 0.000038767
i8254_test_delay: expected 0.000010000 actual 0.000039490 error 0.000029490
i8254_test_delay: expected 0.000100000 actual 0.000127800 error 0.000027800
i8254_test_delay: expected 0.001000000 actual 0.001023940 error 0.000023940
i8254_test_delay: expected 0.010000000 actual 0.010032127 error 0.000032127

And the patch:

Index: arch/amd64/amd64/lapic.c
===================================================================
RCS file: /cvs/src/sys/arch/amd64/amd64/lapic.c,v
retrieving revision 1.60
diff -u -p -r1.60 lapic.c
--- arch/amd64/amd64/lapic.c	15 Aug 2022 04:17:50 -0000	1.60
+++ arch/amd64/amd64/lapic.c	16 Aug 2022 16:09:56 -0000
@@ -466,6 +466,19 @@ lapic_initclocks(void)
 	lapic_startclock();
 
 	i8254_inittimecounter_simple();
+
+	extern void acpitimer_test_delay(int);
+	extern void acpihpet_test_delay(int);
+	extern void i8254_test_delay(int);
+	int usec[] = { 1, 10, 100, 1000, 10000 };
+	size_t i;
+	delay(20000); /* wait for real timecounter to activate */
+	for (i = 0; i < nitems(usec); i++)
+		acpitimer_test_delay(usec[i]);
+	for (i = 0; i < nitems(usec); i++)
+		acpihpet_test_delay(usec[i]);
+	for (i = 0; i < nitems(usec); i++)
+		i8254_test_delay(usec[i]);
 }
 
 
Index: arch/amd64/isa/clock.c
===================================================================
RCS file: /cvs/src/sys/arch/amd64/isa/clock.c,v
retrieving revision 1.36
diff -u -p -r1.36 clock.c
--- arch/amd64/isa/clock.c	13 Feb 2022 19:15:09 -0000	1.36
+++ arch/amd64/isa/clock.c	16 Aug 2022 16:09:56 -0000
@@ -266,6 +266,22 @@ i8254_delay(int n)
 }
 
 void
+i8254_test_delay(int usecs)
+{
+	struct timespec ac, er, ex, t0, t1;
+
+	nanouptime(&t0);
+	i8254_delay(usecs);
+	nanouptime(&t1);
+	timespecsub(&t1, &t0, &ac);
+	NSEC_TO_TIMESPEC(usecs * 1000ULL, &ex);
+	timespecsub(&ac, &ex, &er);
+	printf("%s: expected %lld.%09ld actual %lld.%09ld error %lld.%09ld\n",
+	    __func__, ex.tv_sec, ex.tv_nsec, ac.tv_sec, ac.tv_nsec,
+	    er.tv_sec, er.tv_nsec);
+}
+
+void
 rtcdrain(void *v)
 {
 	struct timeout *to = (struct timeout *)v;
Index: arch/i386/i386/lapic.c
===================================================================
RCS file: /cvs/src/sys/arch/i386/i386/lapic.c,v
retrieving revision 1.49
diff -u -p -r1.49 lapic.c
--- arch/i386/i386/lapic.c	15 Aug 2022 04:17:50 -0000	1.49
+++ arch/i386/i386/lapic.c	16 Aug 2022 16:09:56 -0000
@@ -281,6 +281,19 @@ lapic_initclocks(void)
 	lapic_startclock();
 
 	i8254_inittimecounter_simple();
+
+	extern void acpitimer_test_delay(int);
+	extern void acpihpet_test_delay(int);
+	extern void i8254_test_delay(int);
+	int usec[] = { 1, 10, 100, 1000, 10000 };
+	size_t i;
+	delay(20000); /* wait for real timecounter to activate */
+	for (i = 0; i < nitems(usec); i++)
+		acpitimer_test_delay(usec[i]);
+	for (i = 0; i < nitems(usec); i++)
+		acpihpet_test_delay(usec[i]);
+	for (i = 0; i < nitems(usec); i++)
+		i8254_test_delay(usec[i]);
 }
 
 extern int gettick(void);	/* XXX put in header file */
Index: arch/i386/isa/clock.c
===================================================================
RCS file: /cvs/src/sys/arch/i386/isa/clock.c,v
retrieving revision 1.60
diff -u -p -r1.60 clock.c
--- arch/i386/isa/clock.c	23 Feb 2021 04:44:30 -0000	1.60
+++ arch/i386/isa/clock.c	16 Aug 2022 16:09:56 -0000
@@ -375,6 +375,22 @@ i8254_delay(int n)
 	}
 }
 
+void
+i8254_test_delay(int usecs)
+{
+	struct timespec ac, er, ex, t0, t1;
+
+	nanouptime(&t0);
+	i8254_delay(usecs);
+	nanouptime(&t1);
+	timespecsub(&t1, &t0, &ac);
+	NSEC_TO_TIMESPEC(usecs * 1000ULL, &ex);
+	timespecsub(&ac, &ex, &er);
+	printf("%s: expected %lld.%09ld actual %lld.%09ld error %lld.%09ld\n",
+	    __func__, ex.tv_sec, ex.tv_nsec, ac.tv_sec, ac.tv_nsec,
+	    er.tv_sec, er.tv_nsec);
+}
+
 int
 calibrate_cyclecounter_ctr(void)
 {
Index: dev/acpi/acpitimer.c
===================================================================
RCS file: /cvs/src/sys/dev/acpi/acpitimer.c,v
retrieving revision 1.15
diff -u -p -r1.15 acpitimer.c
--- dev/acpi/acpitimer.c	6 Apr 2022 18:59:27 -0000	1.15
+++ dev/acpi/acpitimer.c	16 Aug 2022 16:09:56 -0000
@@ -18,6 +18,7 @@
 #include <sys/param.h>
 #include <sys/systm.h>
 #include <sys/device.h>
+#include <sys/stdint.h>
 #include <sys/timetc.h>
 
 #include <machine/bus.h>
@@ -25,10 +26,13 @@
 #include <dev/acpi/acpireg.h>
 #include <dev/acpi/acpivar.h>
 
+struct acpitimer_softc;
+
 int acpitimermatch(struct device *, void *, void *);
 void acpitimerattach(struct device *, struct device *, void *);
-
+void acpitimer_delay(int);
 u_int acpi_get_timecount(struct timecounter *tc);
+uint32_t acpitimer_read(struct acpitimer_softc *);
 
 static struct timecounter acpi_timecounter = {
 	.tc_get_timecount = acpi_get_timecount,
@@ -56,6 +60,8 @@ struct cfdriver acpitimer_cd = {
 	NULL, "acpitimer", DV_DULL
 };
 
+int acpitimer_attached;
+
 int
 acpitimermatch(struct device *parent, void *match, void *aux)
 {
@@ -98,18 +104,46 @@ acpitimerattach(struct device *parent, s
 	acpi_timecounter.tc_priv = sc;
 	acpi_timecounter.tc_name = sc->sc_dev.dv_xname;
 	tc_init(&acpi_timecounter);
+
+#if defined(__amd64__) || defined(__i386__)
+	if (delay_func == i8254_delay)
+		delay_func = acpitimer_delay;
+#endif
 #if defined(__amd64__)
 	extern void cpu_recalibrate_tsc(struct timecounter *);
 	cpu_recalibrate_tsc(&acpi_timecounter);
 #endif
+	acpitimer_attached = 1;
 }
 
+void
+acpitimer_delay(int usecs)
+{
+	uint64_t count = 0, cycles;
+	struct acpitimer_softc *sc = acpi_timecounter.tc_priv;
+	uint32_t mask = acpi_timecounter.tc_counter_mask;
+	uint32_t val1, val2;
+
+	val2 = acpitimer_read(sc);
+	cycles = usecs * acpi_timecounter.tc_frequency / 1000000;
+	while (count < cycles) {
+		CPU_BUSY_CYCLE();
+		val1 = val2;
+		val2 = acpitimer_read(sc);
+		count += (val2 - val1) & mask;
+	}
+}
 
 u_int
 acpi_get_timecount(struct timecounter *tc)
 {
-	struct acpitimer_softc *sc = tc->tc_priv;
-	u_int u1, u2, u3;
+	return acpitimer_read(tc->tc_priv);
+}
+
+uint32_t
+acpitimer_read(struct acpitimer_softc *sc)
+{
+	uint32_t u1, u2, u3;
 
 	u2 = bus_space_read_4(sc->sc_iot, sc->sc_ioh, 0);
 	u3 = bus_space_read_4(sc->sc_iot, sc->sc_ioh, 0);
@@ -120,4 +154,25 @@ acpi_get_timecount(struct timecounter *t
 	} while (u1 > u2 || u2 > u3);
 
 	return (u2);
+}
+
+void
+acpitimer_test_delay(int usecs)
+{
+	struct timespec ac, er, ex, t0, t1;
+
+	if (!acpitimer_attached) {
+		printf("%s: (no pmt attached)\n", __func__);
+		return;
+	}
+
+	nanouptime(&t0);
+	acpitimer_delay(usecs);
+	nanouptime(&t1);
+	timespecsub(&t1, &t0, &ac);
+	NSEC_TO_TIMESPEC(usecs * 1000ULL, &ex);
+	timespecsub(&ac, &ex, &er);
+	printf("%s: expected %lld.%09ld actual %lld.%09ld error %lld.%09ld\n",
+	    __func__, ex.tv_sec, ex.tv_nsec, ac.tv_sec, ac.tv_nsec,
+	    er.tv_sec, er.tv_nsec);
 }
Index: dev/acpi/acpihpet.c
===================================================================
RCS file: /cvs/src/sys/dev/acpi/acpihpet.c,v
retrieving revision 1.26
diff -u -p -r1.26 acpihpet.c
--- dev/acpi/acpihpet.c	6 Apr 2022 18:59:27 -0000	1.26
+++ dev/acpi/acpihpet.c	16 Aug 2022 16:09:56 -0000
@@ -18,6 +18,7 @@
 #include <sys/param.h>
 #include <sys/systm.h>
 #include <sys/device.h>
+#include <sys/stdint.h>
 #include <sys/timetc.h>
 
 #include <machine/bus.h>
@@ -31,7 +32,7 @@ int acpihpet_attached;
 int acpihpet_match(struct device *, void *, void *);
 void acpihpet_attach(struct device *, struct device *, void *);
 int acpihpet_activate(struct device *, int);
-
+void acpihpet_delay(int);
 u_int acpihpet_gettime(struct timecounter *tc);
 
 uint64_t acpihpet_r(bus_space_tag_t _iot, bus_space_handle_t _ioh,
@@ -84,20 +85,28 @@ struct cfdriver acpihpet_cd = {
 uint64_t
 acpihpet_r(bus_space_tag_t iot, bus_space_handle_t ioh, bus_size_t ioa)
 {
+#ifdef bus_space_read_8
+	return bus_space_read_8(iot, ioh, ioa);
+#else
 	uint64_t val;
 
 	val = bus_space_read_4(iot, ioh, ioa + 4);
 	val = val << 32;
 	val |= bus_space_read_4(iot, ioh, ioa);
 	return (val);
+#endif
 }
 
 void
 acpihpet_w(bus_space_tag_t iot, bus_space_handle_t ioh, bus_size_t ioa,
     uint64_t val)
 {
+#ifdef bus_space_write_8
+	bus_space_write_8(iot, ioh, ioa, val);
+#else
 	bus_space_write_4(iot, ioh, ioa + 4, val >> 32);
 	bus_space_write_4(iot, ioh, ioa, val & 0xffffffff);
+#endif
 }
 
 int
@@ -262,10 +271,20 @@ acpihpet_attach(struct device *parent, s
 	freq = 1000000000000000ull / period;
 	printf(": %lld Hz\n", freq);
 
-	hpet_timecounter.tc_frequency = (uint32_t)freq;
+	hpet_timecounter.tc_frequency = freq;
 	hpet_timecounter.tc_priv = sc;
 	hpet_timecounter.tc_name = sc->sc_dev.dv_xname;
 	tc_init(&hpet_timecounter);
+
+#if defined(__amd64__) || defined(__i386__)
+	if (delay_func == i8254_delay)
+		delay_func = acpihpet_delay;
+	/* XXX what if the kernel has no acpitimer support? */
+	extern void acpitimer_delay(int);
+	if (delay_func == acpitimer_delay)
+		delay_func = acpihpet_delay;
+#endif
+
 #if defined(__amd64__)
 	extern void cpu_recalibrate_tsc(struct timecounter *);
 	cpu_recalibrate_tsc(&hpet_timecounter);
@@ -273,10 +292,43 @@ acpihpet_attach(struct device *parent, s
 	acpihpet_attached++;
 }
 
+void
+acpihpet_delay(int usecs)
+{
+	uint64_t c, s;
+	struct acpihpet_softc *sc = hpet_timecounter.tc_priv;
+
+	s = acpihpet_r(sc->sc_iot, sc->sc_ioh, HPET_MAIN_COUNTER);
+	c = usecs * hpet_timecounter.tc_frequency / 1000000;
+	while (acpihpet_r(sc->sc_iot, sc->sc_ioh, HPET_MAIN_COUNTER) - s < c)
+		CPU_BUSY_CYCLE();
+}
+
 u_int
 acpihpet_gettime(struct timecounter *tc)
 {
 	struct acpihpet_softc *sc = tc->tc_priv;
 
 	return (bus_space_read_4(sc->sc_iot, sc->sc_ioh, HPET_MAIN_COUNTER));
+}
+
+void
+acpihpet_test_delay(int usecs)
+{
+	struct timespec ac, er, ex, t0, t1;
+
+	if (!acpihpet_attached) {
+		printf("%s: (no hpet attached)\n", __func__);
+		return;
+	}
+
+	nanouptime(&t0);
+	acpihpet_delay(usecs);
+	nanouptime(&t1);
+	timespecsub(&t1, &t0, &ac);
+	NSEC_TO_TIMESPEC(usecs * 1000ULL, &ex);
+	timespecsub(&ac, &ex, &er);
+	printf("%s: expected %lld.%09ld actual %lld.%09ld error %lld.%09ld\n",
+	    __func__, ex.tv_sec, ex.tv_nsec, ac.tv_sec, ac.tv_nsec,
+	    er.tv_sec, er.tv_nsec);
 }
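
For anyone who wants to poke at the wraparound handling in
acpitimer_delay() without booting a kernel, here is a minimal userland
sketch of the same accumulate-the-masked-delta loop. It assumes a
simulated counter: struct fake_counter, fake_read(), wait_cycles() and
usecs_to_cycles() are hypothetical stand-ins for the softc,
acpitimer_read(), the body of acpitimer_delay() and its cycle
computation. Only the arithmetic mirrors the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Simulated free-running counter standing in for the PM timer. */
struct fake_counter {
	uint32_t value;	/* current raw value, always within mask */
	uint32_t mask;	/* e.g. 0x00ffffff for a 24-bit counter */
	uint32_t step;	/* ticks that elapse between two reads */
	uint64_t reads;	/* number of reads performed so far */
};

/* Return the current value, then advance the counter with wraparound. */
static uint32_t
fake_read(struct fake_counter *fc)
{
	uint32_t v = fc->value;

	fc->value = (fc->value + fc->step) & fc->mask;
	fc->reads++;
	return v;
}

/*
 * Spin until at least `cycles` ticks have been observed and return the
 * number of ticks actually counted.  Masking each delta keeps the sum
 * correct when the counter wraps between two consecutive reads.
 */
static uint64_t
wait_cycles(struct fake_counter *fc, uint64_t cycles)
{
	uint64_t count = 0;
	uint32_t val1, val2;

	val2 = fake_read(fc);
	while (count < cycles) {
		val1 = val2;
		val2 = fake_read(fc);
		count += (val2 - val1) & fc->mask;
	}
	return count;
}

/* Convert microseconds to counter ticks using 64-bit arithmetic. */
static uint64_t
usecs_to_cycles(int usecs, uint64_t frequency)
{
	return (uint64_t)usecs * frequency / 1000000;
}
```

Starting the fake counter at 0x00fffff0 with a step of 7 forces a wrap
after the third read, and every delta still comes out as 7. The loop
can only stop on a read boundary, so the wait always overshoots by up
to one read's worth of ticks, which is consistent with the short
delays in the measurements above being dominated by read latency.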