On Sun, Sep 04, 2022 at 02:50:10PM +1000, Jonathan Gray wrote: > On Sat, Sep 03, 2022 at 05:33:01PM -0500, Scott Cheloha wrote: > > On Sat, Sep 03, 2022 at 10:37:31PM +1000, Jonathan Gray wrote: > > > On Sat, Sep 03, 2022 at 06:52:20AM -0500, Scott Cheloha wrote: > > > > > On Sep 3, 2022, at 02:22, Jonathan Gray <j...@jsg.id.au> wrote: > > > > > > > > > > ???On Fri, Sep 02, 2022 at 06:00:25PM -0500, Scott Cheloha wrote: > > > > >> dv@ suggested coming to the list to request testing for the > > > > >> pvclock(4) > > > > >> driver. Attached is a patch that corrects several bugs. Most of > > > > >> these changes will only matter in the non-TSC_STABLE case on a > > > > >> multiprocessor VM. > > > > >> > > > > >> Ideally, nothing should break. > > > > >> > > > > >> - pvclock yields a 64-bit value. The BSD timecounter layer can only > > > > >> use the lower 32 bits, but internally we need to track the full > > > > >> 64-bit value to allow comparisons with the full value in the > > > > >> non-TSC_STABLE case. So make pvclock_lastcount a 64-bit quantity. > > > > >> > > > > >> - In pvclock_get_timecount(), move rdtsc() up into the lockless read > > > > >> loop to get a more accurate timestamp. > > > > >> > > > > >> - In pvclock_get_timecount(), use rdtsc_lfence(), not rdtsc(). > > > > >> > > > > >> - In pvclock_get_timecount(), check that our TSC value doesn't > > > > >> predate > > > > >> ti->ti_tsc_timestamp, otherwise we will produce an enormous value. > > > > >> > > > > >> - In pvclock_get_timecount(), update pvclock_lastcount in the > > > > >> non-TSC_STABLE case with more care. On amd64 we can do this with an > > > > >> atomic_cas_ulong(9) loop because u_long is 64 bits. On i386 we need > > > > >> to introduce a mutex to protect our comparison and read/write. > > > > > > > > > > i386 has cmpxchg8b, no need to disable interrupts > > > > > the ifdefs seem excessive > > > > > > > > How do I make use of CMPXCHG8B on i386 > > > > in this context? > > > > > > > > atomic_cas_ulong(9) is a 32-bit CAS on > > > > i386. > > > > > > static inline uint64_t > > > atomic_cas_64(volatile uint64_t *p, uint64_t o, uint64_t n) > > > { > > > return __sync_val_compare_and_swap(p, o, n); > > > } > > > > > > Or md atomic.h files could have an equivalent. > > > Not possible on all 32-bit archs. > > > > > > > > > > > We can't use FP registers in the kernel, no? > > > > > > What do FP registers have to do with it? > > > > > > > > > > > Am I missing some other avenue? > > > > > > There is no rdtsc_lfence() on i386. Initial diff doesn't build. > > > > LFENCE is an SSE2 extension. As is MFENCE. I don't think I can just > > drop rdtsc_lfence() into cpufunc.h and proceed without causing some > > kind of fault on an older CPU. > > > > What are my options on a 586-class CPU for forcing RDTSC to complete > > before later instructions? > > "3.3.2. Serializing Operations > After executing certain instructions the Pentium processor serializes > instruction execution. This means that any modifications to flags, > registers, and memory for previous instructions are completed before > the next instruction is fetched and executed. The prefetch queue > is flushed as a result of serializing operations. > > The Pentium processor serializes instruction execution after executing > one of the following instructions: Move to Special Register (except > CRO), INVD, INVLPG, IRET, IRETD, LGDT, LLDT, LIDT, LTR, WBINVD, > CPUID, RSM and WRMSR." > > from: > Pentium Processor User's Manual > Volume 1: Pentium Processor Data Book > Order Number 241428 > > http://bitsavers.org/components/intel/pentium/1993_Intel_Pentium_Processor_Users_Manual_Volume_1.pdf > > So it could be rdtsc ; cpuid. > lfence; rdtsc should still be preferred. > > It could be tested during boot and set a function pointer. > Or the codepatch bits could be used. > > In the specific case of pvclock, can it be assumed that the host > has hardware virt and would then have lfence? >
I think this is a fair assumption. -ml