On Sat, Sep 24, 2022 at 11:06:24AM +1000, Jonathan Gray wrote:
> On Fri, Sep 23, 2022 at 09:16:25AM -0500, Scott Cheloha wrote:
> > Hi,
> >
> > TL;DR:
> >
> > I want to compute the TSC frequency on AMD CPUs using the methods laid
> > out in the AMD manuals instead of calibrating the TSC by hand.
> >
> > If you have an AMD CPU with an invariant TSC, please apply this patch,
> > recompile/boot the resulting kernel, and send me the resulting dmesg.
> >
> > Family 10h-16h CPUs are especially interesting. If you've got one,
> > don't be shy!
> >
> > Long explanation:
> >
> > On AMD CPUs we calibrate the TSC with a separate timer. This is slow
> > and introduces error. I also worry about a future where legacy timers
> > are absent or heavily gated (read: useless).
> >
> > This patch adds most of the code needed to compute the TSC frequency
> > on AMD family 10h+ CPUs. CPUs prior to family 10h did not support an
> > invariant TSC so they are irrelevant.
> >
> > I have riddled the code with printf(9) calls so I can work out what's
> > wrong by hand if a test result makes no sense.
> >
> > The only missing piece is code to read the configuration space on
> > family 10h-16h CPUs to determine how many boosted P-states we need to
> > skip to get to the MSR describing the software P0 state. I would
> > really appreciate it if someone could explain how to do this at this
> > very early point in boot. jsg@ pointed me to pci_conf_read(9), but
> > I'm a little confused about how I get the needed pci* inputs at this
> > point in boot.
>
> I also said you shouldn't be looking at pci devices for this.

What you want to look at is section 2.1.4 of this:

https://developer.amd.com/wp-content/resources/56255_3_03.PDF

It describes what you need to do. It's for family 17 but I would guess that
there is an equivalent family 10/12/etc doc, and I'd be surprised if any of
this has changed in a long time.

If you can't figure it out, I'd suggest that we don't do this for family
10/12/etc and use the old method for CPUs that don't have the MSRs you need.

I also sorta share jsg's opinion below: this feels like a solution for a
problem that really doesn't exist.

-ml

>
> I remain unconvinced that all of this is worth it compared to
> calibrating off a timer with a known rate. And it is the wrong time in
> the release cycle for this.
>
> Boost could be disabled for the measurement if need be.
>
> AMD64 Architecture Programmer's Manual
> Volume 2: System Programming
> Publication No. 24593
> Revision 3.38
>
> "17.2 Core Performance Boost
> ...
> CPB can be disabled using the CPBDis field of the Hardware Configuration
> Register (HWCR MSR) on the appropriate core. When CPB is disabled,
> hardware limits the frequency and voltage of the core to those defined
> by P0.
>
> Support for core performance boost is indicated by
> CPUID Fn8000_0007_EDX[CPB] = 1."
>
> "3.2.10 Hardware Configuration Register (HWCR)
> ...
> CpbDis. Bit 25. Core performance boost disable. When set to 1, core
> performance boost is disabled.
> "
>
> Processor Programming Reference (PPR)
> for AMD Family 17h Model 01h, Revision B1 Processors
> 54945 Rev 1.14 - April 15, 2017
>
> "MSRC001_0015 [Hardware Configuration] (HWCR)
>
> 25 CpbDis: core performance boost disable. Read-write.
> Reset: 0. 0=CPB is requested to be enabled. 1=CPB is disabled.
> Specifies whether core performance boost is requested to be enabled or
> disabled. If core performance boost is disabled while a core is in a
> boosted P-state, the core automatically transitions to the highest
> performance non-boosted P-state."
>
> also mentioned in
>
> BIOS and Kernel Developer's Guide (BKDG)
> For AMD Family 10h Processors
> 31116 Rev 3.48 - April 22, 2010
>
> >
> > --
> >
> > Test results? Clues on reading the configuration space?
> >
> > -Scott
> >
> > Index: tsc.c
> > ===================================================================
> > RCS file: /cvs/src/sys/arch/amd64/amd64/tsc.c,v
> > retrieving revision 1.29
> > diff -u -p -r1.29 tsc.c
> > --- tsc.c	22 Sep 2022 04:57:08 -0000	1.29
> > +++ tsc.c	23 Sep 2022 14:04:22 -0000
> > @@ -100,6 +100,253 @@ tsc_freq_cpuid(struct cpu_info *ci)
> >  	return (0);
> >  }
> >
> > +uint64_t
> > +tsc_freq_msr(struct cpu_info *ci)
> > +{
> > +	uint64_t base, def, did, did_lsd, did_msd, divisor, fid, multiplier;
> > +	uint32_t msr, off = 0;
> > +
> > +	if (strcmp(cpu_vendor, "AuthenticAMD") != 0)
> > +		return 0;
> > +
> > +	/*
> > +	 * All family 10h+ CPUs have MSR_HWCR and the TscFreqSel bit.
> > +	 * If TscFreqSel is not set the TSC does not advance at the P0
> > +	 * frequency, in which case something is wrong and we need to
> > +	 * calibrate by hand.
> > +	 */
> > +#define HWCR_TSCFREQSEL (1 << 24)
> > +	if (!ISSET(rdmsr(MSR_HWCR), HWCR_TSCFREQSEL))	/* XXX specialreg.h */
> > +		return 0;
> > +#undef HWCR_TSCFREQSEL
> > +
> > +	/*
> > +	 * For families 10h, 12h, 14h, 15h, and 16h, we need to skip past
> > +	 * the boosted P-states (Pb0, Pb1, etc.) to find the MSR describing
> > +	 * P0, i.e. the highest performance unboosted P-state. The number
> > +	 * of boosted states is kept in the "Core Performance Boost Control"
> > +	 * configuration space register.
> > +	 */
> > +#ifdef __not_yet__
> > +	uint32_t reg;
> > +	switch (ci->ci_family) {
> > +	case 0x10:
> > +		/* XXX How do I read config space at this point in boot? */
> > +		reg = read_config_space(F4x15C);
> > +		off = (reg >> 2) & 0x1;
> > +		break;
> > +	case 0x12:
> > +	case 0x14:
> > +	case 0x15:
> > +	case 0x16:
> > +		/* XXX How do I read config space at this point in boot?
> > +		 */
> > +		reg = read_config_space(D18F4x15C);
> > +		off = (reg >> 2) & 0x7;
> > +		break;
> > +	default:
> > +		break;
> > +	}
> > +#endif
> > +
> > +/* DEBUG Let's look at all the MSRs to check my math. */
> > +for (; off < 8; off++) {
> > +
> > +	/*
> > +	 * In family 10h+, core P-state voltage/frequency definitions
> > +	 * are kept in MSRs C001_006[4:B] (eight registers in total).
> > +	 * All MSRs in the range are readable, but if the EN bit isn't
> > +	 * set the register doesn't define a valid P-state.
> > +	 */
> > +	msr = 0xc0010064 + off;			/* XXX specialreg.h */
> > +	def = rdmsr(msr);
> > +	printf("%s: MSR %04X_%04X: en %d",
> > +	    ci->ci_dev->dv_xname, msr >> 16, msr & 0xffff,
> > +	    !!ISSET(def, 1ULL << 63));
> > +	if (!ISSET(def, 1ULL << 63)) {		/* XXX specialreg.h */
> > +		printf("\n");
> > +		continue;
> > +	}
> > +	switch (ci->ci_family) {
> > +	case 0x10:
> > +		/* AMD Family 10h Processor BKDG, Rev 3.62, p. 429 */
> > +		base = 100000000;	/* 100.0 MHz */
> > +		did = (def >> 6) & 0x7;
> > +		divisor = 1ULL << did;
> > +		fid = def & 0x1f;
> > +		multiplier = fid + 0x10;
> > +		printf(" base %llu did %llu div %llu fid %llu mul %llu",
> > +		    base, did, divisor, fid, multiplier);
> > +		break;
> > +	case 0x11:
> > +		/* AMD Family 11h Processor BKDG, Rev 3.62, p. 236 */
> > +		base = 100000000;	/* 100.0 MHz */
> > +		did = (def >> 6) & 0x7;
> > +		divisor = 1ULL << did;
> > +		fid = def & 0x1f;
> > +		multiplier = fid + 0x8;
> > +		printf(" base %llu did %llu div %llu fid %llu mul %llu",
> > +		    base, did, divisor, fid, multiplier);
> > +		break;
> > +	case 0x12:
> > +		/* AMD Family 12h Processor BKDG, Rev 3.02, pp. 468-469 */
> > +		base = 100000000;	/* 100.0 MHz */
> > +		fid = (def >> 4) & 0xf;
> > +		multiplier = fid + 0x10;
> > +
> > +		/*
> > +		 * A CpuDid of 1 maps to a divisor of 1.5. To simulate
> > +		 * this with integer math we use a divisor of 3 and double
> > +		 * the multiplier, as (X * 2 / 3) equals (X / 1.5).
> > +		 * All other CpuDid values map to whole number divisors
> > +		 * or are reserved.
> > +		 */
> > +		did = def & 0xf;
> > +		printf(" did %llu", did);
> > +		if (did >= 8) {
> > +			printf("(reserved)\n");
> > +			continue;	/* reserved */
> > +		}
> > +		if (did == 1)
> > +			multiplier *= 2;
> > +		uint64_t did_divisor[] = { 1, 3, 2, 3, 4, 6, 8, 12, 16 };
> > +		divisor = did_divisor[did];
> > +		printf(" div %llu base %llu fid %llu mul %llu",
> > +		    divisor, base, fid, multiplier);
> > +		break;
> > +	case 0x14:
> > +		/*
> > +		 * BKDG for AMD Family 14h Models 00h-0Fh Processors,
> > +		 * Rev 3.13, pp. 428-429
> > +		 *
> > +		 * Family 14h doesn't have CpuFid or CpuDid. Instead,
> > +		 * the CpuCOF divisor is derived from two new fields:
> > +		 * CpuDidMsd, the integral base, and CpuDidLsd, the
> > +		 * fractional multiplier. The formula for the divisor
> > +		 * varies with the magnitude of CpuDidMsd:
> > +		 *
> > +		 * CpuDidMsd <= 14: CpuDidMsd + 1 + (CpuDidLsd * 0.25)
> > +		 * CpuDidMsd >= 15: CpuDidMsd + 1 + ((CpuDidLsd & 0x10) * 0.25)
> > +		 *
> > +		 * CpuCOF is just (base / divisor), however we need to
> > +		 * multiply both sides by 100 to simulate fractional
> > +		 * division with integer math, e.g. (X * 100 / 125) is
> > +		 * equivalent to (X / 1.25).
> > +		 */
> > +#if __not_yet__
> > +		/* XXX How do I read config space at this point in boot? */
> > +		reg = read_config_space(D18F3xD4);
> > +		base = 100000000 * ((reg & 0x3f) + 0x10);
> > +#else
> > +		base = 100000000;	/* XXX guess 100.0 MHz for now... */
> > +#endif
> > +		multiplier = 100;
> > +		did_msd = (def >> 4) & 0x19;
> > +		printf(" msd %llu", did_msd);
> > +		if (did_msd >= 27) {
> > +			printf("(reserved)\n");
> > +			continue;	/* XXX might be reserved?
> > +					 */
> > +		}
> > +		did_lsd = def & 0xf;
> > +		printf(" lsd %llu", did_lsd);
> > +		if (did_lsd >= 4) {
> > +			printf("(reserved)\n");
> > +			continue;	/* reserved */
> > +		}
> > +		if (did_msd >= 15)
> > +			did_lsd &= 0x10;
> > +		divisor = (did_msd + 1) * 100 + (did_lsd * 25);
> > +		printf(" div %llu base %llu mul %llu",
> > +		    divisor, base, multiplier);
> > +		break;
> > +	case 0x15:
> > +		/*
> > +		 * BKDG for AMD Family 15h [...]:
> > +		 * Models 00h-0Fh Processors, Rev 3.14, pp. 569-571
> > +		 * Models 10h-1Fh Processors, Rev 3.12, pp. 580-581
> > +		 * Models 30h-3Fh Processors, Rev 3.06, pp. 634-636
> > +		 * Models 60h-6Fh Processors, Rev 3.05, pp. 691-693
> > +		 * Models 70h-7Fh Processors, Rev 3.09, pp. 655-656
> > +		 */
> > +		base = 100000000;	/* 100.0 MHz */
> > +		did = (def >> 6) & 0x7;
> > +		printf(" base %llu did %llu", base, did);
> > +		if (did >= 0x5) {
> > +			printf("(reserved)\n");
> > +			continue;	/* reserved */
> > +		}
> > +		divisor = 1ULL << did;
> > +
> > +		/*
> > +		 * BKDG for AMD Family 15h Models 00h-0Fh, Rev 3.14, p. 571
> > +		 * says that "CpuFid must be less than or equal to 2Fh."
> > +		 * No other BKDG for family 15h limits the range of CpuFid.
> > +		 */
> > +		fid = def & 0x3f;
> > +		printf(" fid %llu", fid);
> > +		if (ci->ci_model <= 0x0f && fid >= 0x30) {
> > +			printf("(reserved)\n");
> > +			continue;	/* reserved */
> > +		}
> > +		multiplier = fid + 0x10;
> > +		printf(" mul %llu div %llu", multiplier, divisor);
> > +		break;
> > +	case 0x16:
> > +		/*
> > +		 * BKDG for AMD Family 16h [...]:
> > +		 * Models 00h-0Fh Processors, Rev 3.03, pp. 548-550
> > +		 * Models 30h-3Fh Processors, Rev 3.06, pp.
> > +		 * 610-612
> > +		 */
> > +		base = 100000000;	/* 100.0 MHz */
> > +		did = (def >> 6) & 0x7;
> > +		printf(" did %llu", did);
> > +		if (did >= 0x5) {
> > +			printf("(reserved)\n");
> > +			continue;	/* reserved */
> > +		}
> > +		divisor = 1ULL << did;
> > +		fid = def & 0x3f;
> > +		multiplier = fid + 0x10;
> > +		printf(" divisor %llu base %llu fid %llu mul %llu",
> > +		    divisor, base, fid, multiplier);
> > +		break;
> > +	case 0x17:
> > +		/*
> > +		 * PPR for AMD Family 17h [...]:
> > +		 * Models 01h,08h B2, Rev 3.03, pp. 33, 139-140
> > +		 * Model 18h B1, Rev 3.16, pp. 36, 143-144
> > +		 * Model 60h A1, Rev 3.06, pp. 33, 155-157
> > +		 * Model 71h B0, Rev 3.06, pp. 28, 150-151
> > +		 *
> > +		 * OSRR for AMD Family 17h processors,
> > +		 * Models 00h-2Fh, Rev 3.03, pp. 130-131
> > +		 */
> > +		base = 200000000;	/* 200.0 MHz */
> > +		divisor = did = (def >> 8) & 0x3f;	/* XXX reserved vals? */
> > +		multiplier = fid = def & 0xff;
> > +		printf(" base %llu mul %llu div %llu",
> > +		    base, multiplier, divisor);
> > +		break;
> > +	case 0x19:
> > +		/*
> > +		 * PPR for AMD Family 19h
> > +		 * Model 21h B0, Rev 3.05, pp. 33, 166-167
> > +		 */
> > +		base = 200000000;	/* 200.0 MHz */
> > +		divisor = did = (def >> 8) & 0x3f;	/* XXX reserved vals? */
> > +		multiplier = fid = def & 0xff;
> > +		printf(" base %llu mul %llu div %llu",
> > +		    base, multiplier, divisor);
> > +		break;
> > +	default:
> > +		return 0;
> > +	}
> > +	printf(" freq %llu Hz\n", base * multiplier / divisor);
> > +}
> > +/* DEBUG for-loop ends here.
> > + */
> > +
> > +	return 0;
> > +}
> > +
> >  void
> >  tsc_identify(struct cpu_info *ci)
> >  {
> > @@ -118,6 +365,8 @@ tsc_identify(struct cpu_info *ci)
> >  		tsc_is_invariant = 1;
> >
> >  	tsc_frequency = tsc_freq_cpuid(ci);
> > +	if (tsc_frequency == 0)
> > +		tsc_frequency = tsc_freq_msr(ci);
> >  	if (tsc_frequency > 0)
> >  		delay_init(tsc_delay, 5000);
> >  }
> > @@ -170,6 +419,8 @@ measure_tsc_freq(struct timecounter *tc)
> >  	u_long s;
> >  	int delay_usec, i, err1, err2, usec, success = 0;
> >
> > +	printf("tsc: calibrating with %s: ", tc->tc_name);
> > +
> >  	/* warmup the timers */
> >  	for (i = 0; i < 3; i++) {
> >  		(void)tc->tc_get_timecount(tc);
> > @@ -202,6 +453,8 @@ measure_tsc_freq(struct timecounter *tc)
> >  		min_freq = MIN(min_freq, frequency);
> >  		success++;
> >  	}
> > +
> > +	printf("%llu Hz\n", success > 1 ? min_freq : 0);
> >
> >  	return (success > 1 ? min_freq : 0);
> >  }
> >
> >
>
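
For reference, the family 17h/19h arithmetic in the quoted patch can be checked in userland with a rough sketch like the one below. It is not part of the patch: the function name pstate_freq_17h and the PSTATE_EN macro are made up for illustration, and the field layout (bit 63 enable, bits 13:8 divisor, bits 7:0 multiplier, 200 MHz base) is taken from the shifts and masks in the diff. The zero-divisor check is an added assumption, since the patch marks the reserved divisor values as an open question.

```c
#include <stdint.h>

/* Hypothetical name for bit 63, the enable bit tested in the patch. */
#define PSTATE_EN	(1ULL << 63)

/*
 * Decode a family 17h/19h P-state definition MSR value (one of
 * MSRs C001_006[4:B]) into a frequency in Hz, using the same
 * expression as the patch: base * multiplier / divisor with a
 * 200 MHz base.  Returns 0 if the register does not define a
 * valid P-state or if the divisor field is zero.
 */
uint64_t
pstate_freq_17h(uint64_t def)
{
	const uint64_t base = 200000000;	/* 200.0 MHz */
	uint64_t did, fid;

	if ((def & PSTATE_EN) == 0)
		return 0;			/* P-state not enabled */

	did = (def >> 8) & 0x3f;		/* divisor, bits 13:8 */
	fid = def & 0xff;			/* multiplier, bits 7:0 */
	if (did == 0)
		return 0;			/* avoid dividing by zero */

	return base * fid / did;
}
```

With a made-up register value that has the enable bit set, a divisor field of 8, and a multiplier field of 0x88, this works out to 200 MHz * 136 / 8 = 3400000000 Hz.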