On Thu, May 14, 2020 at 03:38:18PM +0200, Jan Beulich wrote:
> On 14.05.2020 15:10, Roger Pau Monné wrote:
> > On Wed, Apr 15, 2020 at 01:55:24PM +0200, Jan Beulich wrote:
> >> While from just a single Skylake system it is already clear that we
> >> can't base any of our logic on CPUID leaf 15 [1] (leaf 16 is
> >> documented to be used for display purposes only anyway), logging this
> >> information may still give us some reference in case of problems as well
> >> as for future work. Additionally on the AMD side it is unclear whether
> >> the deviation between reported and measured frequencies is because of us
> >> not doing well, or because of nominal and actual frequencies being quite
> >> far apart.
> >
> > Can you add some reference to the AMD implementation? I've looked at
> > the PMs and haven't been able to find a description of some of the
> > MSRs, like 0xC0010064.
>
> Take a look at
>
> https://developer.amd.com/resources/developer-guides-manuals/
>
> I'm unconvinced a reference needs adding here.

Do you think it would be sensible to introduce some defines, at
least for 0xC0010064 (i.e. MSR_AMD_PSTATE_DEF_BASE)?
I think it would make it easier to find in the manuals.
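
For illustration, a minimal sketch of what such a define could look
like. MSR_AMD_PSTATE_DEF_BASE is the name suggested above;
AMD_MAX_PSTATES and amd_pstate_def_msr() are hypothetical names, and
the count of 8 consecutive P-state definition registers is an
assumption that would need checking against the manual for each
family:

```c
#include <assert.h>

/* First P-state definition MSR; later P-states follow consecutively. */
#define MSR_AMD_PSTATE_DEF_BASE 0xc0010064u

/* Assumption: up to 8 P-state definition registers (family dependent). */
#define AMD_MAX_PSTATES 8u

/* Hypothetical helper: MSR index holding the definition of P-state n. */
static unsigned int amd_pstate_def_msr(unsigned int pstate)
{
    assert(pstate < AMD_MAX_PSTATES);
    return MSR_AMD_PSTATE_DEF_BASE + pstate;
}
```

With a named base, a search for the identifier leads to the header,
and the value next to it leads to the register description.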
>
> >> --- a/xen/arch/x86/cpu/intel.c
> >> +++ b/xen/arch/x86/cpu/intel.c
> >> @@ -378,6 +378,72 @@ static void init_intel(struct cpuinfo_x8
> >> ( c->cpuid_level >= 0x00000006 ) &&
> >> ( cpuid_eax(0x00000006) & (1u<<2) ) )
> >> __set_bit(X86_FEATURE_ARAT, c->x86_capability);
> >> +
> >
> > I would split this into a separate helper, ie: intel_log_freq. That
> > will allow you to exit early and reduce some of the indentation IMO.
>
> Can do; splitting this for AMD/Hygon however was merely to
> facilitate using it for both vendors, though.
>
> >> + if ( (opt_cpu_info && !(c->apicid & (c->x86_num_siblings - 1))) ||
> >> + c == &boot_cpu_data )
> >> + {
> >> + unsigned int eax, ebx, ecx, edx;
> >> + uint64_t msrval;
> >> +
> >> + if ( c->cpuid_level >= 0x15 )
> >> + {
> >> + cpuid(0x15, &eax, &ebx, &ecx, &edx);
> >> + if ( ecx && ebx && eax )
> >> + {
> >> + unsigned long long val = ecx;
> >> +
> >> + val *= ebx;
> >> + do_div(val, eax);
> >> + printk("CPU%u: TSC: %uMHz * %u / %u = %LuMHz\n",
> >> + smp_processor_id(), ecx, ebx, eax, val);
> >> + }
> >> + else if ( ecx | eax | ebx )
> >> + {
> >> + printk("CPU%u: TSC:", smp_processor_id());
> >> + if ( ecx )
> >> + printk(" core: %uMHz", ecx);
> >> + if ( ebx && eax )
> >> + printk(" ratio: %u / %u", ebx, eax);
> >> + printk("\n");
> >> + }
> >> + }
> >> +
> >> + if ( c->cpuid_level >= 0x16 )
> >> + {
> >> + cpuid(0x16, &eax, &ebx, &ecx, &edx);
> >> + if ( ecx | eax | ebx )
> >> + {
> >> + printk("CPU%u:", smp_processor_id());
> >> + if ( ecx )
> >> + printk(" bus: %uMHz", ecx);
> >> + if ( eax )
> >> + printk(" base: %uMHz", eax);
> >> + if ( ebx )
> >> + printk(" max: %uMHz", ebx);
> >> + printk("\n");
> >> + }
> >> + }
> >> +
> >> + if ( !rdmsr_safe(MSR_INTEL_PLATFORM_INFO, msrval) &&
> >> + (uint8_t)(msrval >> 8) )
> >
> > I would introduce a mask for it would be cleaner, since you use it
> > here and below (and would avoid the casting to uint8_t.
>
> To avoid the casts (also below) I could introduce local variables.
> I specifically wanted to avoid MASK_EXTR() such that the rest of the
> calculations in
>
> if ( (uint8_t)(msrval >> 40) )
> printk("%u..", (factor * (uint8_t)(msrval >> 40) + 50) / 100);
> printk("%u MHz\n", (factor * (uint8_t)(msrval >> 8) + 50) / 100);
>
> can be done as 32-bit arithmetic.
Might be cleaner with the local variables.
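
Something along these lines is what I have in mind, as a rough
standalone sketch rather than a proposed hunk. The names min_ratio,
max_ratio, struct freq_range and platform_info_range() are
hypothetical; the bit positions (47:40 and 15:8) and the rounding are
taken from the code quoted above:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the two frequencies that get printed. */
struct freq_range {
    unsigned int min_mhz;
    unsigned int max_mhz;
};

/*
 * Convert a bus ratio to MHz, rounding to nearest. 'factor' is in
 * units of 10 kHz (10000 == 100 MHz), so the product is divided by 100.
 */
static unsigned int ratio_to_mhz(unsigned int factor, unsigned int ratio)
{
    assert(ratio <= 0xff);
    return (factor * ratio + 50) / 100;
}

static struct freq_range platform_info_range(uint64_t msrval,
                                             unsigned int factor)
{
    /* Extract each 8-bit field once; all later math is 32-bit. */
    unsigned int min_ratio = (uint8_t)(msrval >> 40);
    unsigned int max_ratio = (uint8_t)(msrval >> 8);
    struct freq_range r = {
        .min_mhz = ratio_to_mhz(factor, min_ratio),
        .max_mhz = ratio_to_mhz(factor, max_ratio),
    };

    return r;
}
```

The casts then appear only where the fields are extracted, and the
condition and both printk() calls can use the locals.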
> >> + {
> >> + unsigned int factor = 10000;
> >> +
> >> + if ( c->x86 == 6 )
> >> + switch ( c->x86_model )
> >> + {
> >> + case 0x1a: case 0x1e: case 0x1f: case 0x2e: /* Nehalem */
> >> + case 0x25: case 0x2c: case 0x2f: /* Westmere */
> >> + factor = 13333;
> >
> > The SDM lists ratio * 100MHz without any notes, why are those models
> > different, is this some errata?
>
> Did you go through the MSR lists for the various models? It's there
> where I found this anomaly, not in any spec updates.
My bad, I think I was looking at the Atom table, and didn't realize
there were multiple tables instead of a single table with per-model
notes.
>
> >> + break;
> >> + }
> >> +
> >> + printk("CPU%u: ", smp_processor_id());
> >> + if ( (uint8_t)(msrval >> 40) )
> >> + printk("%u..", (factor * (uint8_t)(msrval >> 40) + 50) /
> >> 100);
> >> + printk("%u MHz\n", (factor * (uint8_t)(msrval >> 8) + 50) /
> >> 100);
> >
> > Since you are calculating using Hz, should you use an unsigned long
> > factor to prevent capping at 4GHz?
>
> Hmm, the calculation looks to be in units of 10kHz, until the division
> by 100. I don't think we'd cap at 4GHz this way.
Oh yes, sorry, it's in units of 10kHz, not Hz.
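
To spell out the units: factor is in units of 10kHz (10000 is 100MHz,
13333 is 133.33MHz), the ratio field is 8 bits wide, and the division
by 100 converts to MHz. A quick sketch of the worst case, where
max_intermediate() is a hypothetical helper and not part of the patch:

```c
#include <assert.h>
#include <stdint.h>

/* Largest value of factor * ratio + 50 for an 8-bit ratio field. */
static uint32_t max_intermediate(uint32_t factor)
{
    /* Precondition: the result must be representable in 32 bits. */
    assert(factor <= (UINT32_MAX - 50) / 0xff);
    return factor * 0xff + 50;
}
```

With the largest factor in the patch (13333) this gives 3399965, far
below 2^32, so the 32-bit arithmetic cannot overflow.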
>
> >> --- a/xen/include/asm-x86/msr.h
> >> +++ b/xen/include/asm-x86/msr.h
> >> @@ -40,8 +40,8 @@ static inline void wrmsrl(unsigned int m
> >>
> >> /* rdmsr with exception handling */
> >> #define rdmsr_safe(msr,val) ({\
> >> - int _rc; \
> >> - uint32_t lo, hi; \
> >> + int rc_; \
> >> + uint32_t lo_, hi_; \
> >> __asm__ __volatile__( \
> >> "1: rdmsr\n2:\n" \
> >> ".section .fixup,\"ax\"\n" \
> >> @@ -49,15 +49,15 @@ static inline void wrmsrl(unsigned int m
> >> " movl %5,%2\n; jmp 2b\n" \
> >> ".previous\n" \
> >> _ASM_EXTABLE(1b, 3b) \
> >> - : "=a" (lo), "=d" (hi), "=&r" (_rc) \
> >> + : "=a" (lo_), "=d" (hi_), "=&r" (rc_) \
> >> : "c" (msr), "2" (0), "i" (-EFAULT)); \
> >> - val = lo | ((uint64_t)hi << 32); \
> >> - _rc; })
> >> + val = lo_ | ((uint64_t)hi_ << 32); \
> >> + rc_; })
> >
> > Since you are changing the local variable names, I would just switch
> > rdmsr_safe to a static inline, and drop the underlines. I don't see a
> > reason this has to stay as a macro.
>
> Well, all callers would need to be changed to pass the address of
> the variable to store the value read into. That's quite a bit of
> code churn, and hence nothing I'd want to do in this patch.
Oh, right, didn't realize it's a macro for that reason.
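
For the record, a small sketch of the difference, using a GNU
statement expression like the real macro does. fake_msr_read(),
read_safe_macro() and read_safe_inline() are hypothetical stand-ins so
the example builds outside of Xen; it is not the actual rdmsr_safe():

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the MSR access; returns 0 on success. */
static int fake_msr_read(unsigned int msr, uint32_t *lo, uint32_t *hi)
{
    if ( msr != 0xce )
        return -1;
    *lo = 0x1800;
    *hi = 0x0800;
    return 0;
}

/* Macro form: the caller names the variable and it is assigned in place. */
#define read_safe_macro(msr, val) ({                    \
    uint32_t lo_ = 0, hi_ = 0;                          \
    int rc_ = fake_msr_read((msr), &lo_, &hi_);         \
    (val) = lo_ | ((uint64_t)hi_ << 32);                \
    rc_; })

/* Inline form: every caller has to pass the address instead. */
static inline int read_safe_inline(unsigned int msr, uint64_t *val)
{
    uint32_t lo = 0, hi = 0;
    int rc;

    assert(val != NULL);
    rc = fake_msr_read(msr, &lo, &hi);
    *val = lo | ((uint64_t)hi << 32);
    return rc;
}
```

A caller writes read_safe_macro(msr, v) today and would have to become
read_safe_inline(msr, &v), which is the churn you mention.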
Thanks, Roger.