On Mon, Feb 08, 2021 at 10:48:46AM -0500, Mark Johnston wrote:
> On Mon, Feb 08, 2021 at 05:33:22PM +0200, Konstantin Belousov wrote:
> > On Mon, Feb 08, 2021 at 10:03:59AM -0500, Mark Johnston wrote:
> > > On Mon, Feb 08, 2021 at 12:18:12AM +0200, Konstantin Belousov wrote:
> > > > On Sun, Feb 07, 2021 at 02:33:11PM -0700, Alan Somers wrote:
> > > > > Upgrading the BIOS fixed the problem, by clearing the MCG_CMCI_P bit 
> > > > > on all
> > > > > processors.  I don't have strong opinions about whether we should 
> > > > > commit
> > > > > kib's patch too.  Kib, what do you think?
> > > > 
> > > > The patch causes some memory over-use.
> > > > 
> > > > If this issue is not too widely experienced, I prefer to not commit the 
> > > > patch.
> > > 
> > > Couldn't we short-circuit cmci_monitor() if the BSP did not allocate
> > > anything?
> > > 
> > > diff --git a/sys/x86/x86/mca.c b/sys/x86/x86/mca.c
> > > index 03100e77d45..0619a41b128 100644
> > > --- a/sys/x86/x86/mca.c
> > > +++ b/sys/x86/x86/mca.c
> 
> > I think something should be printed in this case, at least once.
> > I believe printf() already works, because spin locks do.
> 
> Indeed, the printf() below should only fire on an AP during SI_SUB_SMP.
> Access to the static flag is synchronized by mca_lock.
> 
> diff --git a/sys/x86/x86/mca.c b/sys/x86/x86/mca.c
> index 03100e77d45..8098bcfb4bd 100644
> --- a/sys/x86/x86/mca.c
> +++ b/sys/x86/x86/mca.c
> @@ -1065,11 +1065,26 @@ mca_setup(uint64_t mcg_cap)
>  static void
>  cmci_monitor(int i)
>  {
> +     static bool first = true;
>       struct cmc_state *cc;
>       uint64_t ctl;
>  
>       KASSERT(i < mca_banks, ("CPU %d has more MC banks", PCPU_GET(cpuid)));
>  
> +     /*
> +      * It is possible for some APs to report CMCI support even if the BSP
> +      * does not, apparently due to a BIOS bug.
> +      */
> +     if (cmc_state == NULL) {
> +             if (first) {
> +                     printf(
I would wrote if (bootverbose) printf().
Also it might be useful to report ACPI id/APIC id as well, since the data
is most likely the source for BIOS bug report.

Otherwise fine with me.

> +                         "AP %d reports CMCI support but the BSP does not\n",
> +                         PCPU_GET(cpuid));
> +                     first = false;
> +             }
> +             return;
> +     }
> +
>       ctl = rdmsr(MSR_MC_CTL2(i));
>       if (ctl & MC_CTL2_CMCI_EN)
>               /* Already monitored by another CPU. */
> @@ -1114,6 +1129,10 @@ cmci_resume(int i)
>  
>       KASSERT(i < mca_banks, ("CPU %d has more MC banks", PCPU_GET(cpuid)));
>  
> +     /* See cmci_monitor(). */
> +     if (cmc_state == NULL)
> +             return;
> +
>       /* Ignore banks not monitored by this CPU. */
>       if (!(PCPU_GET(cmci_mask) & 1 << i))
>               return;
_______________________________________________
[email protected] mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[email protected]"

Reply via email to