Re: pmcstat -T -P instructions -t $pid ==> NMI

Andriy Gapon Fri, 28 Oct 2016 06:52:46 -0700

On 28/10/2016 16:01, Andriy Gapon wrote:
> I suspect that under the right conditions it's possible for wrmsr to cause a
> counter overflow, such that an interrupt (if enabled) is generated after wrmsr
> is executed, even if wrmsr disables the counter.
> 
> In amd_intr() we have this code:
>                 wrmsr(evsel, config & ~AMD_PMC_ENABLE);
>                 wrmsr(perfctr, AMD_RELOAD_COUNT_TO_PERFCTR_VALUE(v));
> 
>                 /* Restart the counter if logging succeeded. */
>                 error = pmc_process_interrupt(cpu, PMC_HR, pm, tf,
>                     TRAPF_USERMODE(tf));
>                 if (error == 0)
>                         wrmsr(evsel, config | AMD_PMC_ENABLE);
> 
> I suspect that in the scenario above, if it is indeed possible, the last wrmsr
> would re-enable the counter that's supposed to be stopped.
> 
> I think that writing back the original value should be more correct, that is:
>                       wrmsr(evsel, config);
> 
> I'll test if this change would help.


So, I have tried this change:

--- a/sys/dev/hwpmc/hwpmc_amd.c
+++ b/sys/dev/hwpmc/hwpmc_amd.c
@@ -577,6 +577,7 @@ amd_start_pmc(int cpu, int ri)

        PMCDBG1(MDP,STA,2,"amd-start config=0x%x", config);

+       KASSERT(cpu == PCPU_GET(cpuid), ("requested cpu is not current cpu"));
        wrmsr(pd->pm_evsel, config);
        return 0;
 }
@@ -613,6 +614,7 @@ amd_stop_pmc(int cpu, int ri)

        /* turn off the PMC ENABLE bit */
        config = pm->pm_md.pm_amd.pm_amd_evsel & ~AMD_PMC_ENABLE;
+       KASSERT(cpu == PCPU_GET(cpuid), ("requested cpu is not current cpu"));
        wrmsr(pd->pm_evsel, config);
        return 0;
 }
@@ -676,6 +678,7 @@ amd_intr(int cpu, struct trapframe *tf)
                perfctr = AMD_PMC_PERFCTR_0 + i;
                v       = pm->pm_sc.pm_reloadcount;
                config  = rdmsr(evsel);
+               PMCDBG1(MDP,INT,2, "enabled=%d", config & AMD_PMC_ENABLE);

                KASSERT((config & ~AMD_PMC_ENABLE) ==
                    (pm->pm_md.pm_amd.pm_amd_evsel & ~AMD_PMC_ENABLE),
@@ -689,12 +692,13 @@ amd_intr(int cpu, struct trapframe *tf)
                error = pmc_process_interrupt(cpu, PMC_HR, pm, tf,
                    TRAPF_USERMODE(tf));
                if (error == 0)
-                       wrmsr(evsel, config | AMD_PMC_ENABLE);
+                       wrmsr(evsel, config);
        }

        atomic_add_int(retval ? &pmc_stats.pm_intr_processed :
            &pmc_stats.pm_intr_ignored, 1);

+       PMCDBG1(MDP,INT,3, "reval=%d", retval);
        return (retval);
 }



And I couldn't reproduce the problem with it.
Also, in the debug log I see the following, for instance:
315466   1    1068994822044 MDP:INT:3: reval=1
315465   1    1068994821176 MDP:INT:2: enabled=4194304
315464   1    1068994820930 MDP:INT:1: cpu=1 tf=0xfffffe03c45ecf30 um=1
315463   1    1068994796610 MDP:INT:3: reval=1
315462   1    1068994795833 MDP:INT:2: enabled=4194304
315461   1    1068994795589 MDP:INT:1: cpu=1 tf=0xfffffe03c45ecf30 um=1
315460   1    1068994771107 MDP:INT:3: reval=1
315459   1    1068994770176 MDP:INT:2: enabled=4194304
315458   1    1068994769933 MDP:INT:1: cpu=1 tf=0xfffffe03c45ecf30 um=1
315457   0    1068994766498 MDP:SWO:1: pc=0xfffff8001c6e0e00 pp=0x0 enable-msr=0
315456   0    1068994765449 CSW:SWO:1: cpu=0 proc=0xfffff80073767a50 (1655,
pmcstat) pp=0x0
315455   1    1068994742201 MDP:INT:3: reval=1
315454   1    1068994739535 MDP:INT:2: enabled=4194304
315453   1    1068994739169 MDP:INT:1: cpu=1 tf=0xfffffe03c45ecf30 um=1
315452   1    1068994700076 MDP:SWI:1: pc=0xfffff8007318ea00
pp=0xfffff80021f56400 enable-msr=0
315451   1    1068994698957 MDP:STA:2: amd-start config=0x5300c0
315450   1    1068994698699 MDP:STA:1: amd-start cpu=1 ri=0
315449   1    1068994697513 MDP:WRI:1: amd-write cpu=1 ri=0 v=ffffffffffff0000
315448   1    1068994696676 CSW:SWI:1: cpu=1 ri=17 new=65536
315447   1    1068994694784 MDP:CFG:1: cpu=1 ri=0 pm=0xfffff8001c6e0900
315446   1    1068994691210 CSW:SWI:1: cpu=1 proc=0xfffff8017b0d3a50 (1654,
burnK7) pp=0xfffff80021f56400
315445   0    1068994674368 MDP:SWO:1: pc=0xfffff8001c6e0e00
pp=0xfffff80021f56400 enable-msr=0
315444   0    1068994674033 MDP:CFG:1: cpu=0 ri=0 pm=0x0
315443   0    1068994673597 CSW:SWO:1: cpu=0 ri=17 tmp=-3205 (samp)
315442   0    1068994673412 MDP:REA:2: amd-read (post-munge) id=0 -> 65536
315441   0    1068994673247 MDP:REA:2: amd-read (pre-munge) id=0 -> 
281474976645120
315440   0    1068994673006 MDP:REA:1: amd-read id=0 class=2
315439   0    1068994672443 MDP:INT:3: reval=1
315438   0    1068994670981 MDP:INT:2: enabled=0
315437   0    1068994670591 MDP:INT:1: cpu=0 tf=0xffffffff81d769d0 um=0
315436   0    1068994669599 MDP:STO:1: amd-stop ri=0
315435   0    1068994668389 CSW:SWO:1: cpu=0 proc=0xfffff8017b0d3a50 (1654,
burnK7) pp=0xfffff80021f56400
315434   0    1068994642304 MDP:INT:3: reval=1
315433   0    1068994641339 MDP:INT:2: enabled=4194304
315432   0    1068994640412 MDP:INT:1: cpu=0 tf=0xffffffff81d769d0 um=1
315431   0    1068994598561 MDP:INT:3: reval=1
315430   0    1068994596854 MDP:INT:2: enabled=4194304
315429   0    1068994596590 MDP:INT:1: cpu=0 tf=0xffffffff81d769d0 um=0
315428   0    1068994533426 MDP:INT:3: reval=1
315427   0    1068994532643 MDP:INT:2: enabled=4194304
315426   0    1068994532390 MDP:INT:1: cpu=0 tf=0xffffffff81d769d0 um=1
315425   0    1068994508088 MDP:INT:3: reval=1
315424   0    1068994507167 MDP:INT:2: enabled=4194304
315423   0    1068994506915 MDP:INT:1: cpu=0 tf=0xffffffff81d769d0 um=1

I think that event #315438 confirms my hypothesis and that previously we would
get a stray NMI after that event.  There is a dozen such events per more than a
million of interrupts.

-- 
Andriy Gapon
_______________________________________________
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"

Re: pmcstat -T -P instructions -t $pid ==> NMI

Reply via email to