On Thu, Jan 07, 2021 at 09:37:58PM +0100, Mark Kettenis wrote:
> > Date: Thu, 7 Jan 2021 11:15:41 -0600
> > From: Scott Cheloha <[email protected]>
> > 
> > Hi,
> > 
> > I want to isolate statclock() from hardclock(9).  This will simplify
> > the logic in my WIP dynamic clock interrupt framework.
> > 
> > Currently, if stathz is zero, we call statclock() from within
> > hardclock(9).  It looks like this (see sys/kern/kern_clock.c):
> > 
> > void
> > hardclock(struct clockframe *frame)
> > {
> >     /* [...] */
> > 
> >     if (stathz == 0)
> >             statclock(frame);
> > 
> >     /* [...] */
> > 
> > This is the case on alpha, amd64 (w/ lapic), hppa, i386 (w/ lapic),
> > loongson, luna88k, mips64, and sh.
> > 
> > (We seem to do it on sgi, too.  I was under the impression that sgi
> > *was* a mips64 platform, yet sgi seems to have its own clock
> > interrupt code distinct from mips64's general clock interrupt code
> > which is used by e.g. octeon).
> > 
> > However, if stathz is not zero we call statclock() separately.  This
> > is the case on armv7, arm, arm64, macppc, powerpc64, and sparc64.
> > 
> > (The situation for the general powerpc code and socppc in particular
> > is a mystery to me.)
> > 
> > If we could remove this MD distinction it would make my MI framework
> > simpler.  Instead of checking stathz and conditionally starting a
> > statclock event I will be able to unconditionally start a statclock
> > event on all platforms on every CPU.
> > 
> > In general I don't think the "is stathz zero?" variance between
> > platforms is useful:
> > 
> > - The difference is invisible to userspace, as we hide the fact that
> >   stathz is zero from e.g. the kern.clockrate sysctl.
> > 
> > - We run statclock() at some point, regardless of whether stathz is
> >   zero.  If statclock() is run from hardclock(9) then isn't stathz
> >   effectively equal to hz?
> > 
> > - Because stathz might be zero we need to add a bunch of safety checks
> >   to our MI code to ensure we don't accidentally divide by zero.
> > 
> > Maybe we can ensure stathz is non-zero in a later diff...
> > 
> > --
> > 
> > Anyway, I don't think I have missed any platforms.  However, if
> > platform experts could weigh in here to verify my changes (and test
> > them!) I'd really appreciate it.
> > 
> > In particular, I'm confused about how clock interrupts work on
> > powerpc, socppc, and sgi.
> > 
> > --
> > 
> > Thoughts?  Platform-specific OKs?
> 
> I wouldn't be opposed to doing this.  It is less magic!
> 
> But yes, this needs to be tested on the platforms that you change.

I guess I'll CC all the platform-specific people I'm aware of.

> Note that many platforms don't have separate schedclock and
> statclock.  But on many platforms where we use a one-shot timer as the
> clock we have a randomized statclock.  I'm sure Theo would love to
> tell you about the cpuhog story...

I am familiar with cpuhog.  It's the one thing everybody mentions when
I talk about clock interrupts and/or statclock().

Related:

I wonder if we could avoid the cpuhog problem entirely by implementing
some kind of MI cycle-counting clock API: take a timestamp whenever we
cross the syscall boundary, enter an interrupt, etc., and use the
deltas to determine the time a thread spends using the CPU without any
sampling error.

Instead of a process accumulating ticks from a sampling clock
interrupt you would accumulate, say, a 64-bit count of cycles, or
something like that.

Sampling with a regular clock interrupt is prone to error and trickery
like cpuhog.  The classic BSD solution to the cpuhog exploit was to
randomize the statclock/schedclock to make it harder to fool the
sampler.  But if we used cycle counts or instruction counts at each
state transition it would be impossible to fool because we wouldn't be
sampling at all.

Unsure what the performance implications would be, but in general I
would guess that most platforms have a way to count instructions or
cycles and that reading this data is fast enough for us to use it in
e.g. syscall() or the interrupt handler without a huge performance
hit.

> Anyway, we probably want that on amd64 as well.

My WIP dynamic clock interrupt system can run a randomized statclock()
on amd64 boxes with a lapic.  I imagine we will be able to do the same
on i386 systems that have a lapic, too, though it will be slower
because all the i386 timecounters are glacial compared to the TSC.

Eventually I want to isolate schedclock() from statclock() and run it
as an independent event.  But that's a "later on" goal.  For now I'm
just trying to get every platform as similar as possible to make
merging the dynamic clock interrupt work less painful.

Index: sys/kern/kern_clock.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_clock.c,v
retrieving revision 1.101
diff -u -p -r1.101 kern_clock.c
--- sys/kern/kern_clock.c       21 Jan 2020 16:16:23 -0000      1.101
+++ sys/kern/kern_clock.c       7 Jan 2021 16:37:09 -0000
@@ -164,12 +164,6 @@ hardclock(struct clockframe *frame)
                }
        }
 
-       /*
-        * If no separate statistics clock is available, run it from here.
-        */
-       if (stathz == 0)
-               statclock(frame);
-
        if (--ci->ci_schedstate.spc_rrticks <= 0)
                roundrobin(ci);
 
Index: sys/arch/alpha/alpha/clock.c
===================================================================
RCS file: /cvs/src/sys/arch/alpha/alpha/clock.c,v
retrieving revision 1.24
diff -u -p -r1.24 clock.c
--- sys/arch/alpha/alpha/clock.c        6 Jul 2020 13:33:06 -0000       1.24
+++ sys/arch/alpha/alpha/clock.c        7 Jan 2021 16:37:09 -0000
@@ -136,6 +136,13 @@ clockattach(dev, fns)
  * Machine-dependent clock routines.
  */
 
+void
+clockintr(struct clockframe *frame)
+{
+       hardclock(frame);
+       statclock(frame);
+}
+
 /*
  * Start the real-time and statistics clocks. Leave stathz 0 since there
  * are no other timers available.
@@ -165,7 +172,7 @@ cpu_initclocks(void)
         * hardclock, which would then fall over because the pointer
         * to the virtual timers wasn't set at that time.
         */
-       platform.clockintr = hardclock;
+       platform.clockintr = clockintr;
        schedhz = 16;
 
        evcount_attach(&clk_count, "clock", &clk_irq);
Index: sys/arch/amd64/amd64/lapic.c
===================================================================
RCS file: /cvs/src/sys/arch/amd64/amd64/lapic.c,v
retrieving revision 1.57
diff -u -p -r1.57 lapic.c
--- sys/arch/amd64/amd64/lapic.c        6 Sep 2020 20:50:00 -0000       1.57
+++ sys/arch/amd64/amd64/lapic.c        7 Jan 2021 16:37:09 -0000
@@ -452,6 +452,7 @@ lapic_clockintr(void *arg, struct intrfr
        floor = ci->ci_handled_intr_level;
        ci->ci_handled_intr_level = ci->ci_ilevel;
        hardclock((struct clockframe *)&frame);
+       statclock((struct clockframe *)&frame);
        ci->ci_handled_intr_level = floor;
 
        clk_count.ec_count++;
@@ -596,6 +597,9 @@ skip_calibration:
  * delay for N usec.
  */
 
+#include <sys/atomic.h>
+unsigned long lapic_delay_neg, lapic_delay_zero;
+
 void
 lapic_delay(int usec)
 {
@@ -604,8 +608,15 @@ lapic_delay(int usec)
 
        otick = lapic_gettick();
 
-       if (usec <= 0)
+       if (usec < 0) {
+               atomic_inc_long(&lapic_delay_neg);
+               return;
+       }
+       if (usec == 0) {
+               atomic_inc_long(&lapic_delay_zero);
                return;
+       }
+
        if (usec <= 25)
                deltat = lapic_delaytab[usec];
        else
Index: sys/arch/hppa/dev/clock.c
===================================================================
RCS file: /cvs/src/sys/arch/hppa/dev/clock.c,v
retrieving revision 1.31
diff -u -p -r1.31 clock.c
--- sys/arch/hppa/dev/clock.c   6 Jul 2020 13:33:07 -0000       1.31
+++ sys/arch/hppa/dev/clock.c   7 Jan 2021 16:37:09 -0000
@@ -43,7 +43,7 @@
 
 u_long cpu_hzticks;
 
-int    cpu_hardclock(void *);
+int    cpu_clockintr(void *);
 u_int  itmr_get_timecount(struct timecounter *);
 
 struct timecounter itmr_timecounter = {
@@ -106,7 +106,7 @@ cpu_initclocks(void)
 }
 
 int
-cpu_hardclock(void *v)
+cpu_clockintr(void *v)
 {
        struct cpu_info *ci = curcpu();
        u_long __itmr, delta, eta;
@@ -114,14 +114,15 @@ cpu_hardclock(void *v)
        register_t eiem;
 
        /*
-        * Invoke hardclock as many times as there has been cpu_hzticks
-        * ticks since the last interrupt.
+        * Invoke hardclock and statclock as many times as there has been
+        * cpu_hzticks ticks since the last interrupt.
         */
        for (;;) {
                mfctl(CR_ITMR, __itmr);
                delta = __itmr - ci->ci_itmr;
                if (delta >= cpu_hzticks) {
                        hardclock(v);
+                       statclock(v);
                        ci->ci_itmr += cpu_hzticks;
                } else
                        break;
Index: sys/arch/hppa/dev/cpu.c
===================================================================
RCS file: /cvs/src/sys/arch/hppa/dev/cpu.c,v
retrieving revision 1.42
diff -u -p -r1.42 cpu.c
--- sys/arch/hppa/dev/cpu.c     29 May 2020 04:42:23 -0000      1.42
+++ sys/arch/hppa/dev/cpu.c     7 Jan 2021 16:37:09 -0000
@@ -89,7 +89,7 @@ cpuattach(struct device *parent, struct 
        extern u_int cpu_ticksnum, cpu_ticksdenom;
        extern u_int fpu_enable;
        /* clock.c */
-       extern int cpu_hardclock(void *);
+       extern int cpu_clockintr(void *);
        /* ipi.c */
        extern int hppa_ipi_intr(void *);
 
@@ -173,7 +173,7 @@ cpuattach(struct device *parent, struct 
                printf(", %u/%u D/I BTLBs",
                    pdc_btlb.finfo.num_i, pdc_btlb.finfo.num_d);
 
-       cpu_intr_establish(IPL_CLOCK, 31, cpu_hardclock, NULL, "clock");
+       cpu_intr_establish(IPL_CLOCK, 31, cpu_clockintr, NULL, "clock");
 #ifdef MULTIPROCESSOR
        cpu_intr_establish(IPL_IPI, 30, hppa_ipi_intr, NULL, "ipi");
 #endif
Index: sys/arch/i386/i386/lapic.c
===================================================================
RCS file: /cvs/src/sys/arch/i386/i386/lapic.c,v
retrieving revision 1.47
diff -u -p -r1.47 lapic.c
--- sys/arch/i386/i386/lapic.c  30 Jul 2018 14:19:12 -0000      1.47
+++ sys/arch/i386/i386/lapic.c  7 Jan 2021 16:37:09 -0000
@@ -257,6 +257,7 @@ lapic_clockintr(void *arg)
        struct clockframe *frame = arg;
 
        hardclock(frame);
+       statclock(frame);
 
        clk_count.ec_count++;
 }
Index: sys/arch/loongson/dev/glxclk.c
===================================================================
RCS file: /cvs/src/sys/arch/loongson/dev/glxclk.c,v
retrieving revision 1.5
diff -u -p -r1.5 glxclk.c
--- sys/arch/loongson/dev/glxclk.c      19 Jul 2015 21:11:47 -0000      1.5
+++ sys/arch/loongson/dev/glxclk.c      7 Jan 2021 16:37:09 -0000
@@ -286,6 +286,7 @@ glxclk_intr(void *arg)
                return 1;
 
        hardclock(frame);
+       statclock(frame);
 
        return 1;
 }
Index: sys/arch/luna88k/luna88k/clock.c
===================================================================
RCS file: /cvs/src/sys/arch/luna88k/luna88k/clock.c,v
retrieving revision 1.15
diff -u -p -r1.15 clock.c
--- sys/arch/luna88k/luna88k/clock.c    6 Jul 2020 13:33:07 -0000       1.15
+++ sys/arch/luna88k/luna88k/clock.c    7 Jan 2021 16:37:09 -0000
@@ -165,8 +165,10 @@ clockintr(void *eframe)
                clockevc->ec_count++;
 
        *(volatile uint32_t *)(ci->ci_clock_ack) = ~0;
-       if (clockinitted)
+       if (clockinitted) {
                hardclock(eframe);
+               statclock(eframe);
+       }
        return 1;
 }
 
Index: sys/arch/mips64/mips64/clock.c
===================================================================
RCS file: /cvs/src/sys/arch/mips64/mips64/clock.c,v
retrieving revision 1.42
diff -u -p -r1.42 clock.c
--- sys/arch/mips64/mips64/clock.c      30 Jun 2020 14:56:10 -0000      1.42
+++ sys/arch/mips64/mips64/clock.c      7 Jan 2021 16:37:09 -0000
@@ -151,6 +151,7 @@ cp0_int5(uint32_t mask, struct trapframe
                while (ci->ci_pendingticks) {
                        cp0_clock_count.ec_count++;
                        hardclock(tf);
+                       statclock(tf);
                        ci->ci_pendingticks--;
                }
 #ifdef MULTIPROCESSOR
Index: sys/arch/sgi/localbus/int.c
===================================================================
RCS file: /cvs/src/sys/arch/sgi/localbus/int.c,v
retrieving revision 1.15
diff -u -p -r1.15 int.c
--- sys/arch/sgi/localbus/int.c 24 Feb 2018 11:42:31 -0000      1.15
+++ sys/arch/sgi/localbus/int.c 7 Jan 2021 16:37:09 -0000
@@ -524,6 +524,7 @@ int_8254_intr0(uint32_t hwpend, struct t
                        while (ci->ci_pendingticks) {
                                int_clock_count.ec_count++;
                                hardclock(tf);
+                               statclock(tf);
                                ci->ci_pendingticks--;
                        }
                }
Index: sys/arch/sh/sh/clock.c
===================================================================
RCS file: /cvs/src/sys/arch/sh/sh/clock.c,v
retrieving revision 1.11
diff -u -p -r1.11 clock.c
--- sys/arch/sh/sh/clock.c      20 Oct 2020 15:59:17 -0000      1.11
+++ sys/arch/sh/sh/clock.c      7 Jan 2021 16:37:10 -0000
@@ -333,6 +333,7 @@ sh3_clock_intr(void *arg) /* trap frame 
        _reg_bclr_2(SH3_TCR0, TCR_UNF);
 
        hardclock(arg);
+       statclock(arg);
 
        return (1);
 }
@@ -354,6 +355,7 @@ sh4_clock_intr(void *arg) /* trap frame 
        _reg_bclr_2(SH4_TCR0, TCR_UNF);
 
        hardclock(arg);
+       statclock(arg);
 
        return (1);
 }
