On Wed, Aug 05, 2015 at 11:18:52AM +0800, Huang Rui wrote:
> MWAITX can enable a timer and a corresponding timer value specified in
> SW P0 clocks. The SW P0 frequency is the same as TSC. The timer
> provides an upper bound on how long the instruction waits before
> exiting.
> 
> The implementation of delay function in kernel can leverage the timer
> of MWAITX. This patch provides a new method (delay_mwaitx) to measure
> delay time.

...

> +static void delay_mwaitx(unsigned long __loops)
> +{
> +     u32 delay, loops = __loops;
> +     u64 end, start;

Hmm, this truncates __loops in case someone wants to delay for more
than (u32)-1 TSC clocks. I guess the right thing to do is to do the
calculation with u64s and MWAITX_MAX_LOOPS will keep us within bounds.
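
To make the truncation concrete, here is a minimal user-space sketch, not
kernel code. It assumes a 64-bit unsigned long, modelled with uint64_t, and
truncated_loops() is a hypothetical helper that only mimics the
"u32 loops = __loops" assignment quoted above:

```c
#include <stdint.h>

/*
 * Hypothetical helper: mimics "u32 loops = __loops" from the quoted hunk.
 * Anything above (u32)-1 wraps modulo 2^32.
 */
static uint32_t truncated_loops(uint64_t requested)
{
	return (uint32_t)requested;
}
```

A request of (1ULL << 32) + 1000 TSC clocks comes back as 1000, so the
delay would end far too early.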

Here's what I did:

---
From: Huang Rui <[email protected]>
Date: Wed, 5 Aug 2015 11:18:52 +0800
Subject: [PATCH] x86/asm: Introduce an MWAITX-based delay with a configurable
 timer
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

MWAITX can enable a timer and a corresponding timer value specified in
SW P0 clocks. The SW P0 frequency is the same as TSC. The timer provides
an upper bound on how long the instruction waits before exiting.

This way, a delay function in the kernel can leverage the MWAITX timer.

When a CPU core executes MWAITX, it will be quiesced in a waiting phase,
diminishing its power consumption. This way, we can save power in
comparison to our default TSC-based delays.

A simple test shows that:

$ cat /sys/bus/pci/devices/0000\:00\:18.4/hwmon/hwmon0/power1_acc
$ sleep 10000s
$ cat /sys/bus/pci/devices/0000\:00\:18.4/hwmon/hwmon0/power1_acc

Results:

* TSC-based default delay:      485115 uWatts average power
* MWAITX-based delay:           252738 uWatts average power

Thus, that's about 232 milliWatts less power consumption. The test
method relies on support for the AMD CPU accumulated power algorithm in
fam15h_power, for which patches are forthcoming.

Signed-off-by: Huang Rui <[email protected]>
Suggested-by: Andy Lutomirski <[email protected]>
Suggested-by: Borislav Petkov <[email protected]>
Suggested-by: Peter Zijlstra <[email protected]>
Cc: Aaron Lu <[email protected]>
Cc: Andreas Herrmann <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Aravind Gopalakrishnan <[email protected]>
Cc: Fengguang Wu <[email protected]>
Cc: Frédéric Weisbecker <[email protected]>
Cc: Hector Marco-Gisbert <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jacob Shin <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: John Stultz <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tony Li <[email protected]>
Cc: x86-ml <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
[ Fix delay truncation. ]
Signed-off-by: Borislav Petkov <[email protected]>
---
 arch/x86/include/asm/delay.h |  1 +
 arch/x86/kernel/cpu/amd.c    |  4 ++++
 arch/x86/lib/delay.c         | 47 +++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 51 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/delay.h b/arch/x86/include/asm/delay.h
index 9b3b4f2754c7..36a760bda462 100644
--- a/arch/x86/include/asm/delay.h
+++ b/arch/x86/include/asm/delay.h
@@ -4,5 +4,6 @@
 #include <asm-generic/delay.h>
 
 void use_tsc_delay(void);
+void use_mwaitx_delay(void);
 
 #endif /* _ASM_X86_DELAY_H */
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 51ad2af84a72..4a70fc6d400a 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -11,6 +11,7 @@
 #include <asm/cpu.h>
 #include <asm/smp.h>
 #include <asm/pci-direct.h>
+#include <asm/delay.h>
 
 #ifdef CONFIG_X86_64
 # include <asm/mmconfig.h>
@@ -506,6 +507,9 @@ static void bsp_init_amd(struct cpuinfo_x86 *c)
                /* A random value per boot for bit slice [12:upper_bit) */
                va_align.bits = get_random_int() & va_align.mask;
        }
+
+       if (cpu_has(c, X86_FEATURE_MWAITX))
+               use_mwaitx_delay();
 }
 
 static void early_init_amd(struct cpuinfo_x86 *c)
diff --git a/arch/x86/lib/delay.c b/arch/x86/lib/delay.c
index 4453d52a143d..e912b2f6d36e 100644
--- a/arch/x86/lib/delay.c
+++ b/arch/x86/lib/delay.c
@@ -20,6 +20,7 @@
 #include <asm/processor.h>
 #include <asm/delay.h>
 #include <asm/timer.h>
+#include <asm/mwait.h>
 
 #ifdef CONFIG_SMP
 # include <asm/smp.h>
@@ -84,6 +85,44 @@ static void delay_tsc(unsigned long __loops)
 }
 
 /*
+ * On some AMD platforms, MWAITX has a configurable 32-bit timer that
+ * counts at TSC frequency. The input value is the number of TSC
+ * cycles to wait; MWAITX exits when the timer expires.
+ */
+static void delay_mwaitx(unsigned long __loops)
+{
+       u64 start, end, delay, loops = __loops;
+
+       start = rdtsc_ordered();
+
+       for (;;) {
+               delay = min_t(u64, MWAITX_MAX_LOOPS, loops);
+
+               /*
+                * Use cpu_tss as a cacheline-aligned, seldom
+                * accessed per-cpu variable as the monitor target.
+                */
+               __monitorx(this_cpu_ptr(&cpu_tss), 0, 0);
+
+               /*
+                * AMD, like Intel, supports the EAX hint. EAX=0xf
+                * means: do not enter any deep C-state. We use it
+                * here in delay() to minimize wakeup latency.
+                */
+               __mwaitx(MWAITX_DISABLE_CSTATES, delay, MWAITX_ECX_TIMER_ENABLE);
+
+               end = rdtsc_ordered();
+
+               if (loops <= end - start)
+                       break;
+
+               loops -= end - start;
+
+               start = end;
+       }
+}
+
+/*
  * Since we calibrate only once at boot, this
  * function should be set once at boot and not changed
  */
@@ -91,7 +130,13 @@ static void (*delay_fn)(unsigned long) = delay_loop;
 
 void use_tsc_delay(void)
 {
-       delay_fn = delay_tsc;
+       if (delay_fn == delay_loop)
+               delay_fn = delay_tsc;
+}
+
+void use_mwaitx_delay(void)
+{
+       delay_fn = delay_mwaitx;
 }
 
 int read_current_timer(unsigned long *timer_val)
-- 
2.5.0.rc2.28.g6003e7f
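
For anyone who wants to step through the chunking logic outside the kernel,
here is a user-space sketch of the loop in delay_mwaitx(). It is only an
illustration under stated assumptions: fake_tsc and fake_mwaitx() are
hypothetical stand-ins for the TSC and for MWAITX (the stand-in always
sleeps for the full timer value and never wakes early), and MAX_LOOPS
assumes that MWAITX_MAX_LOOPS is (u32)-1.

```c
#include <stdint.h>

#define MAX_LOOPS ((uint64_t)UINT32_MAX)	/* assumed MWAITX_MAX_LOOPS */

static uint64_t fake_tsc;	/* hypothetical stand-in for the TSC */

/* Hypothetical stand-in for MWAITX: waits until the timer expires. */
static void fake_mwaitx(uint64_t timer)
{
	fake_tsc += timer;
}

/* Same control flow as delay_mwaitx(); returns the number of waits issued. */
static unsigned int delay_sim(uint64_t loops)
{
	uint64_t start, end, delay;
	unsigned int waits = 0;

	start = fake_tsc;

	for (;;) {
		/* Clamp each wait to what the 32-bit timer can hold. */
		delay = loops < MAX_LOOPS ? loops : MAX_LOOPS;

		fake_mwaitx(delay);
		waits++;

		end = fake_tsc;

		/* Done once the elapsed clocks cover what is left. */
		if (loops <= end - start)
			break;

		loops -= end - start;
		start = end;
	}

	return waits;
}
```

With loops = (1ULL << 32) + 1000, the sketch issues two waits: one of
(u32)-1 clocks and one of the remaining 1001 clocks, so the full request
is honoured instead of being truncated.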

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.