From: Wanpeng Li <[email protected]>

Commit:

        57430218317e ("sched/cputime: Count actually elapsed irq & softirq time")

... triggered a regression:

On an i5 laptop with 4 pCPUs, run one full dynticks guest with 4 vCPUs and
four CPU hog processes (for loops) inside it. Hot-unplug the pCPUs on the
host one by one until only one is left, then watch top in the guest: it
reports 100% st for cpu0 (housekeeping) and 75% st for the other CPUs
(nohz full mode). Without this commit, st is 75% on all four CPUs.

As Rik and Paolo pointed out:

| It turns out that if a guest misses several timer ticks in a row, they
| will simply get lost.
|
| That means the functions calling steal_account_process_time may not know 
| how much CPU time has passed since the last time it was called, but 
| steal_account_process_time will get a good idea on how much time the host 
| spent running something else.

This patch fixes it by removing the max cputime limit for tick-based sampling,
and keeps the limit for vtime so that steal_account_process_time() will
not attempt to remove more than the limit.
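
To illustrate why the cap hurts tick-based sampling, here is a toy model in
plain userspace C. It is only a sketch under stated assumptions, not kernel
code: struct vcpu_model, model_steal_account(), model_tick() and
model_steal_percent() are hypothetical names, time is counted in whole ticks,
and the guest is assumed to receive ticks only while it is actually running
(missed ticks are lost).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy model of one vCPU; all times are in ticks. */
struct vcpu_model {
	uint64_t steal_clock;	/* total steal time reported by the host */
	uint64_t steal_seen;	/* steal time the guest has already accounted */
	uint64_t acct_steal;	/* guest-visible steal time */
	uint64_t acct_user;	/* guest-visible user time */
};

/* Account pending steal time, but never more than maxtime. */
static uint64_t model_steal_account(struct vcpu_model *v, uint64_t maxtime)
{
	uint64_t pending = v->steal_clock - v->steal_seen;
	uint64_t steal = pending < maxtime ? pending : maxtime;

	v->steal_seen += steal;
	v->acct_steal += steal;
	return steal;
}

/* One delivered guest tick, shaped like the tick-based accounting path. */
static void model_tick(struct vcpu_model *v, uint64_t maxtime)
{
	const uint64_t cputime = 1;	/* one tick */
	uint64_t steal = model_steal_account(v, maxtime);

	if (steal >= cputime)
		return;			/* the whole tick went to steal */
	v->acct_user += cputime - steal;
}

/*
 * Repeat "descheduled for stolen_ticks, then run for run_ticks" and
 * return the steal percentage the guest ends up reporting.
 */
static unsigned int model_steal_percent(uint64_t maxtime, unsigned int rounds,
					unsigned int run_ticks,
					unsigned int stolen_ticks)
{
	struct vcpu_model v = { 0 };
	uint64_t total;
	unsigned int r, t;

	assert(run_ticks > 0);
	for (r = 0; r < rounds; r++) {
		v.steal_clock += stolen_ticks;	/* ticks missed here are lost */
		for (t = 0; t < run_ticks; t++)
			model_tick(&v, maxtime);
	}
	total = v.acct_steal + v.acct_user;
	return total ? (unsigned int)(100 * v.acct_steal / total) : 0;
}
```

With 4 ticks of run time after every 12 stolen ticks (75% of wall time
stolen), model_steal_percent(1, 100, 4, 12) returns 100: the one-tick cap
leaves a steal backlog that eats every delivered tick. With the cap removed,
model_steal_percent(UINT64_MAX, 100, 4, 12) returns 80, because only the first
tick after each gap is consumed by steal and the remaining ticks are accounted
as user time.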

Suggested-by: Rik van Riel <[email protected]> 
Suggested-by: Paolo Bonzini <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Radim Krcmar <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Wanpeng Li <[email protected]>
---
 kernel/sched/cputime.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 9858266..a119304 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -263,6 +263,11 @@ void account_idle_time(cputime_t cputime)
                cpustat[CPUTIME_IDLE] += (__force u64) cputime;
 }
 
+/*
+ * After a host system is overloaded, the missed clock ticks are not
+ * redelivered to guest later. Due to that, this function may on
+ * occasion account more time than the calling functions think elapsed.
+ */
 static __always_inline cputime_t steal_account_process_time(cputime_t maxtime)
 {
 #ifdef CONFIG_PARAVIRT
@@ -371,7 +376,7 @@ static void irqtime_account_process_tick(struct task_struct *p, int user_tick,
         * idle, or potentially user or system time. Due to rounding,
         * other time can exceed ticks occasionally.
         */
-       other = account_other_time(cputime);
+       other = account_other_time(ULONG_MAX);
        if (other >= cputime)
                return;
        cputime -= other;
@@ -486,7 +491,7 @@ void account_process_tick(struct task_struct *p, int user_tick)
        }
 
        cputime = cputime_one_jiffy;
-       steal = steal_account_process_time(cputime);
+       steal = steal_account_process_time(ULONG_MAX);
 
        if (steal >= cputime)
                return;
@@ -516,7 +521,7 @@ void account_idle_ticks(unsigned long ticks)
        }
 
        cputime = jiffies_to_cputime(ticks);
-       steal = steal_account_process_time(cputime);
+       steal = steal_account_process_time(ULONG_MAX);
 
        if (steal >= cputime)
                return;
-- 
1.9.1
