Hi Peter/Ingo and all,
With the advent of more cores and heterogeneous architectures, the scheduler is
required to be more complex (power efficiency) and diverse (big.LITTLE).
Addressing that challenge in the scheduler as a whole is costly, and it is not
necessary. This proposal argues that the scheduler be split into two parts: a
top half (task scheduling) and a bottom half (load balance), and that the bottom
half take charge of the incoming requirements.
The two halves are rather orthogonal in functionality. Task scheduling (the top
half) seeks, for *ONE* CPU, to run the runnable tasks fairly (priority
included), while load balance (the bottom half) aims, across *ALL* CPUs, to
maximize the throughput of the computing power. The goal of task scheduling is
single and clear, and CFS and RT already pursue exactly that goal. Load balance,
however, is constrained to meet more goals, to name a few: performance
(throughput/responsiveness), power consumption, and architecture differences.
These are often hard to achieve because they may conflict with one another and
are difficult to estimate and plan for. So, shall we declare the independence of
the two and give each the freedom to pursue its own "happiness"?
We are taking an incremental approach. As a starting point, we did three things
(without changing a single line of functional code):
1) Moved the load balance code out of fair.c into load_balance.c (~3000 lines of
code). As a result, fair.c/rt.c and load_balance.c have very little
intersection.
2) Defined struct sched_lb_class, which gathers the load balance entry points
under one umbrella and consists of the following members:
a. const struct sched_lb_class *next;
b. int (*fork_balance) (struct task_struct *p, int sd_flags, int wake_flags);
c. int (*exec_balance) (struct task_struct *p, int sd_flags, int wake_flags);
d. int (*wakeup_balance) (struct task_struct *p, int sd_flags, int wake_flags);
e. void (*idle_balance) (int this_cpu, struct rq *this_rq);
f. void (*periodic_rebalance) (int cpu, enum cpu_idle_type idle);
g. void (*nohz_idle_balance) (int this_cpu, enum cpu_idle_type idle);
h. void (*start_periodic_balance) (struct rq *rq, int cpu);
i. void (*check_nohz_idle_balance) (struct rq *rq, int cpu);
3) Inserted another layer of indirection that wraps the existing functions in
sched_lb_class, and implemented a default load balance class that is just the
previous load balance.
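To make items 2) and 3) a bit more concrete, here is a minimal userspace sketch in
C. Only the struct sched_lb_class members are copied from the list above;
everything else is an assumption for illustration: the stand-in definitions of
struct task_struct, struct rq and enum cpu_idle_type, the placeholder default_*
hooks (which do not reproduce the real fair.c logic), and the wrapper names
lb_set_class(), lb_fork_balance() and lb_idle_balance(), which are hypothetical
and not taken from the patches.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-ins for kernel types (assumption: the real definitions
 * live in the kernel headers and are far larger). */
struct task_struct { int pid; int cpu; };
struct rq { int cpu; };
enum cpu_idle_type { CPU_IDLE, CPU_NOT_IDLE, CPU_NEWLY_IDLE };

/* The load balance entry points, as listed in item 2). */
struct sched_lb_class {
	const struct sched_lb_class *next;
	int  (*fork_balance)(struct task_struct *p, int sd_flags, int wake_flags);
	int  (*exec_balance)(struct task_struct *p, int sd_flags, int wake_flags);
	int  (*wakeup_balance)(struct task_struct *p, int sd_flags, int wake_flags);
	void (*idle_balance)(int this_cpu, struct rq *this_rq);
	void (*periodic_rebalance)(int cpu, enum cpu_idle_type idle);
	void (*nohz_idle_balance)(int this_cpu, enum cpu_idle_type idle);
	void (*start_periodic_balance)(struct rq *rq, int cpu);
	void (*check_nohz_idle_balance)(struct rq *rq, int cpu);
};

/* Hypothetical placeholder for the placement logic moved out of fair.c:
 * it simply keeps the task on its current CPU. */
static int default_select_cpu(struct task_struct *p, int sd_flags, int wake_flags)
{
	(void)sd_flags;
	(void)wake_flags;
	return p->cpu;
}

/* Hypothetical placeholders for the balancing passes; they do nothing here. */
static void default_idle_balance(int this_cpu, struct rq *this_rq)
{ (void)this_cpu; (void)this_rq; }
static void default_periodic_rebalance(int cpu, enum cpu_idle_type idle)
{ (void)cpu; (void)idle; }
static void default_nohz_idle_balance(int this_cpu, enum cpu_idle_type idle)
{ (void)this_cpu; (void)idle; }
static void default_start_periodic_balance(struct rq *rq, int cpu)
{ (void)rq; (void)cpu; }
static void default_check_nohz_idle_balance(struct rq *rq, int cpu)
{ (void)rq; (void)cpu; }

/* Item 3): the default class just wraps the previous behaviour. */
static const struct sched_lb_class default_lb_class = {
	.next                    = NULL,
	.fork_balance            = default_select_cpu,
	.exec_balance            = default_select_cpu,
	.wakeup_balance          = default_select_cpu,
	.idle_balance            = default_idle_balance,
	.periodic_rebalance      = default_periodic_rebalance,
	.nohz_idle_balance       = default_nohz_idle_balance,
	.start_periodic_balance  = default_start_periodic_balance,
	.check_nohz_idle_balance = default_check_nohz_idle_balance,
};

/* The extra layer of indirection: the scheduler core would call these
 * wrappers instead of calling fair.c directly (names are hypothetical). */
static const struct sched_lb_class *lb_class = &default_lb_class;

static void lb_set_class(const struct sched_lb_class *cls)
{
	/* Refuse a class that lacks a required hook. */
	assert(cls != NULL && cls->fork_balance != NULL);
	lb_class = cls;
}

static int lb_fork_balance(struct task_struct *p, int sd_flags, int wake_flags)
{
	return lb_class->fork_balance(p, sd_flags, wake_flags);
}

static void lb_idle_balance(int this_cpu, struct rq *this_rq)
{
	lb_class->idle_balance(this_cpu, this_rq);
}
```

Under these assumptions, plugging in a different policy (for example, one aware
of big.LITTLE) would amount to defining another struct sched_lb_class and
passing it to the hypothetical lb_set_class(), leaving the top half untouched.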
The next step is to continue redesigning and refactoring to make it easier to
build more powerful and diverse load balancing. More importantly, this RFC
solicits discussion and early feedback on the proposed big change.
Thanks,
Yuyang