Hi Koichiro,
On 30.11.25 16:30, Koichiro Den wrote:
On Thu, Nov 27, 2025 at 04:55:17PM +0200, Grygorii S wrote:
Hi All,
On 21.06.25 17:14, Koichiro Den wrote:
When a running unit is about to be scheduled out due to a competing unit
with the highest remaining credit, the residual credit of the previous
unit is currently ignored in csched2_runtime() because it hasn't yet
been reinserted into the runqueue.
As a result, two equally weighted, busy units can often each be granted
almost the maximum possible runtime (i.e. consuming CSCHED2_CREDIT_INIT
in one shot) when only those two are active. In broad strokes two units
switch back and forth every 10ms (CSCHED2_MAX_TIMER). In contrast, when
more than two busy units are competing, such coarse runtime allocations
are rarely seen, since at least one active unit remains in the runqueue.
To ensure consistent behavior, have csched2_runtime() take into account
the previous unit's latest credit when it still can/wants to run.
Signed-off-by: Koichiro Den <[email protected]>
Reviewed-by: Juergen Gross <[email protected]>
---
xen/common/sched/credit2.c | 28 +++++++++++++++++++++-------
1 file changed, 21 insertions(+), 7 deletions(-)
We observe a regression on ARM64 with this patch:
commit ae648e9f8013 ("xen/credit2: factor in previous active unit's credit in
csched2_runtime()")
general observation:
This commit causes Linux guest boot time to increase by more than 5x for some of our credit2-specific tests.
Reverting it makes the issue go away.
- normal log
(XEN) DOM1: [ 6.496166] io scheduler bfq registered
...
(XEN) DOM1: [ 9.845108] Freeing unused kernel memory: 9216K
(XEN) DOM1: [ 9.874792] Run /init as init process
(XEN) sched_smt_power_savings: disabled
(XEN) NOW=16800131328
- failed log
(XEN) DOM1: [ 37.281776] io scheduler bfq registered
(XEN) DOM1: [ 61.856512] EINJ: ACPI disabled.
test: timed out
Run Details:
Platform: ARM64 (Device Tree)
Execution platform: qemu 6.0 (2 pCPU, 2G)
Boot: dom0less, 1 domain (2 vCPU)
Command line: "console=dtuart guest_loglvl=debug conswitch=ax"
Dom0less cfg:
chosen {
    xen,xen-bootargs = "console=dtuart guest_loglvl=debug conswitch=ax";
    #size-cells = <0x00000002>;
    #address-cells = <0x00000002>;
    stdout-path = "/pl011@9000000";
    kaslr-seed = <0x5a7b5649 0x9122e194>;

    cpupool_0 {
        cpupool-sched = "credit2";
        cpupool-cpus = <0x00008001>;
        compatible = "xen,cpupool";
        phandle = <0xfffffffe>;
    };

    domU0 {
        domain-cpupool = <0xfffffffe>;
        vpl011;
        cpus = <0x00000002>;
        memory = <0x00000000 0x00040000>;
        #size-cells = <0x00000002>;
        #address-cells = <0x00000002>;
        compatible = "xen,domain";

        module@42E00000 {
            reg = <0x00000000 0x42e00000 0x00000000 0x000f1160>;
            compatible = "multiboot,ramdisk", "multiboot,module";
        };

        module@40400000 {
            bootargs = "console=ttyAMA0";
            reg = <0x00000000 0x40400000 0x00000000 0x02920000>;
            compatible = "multiboot,kernel", "multiboot,module";
        };
    };
};
Investigation:
It was narrowed down to a specific configuration with a cpupool assigned to the domain (100% reproducible):
Host has 2 pCPU
Domain has 2 vCPU
cpupool_0 has 1 pCPU (cpu@1 credit2)
domain <- cpupool_0
if the domain is assigned 1 vCPU - no issues.
if cpupool_0 is assigned 2 pCPUs - no issues (it seems a bit slower, but within the error margin).
I'd appreciate any help with this (or a revert :().
Thank you for the detailed report. Could you please try increasing the
ratelimit_us (the -r/--ratelimit_us option), for example to 5000 or 10000
microseconds, and see whether the long boot time issue disappears?
That would help determine whether the previous behaviour (before the patch) had
simply masked the effect of the default 1ms rate limit in your setup.
I've tried it. Boot time is improved, but it's still slower.
(XEN) Command line: console=dtuart guest_loglvl=debug conswitch=ax
sched_ratelimit_us=5000
(XEN) DOM1: [ 37.903192] Freeing unused kernel memory: 9216K
(XEN) DOM1: [ 37.970645] Run /init as init process
I've attached the dump_runq output below, FYI.
Note: this is a dom0less boot; cpupools/domains are created at boot time.
The toolstack is not used.
I've not tried (and it would be hard for me to try) whether the issue is reproducible when cpupools/domains are created by the toolstack with the above cfg.
I can try running with debug changes if you have any.
In other words, after the patch merged, you may need to set -r/--ratelimit_us
explicitly to some appropriate value, which is larger than 1ms.
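For reference, a sketch of the two ways the rate limit can be raised (in a dom0less setup only the boot parameter applies; "Pool-0" below is a placeholder cpupool name):

```shell
# Xen boot command line (the only option in a dom0less setup):
#   ... sched_ratelimit_us=5000 ...
#
# Or, where a toolstack is available, pool-wide at runtime:
#   xl sched-credit2 -s -p Pool-0 -r 5000
```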
Unfortunately, "after the patch merged, may need to set -r/--ratelimit_us explicitly" is not going to work :( (at least not as a long-term solution), as this is a safety test suite, so any deviation from default Xen settings that is not part of a particular test case needs to be justified.
That said, this change touches long-standing credit2 behaviour, and we
probably should've discussed backward-compatibility more carefully. I'm
completely fine with reverting it if maintainers think that is the best
choice for now. (To be honest, I hadn't even realised that this had been
merged until receiving your email, since it only had a single Reviewed-by.)
--
Best regards,
-grygorii
==== dump_runq() "r" =========
(XEN) DOM1: [ 37.970645] Run /init as init process
(XEN) sched_smt_power_savings: disabled
(XEN) NOW=46147558848
(XEN) Online Cpus: 0-1
(XEN) Cpupool 0:
(XEN) Cpus: 0
(XEN) Scheduling granularity: cpu, 1 CPU per sched-resource
(XEN) Scheduler: SMP Credit Scheduler rev2 (credit2)
(XEN) Active queues: 1
(XEN) default-weight = 256
(XEN) Runqueue 0:
(XEN) ncpus = 1
(XEN) cpus = 0
(XEN) max_weight = 1
(XEN) pick_bias = 0
(XEN) instload = 0
(XEN) aveload = 0 (~0%)
(XEN) idlers: 0
(XEN) tickled: 0
(XEN) fully idle cores: 0
(XEN) Domain info:
(XEN) Runqueue 0:
(XEN) CPU[00] runq=0, sibling={0}, core={0-1}
(XEN) RUNQ:
(XEN) CPUs info:
(XEN) CPU[00] current=d[IDLE]v0, curr=d[IDLE]v0, prev=NULL
(XEN) Cpupool 1:
(XEN) Cpus: 1
(XEN) Scheduling granularity: cpu, 1 CPU per sched-resource
(XEN) Scheduler: SMP Credit Scheduler rev2 (credit2)
(XEN) Active queues: 1
(XEN) default-weight = 256
(XEN) Runqueue 0:
(XEN) ncpus = 1
(XEN) cpus = 1
(XEN) max_weight = 256
(XEN) pick_bias = 1
(XEN) instload = 1
(XEN) aveload = 413852 (~157%)
(XEN) idlers: 0
(XEN) tickled: 0
(XEN) fully idle cores: 0
(XEN) Domain info:
(XEN) Domain: 1 w 256 c 0 v 2
(XEN) 1: [1.0] flags=2 cpu=1 credit=6994336 [w=256] load=238475 (~90%)
(XEN) 2: [1.1] flags=2 cpu=1 credit=4766960 [w=256] load=176230 (~67%)
(XEN) Runqueue 0:
(XEN) CPU[01] runq=0, sibling={1}, core={0-1}
(XEN) run: [1.1] flags=2 cpu=1 credit=4766960 [w=256] load=176230 (~67%)
(XEN) RUNQ:
(XEN) 0: [1.0] flags=0 cpu=1 credit=18096 [w=256] load=238475 (~90%)
(XEN) CPUs info:
(XEN) CPU[01] current=d1v0, curr=d1v0, prev=NULL