On 22/10/2019 12:01, Jürgen Groß wrote:
> On 22.10.19 12:52, Roger Pau Monné wrote:
>> On Tue, Oct 22, 2019 at 11:27:41AM +0200, Jürgen Groß wrote:
>>> On 21.10.19 11:51, Sergey Dyasli wrote:
>>>> Hello,
>>>>
>>>> While testing pv-shim from a snapshot of staging 4.13 branch (with
>>>> core-scheduling patches applied), some sort of scheduling issues were
>>>> uncovered which usually leads to a guest lockup (sometimes with soft
>>>> lockup messages from Linux kernel).
>>>>
>>>> This happens more frequently on SandyBridge CPUs. After enabling
>>>> CONFIG_DEBUG in pv-shim, the following assertions failed:
>>>>
>>>> Null scheduler:
>>>>
>>>>     Assertion 'lock == get_sched_res(i->res->master_cpu)->schedule_lock'
>>>>     failed at ...are/xen-dir/xen-root/xen/include/xen/sched-if.h:278
>>>>     (full crash log: https://paste.debian.net/1108861/ )
>>>>
>>>> Credit1 scheduler:
>>>>
>>>>     Assertion 'cpumask_cycle(cpu, unit->cpu_hard_affinity) == cpu'
>>>>     failed at sched_credit.c:383
>>>>     (full crash log: https://paste.debian.net/1108862/ )
>>>>
>>>> I'm currently investigation those, but would appreciate any help or
>>>> suggestions.
>>>
>>> And now a more sane patch to try.
>>>
>>>
>>> Juergen
>>>
>>
>>> From 205b7622b84bc678f8a0d6ac121dff14439fe331 Mon Sep 17 00:00:00 2001
>>> From: Juergen Gross <[email protected]>
>>> To: [email protected]
>>> Cc: Jan Beulich <[email protected]>
>>> Cc: Andrew Cooper <[email protected]>
>>> Cc: Wei Liu <[email protected]>
>>> Cc: "Roger Pau Monné" <[email protected]>
>>> Date: Tue, 22 Oct 2019 11:14:08 +0200
>>> Subject: [PATCH] xen/pvhsim: fix cpu onlining
>>>
>>> Since commit 8d3c326f6756d1 ("xen: let vcpu_create() select processor")
>>> the initial processor for all pv-shim vcpus will be 0, as no other cpus
>>> are online when the vcpus are created. Before that commit the vcpus
>>> would have processors set not being online yet, which worked just by
>>> chance.
>>>
>>> When the pv-shim vcpu becomes active it will have a hard affinity
>>> not matching its initial processor assignment leading to failing
>>> ASSERT()s or other problems depending on the selected scheduler.
>>
>> I'm slightly lost here, who has set this hard affinity on the pvshim
>> vCPUs?
>
> That is done in sched_setup_dom0_vcpus().
>
>>
>>> Fix that by redoing the affinity setting after onlining the cpu but
>>> before taking the vcpu up.
>>
>> The change seems fine to me, but I don't understand why the lack of
>> this can cause asserts to trigger, as reported by Sergey. I also
>> wonder why a change to pin vCPU#0 to pCPU#0 is not required, because
>> pv_shim_cpu_up is only used for APs.
>
> When vcpu 0 is being created pcpu 0 is online already. So the affinity
> set in sched_setup_dom0_vcpus() is fine in that case.
>
>> I would expect that pvshim guest vCPUs have no hard affinity ATM, and
>> that when a pCPU (from the shim PoV) is brought online it will be
>> added to the pool of available pCPU for the shim to schedule vCPUs
>> on.
>
> That expectation is wrong. All vcpus are pinned to their respective
> pcpus.
The goal for Shim was always to use the NULL scheduler and always have
vcpu == pcpu.

The only reason we use credit is because NULL (still!) doesn't work, so
we bodge the scheduling using hard affinity.

~Andrew
