On 22/10/2019 12:01, Jürgen Groß wrote:
> On 22.10.19 12:52, Roger Pau Monné wrote:
>> On Tue, Oct 22, 2019 at 11:27:41AM +0200, Jürgen Groß wrote:
>>> On 21.10.19 11:51, Sergey Dyasli wrote:
>>>> Hello,
>>>>
>>>> While testing pv-shim from a snapshot of the staging 4.13 branch (with
>>>> core-scheduling patches applied), some scheduling issues were uncovered
>>>> which usually lead to a guest lockup (sometimes with soft lockup
>>>> messages from the Linux kernel).
>>>>
>>>> This happens more frequently on SandyBridge CPUs. After enabling
>>>> CONFIG_DEBUG in pv-shim, the following assertions failed:
>>>>
>>>> Null scheduler:
>>>>
>>>>       Assertion 'lock == get_sched_res(i->res->master_cpu)->schedule_lock'
>>>>       failed at ...are/xen-dir/xen-root/xen/include/xen/sched-if.h:278
>>>>       (full crash log: https://paste.debian.net/1108861/ )
>>>>
>>>> Credit1 scheduler:
>>>>
>>>>       Assertion 'cpumask_cycle(cpu, unit->cpu_hard_affinity) == cpu'
>>>>       failed at sched_credit.c:383
>>>>       (full crash log: https://paste.debian.net/1108862/ )
>>>>
>>>> I'm currently investigating those, but would appreciate any help or
>>>> suggestions.
>>>
>>> And now a more sane patch to try.
>>>
>>>
>>> Juergen
>>>
>>
>>>  From 205b7622b84bc678f8a0d6ac121dff14439fe331 Mon Sep 17 00:00:00 2001
>>> From: Juergen Gross <[email protected]>
>>> To: [email protected]
>>> Cc: Jan Beulich <[email protected]>
>>> Cc: Andrew Cooper <[email protected]>
>>> Cc: Wei Liu <[email protected]>
>>> Cc: "Roger Pau Monné" <[email protected]>
>>> Date: Tue, 22 Oct 2019 11:14:08 +0200
>>> Subject: [PATCH] xen/pvhsim: fix cpu onlining
>>>
>>> Since commit 8d3c326f6756d1 ("xen: let vcpu_create() select processor")
>>> the initial processor for all pv-shim vcpus will be 0, as no other cpus
>>> are online when the vcpus are created. Before that commit the vcpus
>>> were assigned processors that were not yet online, which worked just
>>> by chance.
>>>
>>> When the pv-shim vcpu becomes active it will have a hard affinity
>>> not matching its initial processor assignment leading to failing
>>> ASSERT()s or other problems depending on the selected scheduler.
>>
>> I'm slightly lost here, who has set this hard affinity on the pvshim
>> vCPUs?
>
> That is done in sched_setup_dom0_vcpus().
>
>>
>>> Fix that by redoing the affinity setting after onlining the cpu but
>>> before taking the vcpu up.
>>
>> The change seems fine to me, but I don't understand why the lack of
>> this can cause asserts to trigger, as reported by Sergey. I also
>> wonder why a change to pin vCPU#0 to pCPU#0 is not required, because
>> pv_shim_cpu_up is only used for APs.
>
> When vcpu 0 is being created pcpu 0 is online already. So the affinity
> set in sched_setup_dom0_vcpus() is fine in that case.
>
>> I would expect that pvshim guest vCPUs have no hard affinity ATM, and
>> that when a pCPU (from the shim PoV) is brought online it will be
>> added to the pool of available pCPU for the shim to schedule vCPUs
>> on.
>
> That expectation is wrong. All vcpus are pinned to their respective
> pcpus.

The goal for the shim was always to use the NULL scheduler and always have
vcpu == pcpu.  The only reason we use credit is that NULL (still!)
doesn't work, so we bodge the scheduling using hard affinity.

~Andrew

_______________________________________________
Xen-devel mailing list
[email protected]
https://lists.xenproject.org/mailman/listinfo/xen-devel
