On Mon, 2021-08-09 at 18:35 +0100, Julien Grall wrote:
> On 09/08/2021 17:19, Ahmed, Daniele wrote:
> > Hi all,
> 
> Hi Daniele,
> 
Hello everyone from me as well,

> Thank you for the report!
> 
Indeed. :-)

> 
> The ASSERT() is triggered because the pCPU was already assigned to one
> of the dom0 vCPUs. This problem happens regardless of whether there is
> a free pCPU.
> 
Right. Can we raise the appropriate log level, so that we can see these
messages:

dprintk(XENLOG_G_INFO, "%d <-- %pdv%d\n", cpu, unit->domain, unit->unit_id);

(and then see a full `xl dmesg`, or even better, a serial console dump,
since we crash! :-P)
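
For reference, a minimal sketch of how the log level could be raised at
boot. XENLOG_G_INFO is a guest-level message, so guest_loglvl is the
relevant knob, and dprintk() is only compiled into debug builds (the
trace quoted below shows debug=y, so that part should be fine). The
GRUB2 file location is an assumption about the setup and may differ per
distro:

```
# /etc/default/grub (assumed location; regenerate grub.cfg afterwards)
# loglvl       controls Xen's own messages
# guest_loglvl controls guest-related messages such as XENLOG_G_INFO
GRUB_CMDLINE_XEN_DEFAULT="loglvl=all guest_loglvl=all"
```

After rebooting with these options, the "<--" lines should show up in
`xl dmesg` or on the serial console.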

> I have added some debugging in sched_set_res():
> 
> diff --git a/xen/common/sched/private.h b/xen/common/sched/private.h
> index a870320146ef..2355f531dc13 100644
> --- a/xen/common/sched/private.h
> +++ b/xen/common/sched/private.h
> @@ -150,6 +150,10 @@ static inline void sched_set_res(struct sched_unit *unit,
>       unsigned int cpu = cpumask_first(res->cpus);
>       struct vcpu *v;
> 
> +    printk("%s: res->master_cpu %u unit %p %pd %pv\n", __func__,
> +           res->master_cpu, unit, unit->domain, unit->vcpu_list);
> +    WARN();
> +
>       for_each_sched_unit_vcpu ( unit, v )
>       {
>           ASSERT(cpu < nr_cpu_ids);
> 
> This traced the problem to null_unit_migrate():
> 
> (XEN) sched_set_res: res->master_cpu 0 unit ffff830200887f00 d1 d1v0
> (XEN) Xen WARN at private.h:155
> (XEN) ----[ Xen-4.16-unstable  x86_64  debug=y  Tainted:   C   ]----
> (XEN) CPU:    1
> (XEN) RIP:    e008:[<ffff82d04023fd9f>] core.c#sched_set_res+0x5b/0xc6
> [...]
> (XEN) Xen call trace:
> (XEN)    [<ffff82d04023fd9f>] R core.c#sched_set_res+0x5b/0xc6
> (XEN)    [<ffff82d040241614>] F sched_init_vcpu+0x3dc/0x5d7
> (XEN)    [<ffff82d04020527d>] F vcpu_create+0xfb/0x37a
> (XEN)    [<ffff82d040238dd9>] F do_domctl+0xac0/0x184a
> (XEN)    [<ffff82d04030d8bc>] F pv_hypercall+0x10d/0x2b8
> (XEN)    [<ffff82d04038829d>] F lstar_enter+0x12d/0x140
> (XEN)
> 
So, it's entirely possible that I'm missing something obvious here, but
what is it that makes you think that we're in null_unit_migrate()?

Does that come from a different instance of this WARN() ?

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)
