On 12/10/2022 11:49, Julien Grall wrote:
> Hi Andrew,
>
> On 12/10/2022 11:29, Andrew Cooper wrote:
>> On 12/10/2022 11:01, Julien Grall wrote:
>>> (+ Bertrand & Stefano)
>>>
>>> Hi Henry,
>>>
>>> On 12/10/2022 07:39, Henry Wang wrote:
>>>>> -----Original Message-----
>>>>> Subject: Re: [xen-unstable-smoke test] 173492: regressions - FAIL
>>>>>
>>>>> On 11.10.2022 18:29, osstest service owner wrote:
>>>>>> flight 173492 xen-unstable-smoke real [real]
>>>>>> http://logs.test-lab.xenproject.org/osstest/logs/173492/
>>>>>>
>>>>>> Regressions :-(
>>>>>>
>>>>>> Tests which did not succeed and are blocking,
>>>>>> including tests which could not be run:
>>>>>>    test-arm64-arm64-xl-xsm      14 guest-start              fail
>>>>>> REGR. vs. 173457
>>>>>
>>>>> Parsing config from /etc/xen/debian.guest.osstest.cfg
>>>>> libxl: debug: libxl_create.c:2079:do_domain_create: ao
>>>>> 0xaaaacaccf680:
>>>>> create: how=(nil) callback=(nil) poller=0xaaaacaccefd0
>>>>> libxl: detail: libxl_create.c:661:libxl__domain_make: passthrough:
>>>>> disabled
>>>>> libxl: debug: libxl_arm.c:148:libxl__arch_domain_prepare_config:
>>>>> Configure
>>>>> the domain
>>>>> libxl: debug: libxl_arm.c:151:libxl__arch_domain_prepare_config:  -
>>>>> Allocate
>>>>> 0 SPIs
>>>>> libxl: error: libxl_create.c:709:libxl__domain_make: domain creation
>>>>> fail: No
>>>>> such file or directory
>>>
>>> So this is -ENOENT which could be returned by the P2M is it can't
>>> allocate a page table (see p2m_set_entry()).
>>>
>>>>> libxl: error: libxl_create.c:1294:initiate_domain_create: cannot
>>>>> make domain:
>>>>> -3
>>>>>
>>>>> Later flights don't fail here anymore, though.
>>>>>
>>>>>>    test-armhf-armhf-xl          14 guest-start              fail
>>>>>> REGR. vs. 173457
>>>>>
>>>>> Similar log contents here, but later flights continue to fail the
>>>>> same way.
>>>>>
>>>>> I'm afraid I can't draw conclusions from this; I haven't been able
>>>>> to spot
>>>>> anything helpful in the hypervisor logs. My best guess right now is
>>>>> the use
>>>>> of some uninitialized memory, which just happened to go fine in the
>>>>> later
>>>>> flights for 64-bit.
>>>
>>> It looks like the smoke flight failed on laxton0 but passed on
>>> rochester{0, 1}. The former is using GICv2 whilst the latter are using
>>> GICv3.
>>>
>>> In the case of GICv2, we will create a P2M mapping when the domain is
>>> created. This is not necessary in the GICv3.
>>>
>>> IIRC the P2M pool is only populated later on (we don't add a few pages
>>> like on x86). So I am guessing this is why we are seen failure.
>>>
>>> If that's correct, then this is a complete oversight from me (I
>>> haven't done any GICv2 testing) while reviewing the series.
>>>
>>> The easy way to solve it would be to add a few pages in the pool when
>>> the domain is created. I don't like it, but I think there other
>>> possible solutions would require more work as we would need to delay
>>> the mappings.
>>
>> Honestly, I've considered doing this on x86 too.
>
> AFAICT, this is already the case for HAP (see call to
> hap_set_allocation() in hap_enable()). 256 pages will be pre-allocated.

Right, but it's asymmetric with shadow.  This wants fixing and simplifying.

>
>>
>> There are several things which want allocating in domain_create(), but
>> are deferred to max_vcpus() because they require the P2M having a
>> non-zero allocation.  This in turn means we've got a load of checks in
>> paths where we'd ideally not have them.
>>
>> We already have a calculation of the absolutely minimum we will ever
>> permit the p2m pool to be.  IMO we ought to allocate this minimum size
>> in domain_create().
>
> It depends on the number. At the moment domain_create() is not
> preemptible, so we don't want to allocate too many pages (I think even
> 256 pages could be risky on some Arm platform).
>
> Maybe the solution is to have domain_create() preemptible. But it is
> not something that could be done in the 4.17 time frame.

domain_create() can't be pre-emptible in its current form, because it
depends on "atomically" taking the domid from not existing to existing. 
Specifically, until the hypercall completes, other hypercalls can't find
a struct domain* for the domid.

This is necessary, because we guarantee that when you can look up a
domain by domid, e.g. the predicates work on it, and d->max_vcpus is
nonzero, etc.

In some future where the error paths have been made idempotent and we
have a clean split between teardown and destroy, we probably can alter
the existing creation path to do a more basic initial setup (which can
be cleaned up by the destroy logic), then insert the domain into dom
hashtable and automatically continue into a different subop and perform
more long-running setup.

But yeah - absolutely definitely not 4.17 content.

~Andrew

Reply via email to