Hi Andrew,
On 12/10/2022 11:29, Andrew Cooper wrote:
On 12/10/2022 11:01, Julien Grall wrote:
(+ Bertrand & Stefano)
Hi Henry,
On 12/10/2022 07:39, Henry Wang wrote:
-----Original Message-----
Subject: Re: [xen-unstable-smoke test] 173492: regressions - FAIL
On 11.10.2022 18:29, osstest service owner wrote:
flight 173492 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/173492/
Regressions :-(
Tests which did not succeed and are blocking,
including tests which could not be run:
test-arm64-arm64-xl-xsm 14 guest-start fail
REGR. vs. 173457
Parsing config from /etc/xen/debian.guest.osstest.cfg
libxl: debug: libxl_create.c:2079:do_domain_create: ao 0xaaaacaccf680:
create: how=(nil) callback=(nil) poller=0xaaaacaccefd0
libxl: detail: libxl_create.c:661:libxl__domain_make: passthrough:
disabled
libxl: debug: libxl_arm.c:148:libxl__arch_domain_prepare_config:
Configure
the domain
libxl: debug: libxl_arm.c:151:libxl__arch_domain_prepare_config: -
Allocate
0 SPIs
libxl: error: libxl_create.c:709:libxl__domain_make: domain creation
fail: No
such file or directory
So this is -ENOENT which could be returned by the P2M is it can't
allocate a page table (see p2m_set_entry()).
libxl: error: libxl_create.c:1294:initiate_domain_create: cannot
make domain:
-3
Later flights don't fail here anymore, though.
test-armhf-armhf-xl 14 guest-start fail
REGR. vs. 173457
Similar log contents here, but later flights continue to fail the
same way.
I'm afraid I can't draw conclusions from this; I haven't been able
to spot
anything helpful in the hypervisor logs. My best guess right now is
the use
of some uninitialized memory, which just happened to go fine in the
later
flights for 64-bit.
It looks like the smoke flight failed on laxton0 but passed on
rochester{0, 1}. The former is using GICv2 whilst the latter are using
GICv3.
In the case of GICv2, we will create a P2M mapping when the domain is
created. This is not necessary in the GICv3.
IIRC the P2M pool is only populated later on (we don't add a few pages
like on x86). So I am guessing this is why we are seen failure.
If that's correct, then this is a complete oversight from me (I
haven't done any GICv2 testing) while reviewing the series.
The easy way to solve it would be to add a few pages in the pool when
the domain is created. I don't like it, but I think there other
possible solutions would require more work as we would need to delay
the mappings.
Honestly, I've considered doing this on x86 too.
AFAICT, this is already the case for HAP (see call to
hap_set_allocation() in hap_enable()). 256 pages will be pre-allocated.
There are several things which want allocating in domain_create(), but
are deferred to max_vcpus() because they require the P2M having a
non-zero allocation. This in turn means we've got a load of checks in
paths where we'd ideally not have them.
We already have a calculation of the absolutely minimum we will ever
permit the p2m pool to be. IMO we ought to allocate this minimum size
in domain_create().
It depends on the number. At the moment domain_create() is not
preemptible, so we don't want to allocate too many pages (I think even
256 pages could be risky on some Arm platform).
Maybe the solution is to have domain_create() preemptible. But it is not
something that could be done in the 4.17 time frame.
Cheers,
--
Julien Grall