Hi,
As you may know, I've been running validations of GCC trunk in many configurations for Arm and AArch64.

I was recently trying to clean up the new Bfloat16, MVE, CDE, and ACLE tests, because in several configurations I see 300-400 FAILs mainly in these areas, caused by "testisms". The goal is to avoid wasting time over the same failure reports when checking what needs fixing. I thought this would be quick & easy, but it is tedious because of the numerous combinations of options and configurations available on Arm.

Sorry for the very long email; it's hard to describe and summarize, but I'd like to try nonetheless, hoping that we can make testing easier/more efficient :-). Most of the time the problems I found are with the tests rather than real compiler bugs, so it feels like a bit of wasted time.

Here is a list of problems, starting with the tricky dependencies around -mfloat-abi=XXX:

* Some targets do not support multilibs (e.g. arm-linux-gnueabi[hf] with glibc), or one can decide not to build both the hard- and soft-FP multilibs. This generally becomes a problem when including stdint.h (used by arm_neon.h, arm_acle.h, ...), leading to a compiler error for lack of gnu/stubs-*.h for the missing float-abi. If you add -mthumb to the picture, it becomes quite complex (e.g. -mfloat-abi=hard is not supported on Thumb-1).

Consider mytest.c, which does not depend on any include file and has:

/* { dg-options "-mfloat-abi=hard" } */

If GCC is configured for arm-linux-gnueabi --with-cpu=cortex-a9 --with-fpu=neon:
- with plain 'make check', the test PASSes;
- with 'make check' and --target-board=-march=armv5t/-mthumb, the test FAILs:
  sorry, unimplemented: Thumb-1 hard-float VFP ABI

If I add

/* { dg-require-effective-target arm_hard_ok } */

'make check' with --target-board=-march=armv5t/-mthumb is now UNSUPPORTED (which is OK), but plain 'make check' is now also UNSUPPORTED, because arm_hard_ok detects that we lack the -mfloat-abi=hard multilib. So we lose a PASS.
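For reference, here is a minimal sketch of such a test combining both directives (mytest.c and its body are purely illustrative; the actual tests of course check something real):

```c
/* Hypothetical minimal test: require that -mfloat-abi=hard is usable
   (compiler, multilib and includes all present) before forcing it.  */
/* { dg-require-effective-target arm_hard_ok } */
/* { dg-options "-mfloat-abi=hard" } */

/* No #include here on purpose: including stdint.h is what usually
   exposes the missing gnu/stubs-*.h for the absent multilib.  */
int
foo (int x)
{
  return x + 1;
}
```

As described above, the trade-off is that arm_hard_ok turns the test UNSUPPORTED on an arm-linux-gnueabi toolchain without the hard-float multilib, even under a default 'make check' where -mfloat-abi=hard would compile fine.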
If I configure GCC for arm-linux-gnueabihf instead, then:
- plain 'make check' PASSes;
- 'make check' with --target-board=-march=armv5t/-mthumb FAILs;

and with

/* { dg-require-effective-target arm_hard_ok } */

- 'make check' with --target-board=-march=armv5t/-mthumb is now UNSUPPORTED;
- plain 'make check' PASSes.

So it seems the best option is to add

/* { dg-require-effective-target arm_hard_ok } */

although it makes the test UNSUPPORTED on arm-linux-gnueabi even in cases where it could PASS. Is there consensus that this is the right way?

* In the GCC DejaGnu helpers, the queries for -mfloat-abi=hard and -march=XXX are generally independent: if you query for -mfloat-abi=hard support, the helper does so in the absence of any -march=XXX flag that the testcase may also be using. So, if GCC is configured with its default cpu/fpu, -mfloat-abi=hard will be rejected for lack of an fpu on the default cpu, but if GCC is configured with a suitable cpu/fpu pair, -mfloat-abi=hard will be accepted.

I faced this problem when I tried to "fix" the order in which we try options in arm_v8_2a_bf16_neon_ok (see https://gcc.gnu.org/pipermail/gcc-patches/2020-April/544654.html). I faced similar problems while working on a patch of mine for a bug with IRQ handlers, which behaves differently depending on the FP ABI used: I have the feeling that I spend too much time writing the tests to the detriment of the patch itself...

I also noticed that Richard Sandiford probably faced similar issues with his recent fix for "no_unique_address", where he finally added arm_arch_v8a_hard_ok to check the armv8-a architecture, the neon-fp-armv8 FPU, and -mfloat-abi=hard at the same time. Maybe we could decide on a consistent and simpler way of checking such combinations?
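To make the "check everything at once" idea concrete, here is a rough sketch in the style of target-supports.exp of what such a combined effective-target could look like (this is my illustration, not the actual arm_arch_v8a_hard_ok implementation, and the exact preprocessor macros and flags used may differ):

```tcl
# Sketch only: accept the target when architecture, FPU and float-abi
# are all usable *together*, instead of probing each flag in isolation.
proc check_effective_target_arm_arch_v8a_hard_ok { } {
    return [check_no_compiler_messages arm_arch_v8a_hard_ok assembly {
	#if !defined (__ARM_ARCH_8A__) || !defined (__ARM_PCS_VFP)
	#error not armv8-a with the hard-float ABI
	#endif
	int dummy;
    } "-march=armv8-a -mfpu=neon-fp-armv8 -mfloat-abi=hard"]
}
```

The point is that the single compile is attempted with the full flag combination, so a default-cpu toolchain that would reject a bare -mfloat-abi=hard probe can still be detected as capable.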
* A metric for this complexity could be the number of arm effective-targets; a quick and not-fully-accurate grep | sed | sort | uniq -c | sort -n on target-supports.exp ends with:

      9 mips
     16 aarch64
     21 powerpc
     97 vect
    106 arm

(and this does not count all the effective-targets generated by Tcl code, e.g. arm_arch_FUNC_ok). This probably explains why it's hard to get test directives right :-). I haven't thought about how we could reduce that number...

* Finally, I'm wondering about the most appropriate way of configuring GCC and running the tests. So far, for most of the configurations I test, I use different --with-cpu/--with-fpu/--with-mode configure flags for each toolchain configuration, and rarely override the flags at testing time. I also disable multilibs to save build time and (scratch) disk space. (See https://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/0latest/report-build-info.html for the current list; each line corresponds to a clean build + make check job, so there are 15 different toolchain configs for arm-linux-gnueabihf, for instance.)

However, I think this may not be appropriate, at least for the arm-eabi toolchains, because I suspect the vendors who support several SoCs generally ship one binary toolchain built with the default cpu/fpu/mode and the appropriate multilibs (aprofile or rmprofile), and the associated IDE adds the right -mcpu/-mfpu flags (see the Arm Embedded toolchain, or ST CubeMX for stm32). So it seems to me that the "appropriate" way of testing such a toolchain is to build it with the default settings and appropriate multilibs, and to add the needed -mcpu/-mfpu variants at 'make check' time. I would still build one toolchain per configuration I want to test, rather than use runtest's capability to iterate over several combinations: this way I can run the tests in parallel and reduce the total time needed to get the results.
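In case it helps the discussion, a sketch of what I mean by "default config + multilibs + flags at test time" (paths, the board name and the exact flag spellings are illustrative, not a verified recipe):

```shell
# Build once with the default cpu/fpu/mode and the M-profile multilibs:
.../gcc/configure --target=arm-none-eabi --with-multilib-list=rmprofile ...
make && make install

# Then select each CPU variant at test time instead of at configure time:
make check RUNTESTFLAGS="--target-board=my-board/-mthumb/-mcpu=cortex-m33/-mfloat-abi=hard/-mfpu=auto"
```

One build then serves several 'make check' runs, which is closer to how a vendor-shipped binary toolchain is exercised, at the cost of building many multilibs.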
One can compare the results of both options with the two cortex-m33 lines in the above table (target arm-none-eabi). In the first one, GCC is configured for cortex-m33 and the tests are executed via plain 'make check': 401 failures in gcc (duration ~2h, disk space 14GB). In the second line, GCC is configured with the default cpu/fpu, multilibs enabled, and I use test flags suitable for cortex-m33: now only 73 failures in gcc (duration ~3h15, disk space 26GB). Note that there are more failures for g++ and libstdc++ than in the previous line; I haven't fully checked why -- for libstdc++ there are spurious -march=armv8-m.main+fp flags in the logs. So this is not a magic bullet.

Unfortunately, this also means every test with the arm_hard_ok effective-target would be unsupported (lack of fpu on the default cpu), whatever the validation cflags. The increased build time (many multilibs built for nothing) will also reduce the validation bandwidth (I hope the increased scratch disk space will not be a problem with my IT...).

OTOH, I have a feeling that arm-linux-gnueabi* toolchain vendors probably prefer to tune them for their preferred default CPU. For instance, I have an arm board running Ubuntu with gcc-5.4 configured with --with-arch=armv7-a --with-fpu=vfpv3-d16 --with-float=hard --with-mode=thumb. If this is right, it would mean I should keep the configurations I currently use for arm-linux* (no multilib, rely on the default cpu/fpu).

* Regarding the flags used for testing, I'm also wondering which is the most appropriate: -mcpu or -march. Both probably have pros and cons? In https://gcc.gnu.org/pipermail/gcc/2019-September/230258.html, I described a problem where it seems that one expects the tests to run with -march=XXX.
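To illustrate the -mcpu vs -march question, the same cortex-m33 variant could be selected either way at test time (board name and flag spellings hypothetical; cortex-m33 implements armv8-m.main with the DSP extension):

```shell
# By CPU name (GCC derives the architecture and enables CPU tuning):
make check RUNTESTFLAGS="--target-board=my-board/-mcpu=cortex-m33/-mfloat-abi=hard/-mfpu=auto"

# By architecture level (no CPU-specific tuning):
make check RUNTESTFLAGS="--target-board=my-board/-march=armv8-m.main+dsp/-mfloat-abi=hard/-mfpu=auto"
```

As the mixed -mcpu/-march log below shows, the two spellings can also end up combined by the test harness, with confusing results.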
Another log of mine has an effective-target helper compiled with:

-mthumb -mcpu=cortex-m33 -mfloat-abi=hard -mfloat-abi=softfp -mfpu=auto -march=armv8.1-m.main+mve.fp -mthumb

which produces this warning:

cc1: warning: switch '-mcpu=cortex-m33' conflicts with '-march=armv8.1-m.main' switch

and looks suspicious. Still, running the tests in multiple ways surely helps uncover bugs...

In summary, I'd like to gather opinions on:
* the appropriate usage of dg-require-effective-target arm_hard_ok;
* how to improve float-abi support detection in combination with the architecture level;
* hopefully, a consensus on how to configure the toolchain and run the tests. I'm suggesting default config + multilibs + runtest flags for arm-eabi, and a selection of default cpu/fpu + fewer runtest flags for arm-linux*.

Thanks for reading this far :-)

Christophe