Hi,
As you may know, I've been running validations of GCC trunk in many configurations for Arm and AArch64.

I was recently trying to clean up the new Bfloat16, MVE, CDE, and ACLE tests, because in several configurations I see 300-400 FAILs mainly in these areas, caused by "testisms". The goal is to avoid wasting time over the same failure reports when checking what needs fixing. I thought this would be quick & easy, but it is tedious because of the numerous combinations of options and configurations available on Arm.

Sorry for the very long email; it's hard to describe and summarize, but I'd like to try nonetheless, hoping that we can make testing easier/more efficient :-). Most of the time the problems I found are with the tests rather than real compiler bugs, so it feels like a bit of wasted time.

Here is a list of problems, starting with the tricky dependencies around -mfloat-abi=XXX:

* Some targets do not support multilibs (e.g. arm-linux-gnueabi[hf] with glibc), or one can decide not to build both the hard- and soft-FP multilibs. This generally becomes a problem when including stdint.h (used by arm_neon.h, arm_acle.h, ...), leading to a compiler error for lack of gnu/stubs-*.h for the missing float-abi. If you add -mthumb to the picture, it becomes quite complex (e.g. -mfloat-abi=hard is not supported on Thumb-1).

Consider mytest.c, which does not depend on any include file and has:

/* { dg-options "-mfloat-abi=hard" } */

If GCC is configured for arm-linux-gnueabi --with-cpu=cortex-a9 --with-fpu=neon:
- with plain 'make check', the test PASSes;
- with 'make check' and --target-board=-march=armv5t/-mthumb, the test FAILs:
  sorry, unimplemented: Thumb-1 hard-float VFP ABI

If I add

/* { dg-require-effective-target arm_hard_ok } */

'make check' with --target-board=-march=armv5t/-mthumb is now UNSUPPORTED (which is OK), but plain 'make check' is now also UNSUPPORTED, because arm_hard_ok detects that we lack the -mfloat-abi=hard multilib. So we lose a PASS.
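For reference, here is a minimal sketch of such a test combining both directives (mytest.c and its body are purely illustrative; the actual tests of course check something real):

```c
/* Hypothetical minimal test: require that -mfloat-abi=hard is usable
   (compiler, multilib and includes all present) before forcing it.  */
/* { dg-require-effective-target arm_hard_ok } */
/* { dg-options "-mfloat-abi=hard" } */

/* No #include here on purpose: including stdint.h is what usually
   exposes the missing gnu/stubs-*.h for the absent multilib.  */
int
foo (int x)
{
  return x + 1;
}
```

As described above, the trade-off is that arm_hard_ok turns the test UNSUPPORTED on an arm-linux-gnueabi toolchain without the hard-float multilib, even under a default 'make check' where -mfloat-abi=hard would compile fine.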
If I configure GCC for arm-linux-gnueabihf instead, then:
- plain 'make check' PASSes;
- 'make check' with --target-board=-march=armv5t/-mthumb FAILs;

and with

/* { dg-require-effective-target arm_hard_ok } */

- 'make check' with --target-board=-march=armv5t/-mthumb is now UNSUPPORTED;
- plain 'make check' PASSes.

So it seems the best option is to add

/* { dg-require-effective-target arm_hard_ok } */

although it makes the test UNSUPPORTED on arm-linux-gnueabi even in cases where it could PASS. Is there consensus that this is the right way?

* In the GCC DejaGnu helpers, the queries for -mfloat-abi=hard and -march=XXX are generally independent: if you query for -mfloat-abi=hard support, the helper does so in the absence of any -march=XXX flag that the testcase may also be using. So, if GCC is configured with its default cpu/fpu, -mfloat-abi=hard will be rejected for lack of an fpu on the default cpu, but if GCC is configured with a suitable cpu/fpu pair, -mfloat-abi=hard will be accepted.

I faced this problem when I tried to "fix" the order in which we try options in arm_v8_2a_bf16_neon_ok (see https://gcc.gnu.org/pipermail/gcc-patches/2020-April/544654.html). I faced similar problems while working on a patch of mine for a bug with IRQ handlers, which behaves differently depending on the FP ABI used: I have the feeling that I spend too much time writing the tests to the detriment of the patch itself...

I also noticed that Richard Sandiford probably faced similar issues with his recent fix for "no_unique_address", where he finally added arm_arch_v8a_hard_ok to check the armv8-a architecture, the neon-fp-armv8 FPU, and -mfloat-abi=hard at the same time. Maybe we could decide on a consistent and simpler way of checking such combinations?
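To make the "check everything at once" idea concrete, here is a rough sketch in the style of target-supports.exp of what such a combined effective-target could look like (this is my illustration, not the actual arm_arch_v8a_hard_ok implementation, and the exact preprocessor macros and flags used may differ):

```tcl
# Sketch only: accept the target when architecture, FPU and float-abi
# are all usable *together*, instead of probing each flag in isolation.
proc check_effective_target_arm_arch_v8a_hard_ok { } {
    return [check_no_compiler_messages arm_arch_v8a_hard_ok assembly {
	#if !defined (__ARM_ARCH_8A__) || !defined (__ARM_PCS_VFP)
	#error not armv8-a with the hard-float ABI
	#endif
	int dummy;
    } "-march=armv8-a -mfpu=neon-fp-armv8 -mfloat-abi=hard"]
}
```

The point is that the single compile is attempted with the full flag combination, so a default-cpu toolchain that would reject a bare -mfloat-abi=hard probe can still be detected as capable.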
* A metric for this complexity could be the number of arm effective-targets; a quick and not-fully-accurate grep | sed | sort | uniq -c | sort -n on target-supports.exp ends with:

      9 mips
     16 aarch64
     21 powerpc
     97 vect
    106 arm

(and this does not count all the effective-targets generated by Tcl code, e.g. arm_arch_FUNC_ok). This probably explains why it's hard to get test directives right :-). I haven't thought about how we could reduce that number...

* Finally, I'm wondering about the most appropriate way of configuring GCC and running the tests. So far, for most of the configurations I test, I use different --with-cpu/--with-fpu/--with-mode configure flags for each toolchain configuration, and rarely override the flags at testing time. I also disable multilibs to save build time and (scratch) disk space. (See https://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/0latest/report-build-info.html for the current list; each line corresponds to a clean build + make check job, so there are 15 different toolchain configs for arm-linux-gnueabihf, for instance.)

However, I think this may not be appropriate, at least for the arm-eabi toolchains, because I suspect the vendors who support several SoCs generally ship one binary toolchain built with the default cpu/fpu/mode and the appropriate multilibs (aprofile or rmprofile), and the associated IDE adds the right -mcpu/-mfpu flags (see the Arm Embedded toolchain, or ST CubeMX for stm32). So it seems to me that the "appropriate" way of testing such a toolchain is to build it with the default settings and appropriate multilibs, and to add the needed -mcpu/-mfpu variants at 'make check' time. I would still build one toolchain per configuration I want to test, rather than use runtest's capability to iterate over several combinations: this way I can run the tests in parallel and reduce the total time needed to get the results.
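In case it helps the discussion, a sketch of what I mean by "default config + multilibs + flags at test time" (paths, the board name and the exact flag spellings are illustrative, not a verified recipe):

```shell
# Build once with the default cpu/fpu/mode and the M-profile multilibs:
.../gcc/configure --target=arm-none-eabi --with-multilib-list=rmprofile ...
make && make install

# Then select each CPU variant at test time instead of at configure time:
make check RUNTESTFLAGS="--target-board=my-board/-mthumb/-mcpu=cortex-m33/-mfloat-abi=hard/-mfpu=auto"
```

One build then serves several 'make check' runs, which is closer to how a vendor-shipped binary toolchain is exercised, at the cost of building many multilibs.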
One can compare the results of both options with the two cortex-m33 lines in the above table (target arm-none-eabi). In the first one, GCC is configured for cortex-m33 and the tests are executed via plain 'make check': 401 failures in gcc (duration ~2h, disk space 14GB). In the second line, GCC is configured with the default cpu/fpu, multilibs enabled, and I use test flags suitable for cortex-m33: now only 73 failures in gcc (duration ~3h15, disk space 26GB). Note that there are more failures for g++ and libstdc++ than in the previous line; I haven't fully checked why -- for libstdc++ there are spurious -march=armv8-m.main+fp flags in the logs. So this is not a magic bullet.

Unfortunately, this also means every test with the arm_hard_ok effective-target would be unsupported (lack of fpu on the default cpu), whatever the validation cflags. The increased build time (many multilibs built for nothing) will also reduce the validation bandwidth (I hope the increased scratch disk space will not be a problem with my IT...).

OTOH, I have a feeling that arm-linux-gnueabi* toolchain vendors probably prefer to tune them for their preferred default CPU. For instance, I have an arm board running Ubuntu with gcc-5.4 configured with --with-arch=armv7-a --with-fpu=vfpv3-d16 --with-float=hard --with-mode=thumb. If this is right, it would mean I should keep the configurations I currently use for arm-linux* (no multilib, rely on the default cpu/fpu).

* Regarding the flags used for testing, I'm also wondering which is the most appropriate: -mcpu or -march. Both probably have pros and cons? In https://gcc.gnu.org/pipermail/gcc/2019-September/230258.html, I described a problem where it seems that one expects the tests to run with -march=XXX.
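To illustrate the -mcpu vs -march question, the same cortex-m33 variant could be selected either way at test time (board name and flag spellings hypothetical; cortex-m33 implements armv8-m.main with the DSP extension):

```shell
# By CPU name (GCC derives the architecture and enables CPU tuning):
make check RUNTESTFLAGS="--target-board=my-board/-mcpu=cortex-m33/-mfloat-abi=hard/-mfpu=auto"

# By architecture level (no CPU-specific tuning):
make check RUNTESTFLAGS="--target-board=my-board/-march=armv8-m.main+dsp/-mfloat-abi=hard/-mfpu=auto"
```

As the mixed -mcpu/-march log below shows, the two spellings can also end up combined by the test harness, with confusing results.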
Another log of mine has an effective-target helper compiled with:

-mthumb -mcpu=cortex-m33 -mfloat-abi=hard -mfloat-abi=softfp -mfpu=auto -march=armv8.1-m.main+mve.fp -mthumb

which produces this warning:

cc1: warning: switch '-mcpu=cortex-m33' conflicts with '-march=armv8.1-m.main' switch

and looks suspicious. Still, running the tests in multiple ways surely helps uncover bugs...

In summary, I'd like to gather opinions on:
* the appropriate usage of dg-require-effective-target arm_hard_ok;
* how to improve float-abi support detection in combination with the architecture level;
* hopefully, a consensus on how to configure the toolchain and run the tests. I'm suggesting default config + multilibs + runtest flags for arm-eabi, and a selection of default cpu/fpu + fewer runtest flags for arm-linux*.

Thanks for reading this far :-)

Christophe