On Fri, 21 Mar 2025 at 16:51, Richard Earnshaw (lists)
<[email protected]> wrote:
>
> On 21/03/2025 15:15, Christophe Lyon wrote:
> > On Fri, 21 Mar 2025 at 15:25, Richard Earnshaw (lists)
> > <[email protected]> wrote:
> >>
> >> On 21/03/2025 14:05, Christophe Lyon wrote:
> >>> On Fri, 21 Mar 2025 at 11:18, Richard Earnshaw (lists)
> >>> <[email protected]> wrote:
> >>>>
> >>>> On 20/03/2025 16:15, Christophe Lyon wrote:
> >>>>> Depending on if/how the testing flags are overridden, the first value
> >>>>> we try("") might not do what we want.
> >>>>>
> >>>>> For instance, if the whole testsuite is executed with
> >>>>> (A) -mthumb -march=armv7-m -mtune=cortex-m3 -mfloat-abi=softfp
> >>>>>
> >>>>> bf16_neon_ok is first compiled with
> >>>>> (A) (B)
> >>>>> where B = -mcpu=unset -march=armv8.2-a+bf16
> >>>>>
> >>>>> which is accepted, so a testcase like vld2q_lane_bf16_indices_1.c
> >>>>> is compiled with:
> >>>>> (A) (C) (B)
> >>>>> where C = -mfpu=neon -mfloat-abi=softfp -mcpu=unset -march=armv7-a
> >>>>> -mfpu=neon-fp16 -mfp16-format=ieee
> >>>>>
> >>>>> because advsimd-intrinsics.exp has set additional_flags to (C)
> >>>>> via arm_neon_fp16_ok
> >>>>>
> >>>>> So the testcase is compiled with
> >>>>> [...] -mfpu=neon-fp16 -mcpu=unset -march=armv8.2-a+bf16
> >>>>> (thus -mfpu=neon-fp16) and bf16 support is disabled.
> >>>>>
> >>>>> The patch replaces "" with -mfpu=auto which matches the intended
> >>>>> effect of -march=armv8.2-a+bf16 as added by bf16_neon_ok, and the
> >>>>> testcase is now compiled with
> >>>>> (A) (C) -mfpu=auto (B)
> >>>>>
> >>>>> However, since this effective-target is also used on aarch64 (which
> >>>>> does not support -mfpu=auto), we do this only on arm.
> >>>>>
> >>>>> This patch improves coverage, and makes
> >>>>> v{ld,st}[234]q_lane_bf16_indices_1.c pass when testsuite flags are
> >>>>> overridden as described above (e.g. for M-profile).
> >>>>>
> >>>>> gcc/testsuite/
> >>>>> * lib/target-supports.exp
> >>>>> (check_effective_target_arm_v8_2a_bf16_neon_ok_nocache):
> >>>>> Conditionally use -mfpu=auto.
> >>>>> ---
> >>>>> gcc/testsuite/lib/target-supports.exp | 9 ++++++++-
> >>>>> 1 file changed, 8 insertions(+), 1 deletion(-)
> >>>>>
> >>>>> diff --git a/gcc/testsuite/lib/target-supports.exp
> >>>>> b/gcc/testsuite/lib/target-supports.exp
> >>>>> index e2622a445c5..09b16a14024 100644
> >>>>> --- a/gcc/testsuite/lib/target-supports.exp
> >>>>> +++ b/gcc/testsuite/lib/target-supports.exp
> >>>>> @@ -6871,12 +6871,19 @@ proc add_options_for_arm_fp16fml_neon { flags }
> >>>>> {
> >>>>> proc check_effective_target_arm_v8_2a_bf16_neon_ok_nocache { } {
> >>>>> global et_arm_v8_2a_bf16_neon_flags
> >>>>> set et_arm_v8_2a_bf16_neon_flags ""
> >>>>> + set fpu_auto ""
> >>>>>
> >>>>> if { ![istarget arm*-*-*] && ![istarget aarch64*-*-*] } {
> >>>>> return 0;
> >>>>> }
> >>>>>
> >>>>> - foreach flags {"" "-mfloat-abi=softfp -mfpu=neon-fp-armv8"
> >>>>> "-mfloat-abi=hard -mfpu=neon-fp-armv8" } {
> >>>>> + if { [istarget arm*-*-*] } {
> >>>>> + set fpu_auto "-mfpu=auto"
> >>>>> + }
> >>>>> +
> >>>>> + foreach flags [list "$fpu_auto" \
> >>>>
> >>>> Shouldn't we try first with "", even on Arm? Thus
> >>>> foreach flags [list "" "$fpu_auto" \
> >>>> ...
> >>>>
> >>> I don't think so, that's why I tried to explain above.
> >>> "" is acceptable / accepted in arm_v8_2a_bf16_neon_ok
> >>> (this is (A) (B) above, where the important parts are:
> >>> -march=armv7-m -mcpu=unset -march=armv8.2-a+bf16
> >>> (so -mfpu is set to the toolchain's default)
> >>
> >> That's never going to work reliably. We need to check, somewhere, the
> >> full set of options we intend to pass to the compilation. We can't assume
> >> that separately testing if A is ok and B is ok => A + B is ok.
> >>
> >
> > Hmmm I think I raised that problem years ago, because of the way the
> > test system is designed...
> >
> >>>
> >>> but then the actual testcase is compiled with additional flags (C)
> >>> defined by the test driver using arm_neon_fp16_ok
> >>> C = -mfpu=neon -mfloat-abi=softfp -mcpu=unset -march=armv7-a
> >>> -mfpu=neon-fp16 -mfp16-format=ieee
> >>>
> >>> so the relevant parts of (A) (C) (B) are:
> >>> -march=armv7-m -mfpu=neon -mcpu=unset -march=armv7-a -mfpu=neon-fp16
> >>> -mcpu=unset -march=armv8.2-a+bf16
> >>> which can be simplified into
> >>> -mfpu=neon-fp16 -march=armv8.2-a+bf16
> >>>
> >>> I think we need to start with -mfpu=auto instead of "", so that when
> >>> -march=armv8.2-a+bf16 takes effect, we have cancelled any other -mfpu.
> >>>
> >>
> >> Ideally a test shouldn't add any options in some test runs; that way we
> >> add some 'base configuration' coverage. We obviously can't do that
> >> everywhere and there may still be cases where the system can't test
> >> anything useful at all (hopefully only when linking, or more is needed).
> >>
> >
> > Not sure to follow? If a test wants to check that a given feature
> > works as expected, it has to use the appropriate options, doesn't it?
>
> Not if the base configuration for the run (from the compiler config or from
> the RUNTEST flags) already supports that feature.
>
OK, but isn't this error-prone?
The effective-target unconditionally adds -mcpu=unset -march=XXX
already, so relying on the implicit default value of -mcpu seems
inconsistent?
The would also lead to confusion when comparing two logs with the
exact same options, but different result because the toolchain was
built with a different --with-fpu= option (which IIRC does not appear
in gcc.log).
Not sure what accepting "" rather than -mfpu=auto gives us? Given we
are checking which flags (-mfloat-abi for instance) are need to
support -march=XXX, don't we want to make sure the corresponding fpu
is in use?
> >
> >>>
> >>>
> >>>>
> >>>>> + "-mfloat-abi=softfp -mfpu=neon-fp-armv8" \
> >>>>> + "-mfloat-abi=hard -mfpu=neon-fp-armv8" ] {
> >>>>> if { [check_no_compiler_messages_nocache arm_v8_2a_bf16_neon_ok
> >>>>> object {
> >>>>> #include <arm_neon.h>
> >>>>> #if !defined (__ARM_FEATURE_BF16_VECTOR_ARITHMETIC)
> >>>>
> >>>> In fact, since aarch64 doesn't support -mfpu at all, only "" is ever
> >>>> going to work here, so perhaps we can recast all this code, perhaps
> >>>> along the lines of that in check_effective_target_arm_neon_h_ok_nocache
> >>>> so that we don't need to actually try to include arm_neon.h at all while
> >>>> figuring out the options.
> >>>>
> >>>
> >>> Maybe.
> >>>
> >>> OTOH, actually including arm_neon.h is guaranteed to give the right
> >>> answer to the question "which flags do we need to be able to include
> >>> arm_neon.h?", rather than using several conditions that are supposed
> >>> to match what arm_neon.h does?
> >>>
> >>
> >> Working out whether arm_neon.h can be included can be done with
> >> arm_neon_h_ok. That's currently Arm only, but it's trivial to extend it
> >> to aarch64. Once we have that, we can write some rules that build on that
> >> base to get other features that we might desire. But we must, at some
> >> point check, rather than assume, that combining options from different
> >> rules will result in something sane.
> >>
> >
> > I suspect most tests which currently use arm*neon* effective targets
> > actually include arm_neon.h, they should arm_neon_h_ok.
> >
> > But I'm not sure what we can do to make sure, rather than assume that
> > A + C + B does what we want, given that:
> > - A is defined by the user who runs the testsuite (via RUNTESTFLAGS /
> > target_board)
> > - C is defined by the test harness (advsimd-intrinsics.exp)
> > - B is defined as needed by individual tests
> >
> > At least we have no control on A.
>
> We have no control of A, but any require_... tests will add A to the option
> list they test; so we can override them if needed and check the override
> works.
Yes, that's already what currently happens.
> C is more complex - we perhaps need some bespoke checks in
> advsimd-intrinsics.exp that can be used to ensure that it's base flags are
> added to any tests we run. I need to think about that one a bit; perhaps we
> can arrange to add these flags to A before the individual tests run their
> additional checks.
>
what is the advantage compared to replacing "" with -mfpu=auto, which
seems simpler?
> >
> > We can create a new directory, where the .exp test harness would not
> > use additional options, and move tests which need
> > effective-target-arm-arch* there.
> > (that is 189 files out of 541 under advsimd-intrinsics/)
> >
>
> Those tests might benefit from a refactor anyway - the common arm/aarch64
> tests should probably live in dg.target/aarch, and only the aarch64-specific
> tests should be in dg.target/aarch64.
>
Indeed, but isn't this an orthogonal problem?
Thanks,
Christophe
> R.
>
> > Thanks,
> >
> > Christophe
> >
> >
> >> R.
> >>
> >>> Thanks,
> >>>
> >>> Christophe
> >>>
> >>>> R.
> >>
>