Hi Jennifer,
> -----Original Message-----
> From: Jennifer Schmitz <[email protected]>
> Sent: 24 September 2025 08:31
> To: Alex Coplan <[email protected]>
> Cc: GCC Patches <[email protected]>; Richard Earnshaw
> <[email protected]>; Kyrylo Tkachov <[email protected]>;
> Tamar Christina <[email protected]>; Andrew Pinski
> <[email protected]>
> Subject: Re: [PATCH 1/2] [PR121604] aarch64: Fix folding of svbrka/b +
> svpmov_lane with pfalse arguments
>
>
>
> > On 15 Sep 2025, at 09:30, Jennifer Schmitz <[email protected]> wrote:
> >
> > External email: Use caution opening links or attachments
> >
> >
> >> On 27 Aug 2025, at 09:14, Jennifer Schmitz <[email protected]> wrote:
> >>
> >> External email: Use caution opening links or attachments
> >>
> >>
> >>> On 26 Aug 2025, at 16:02, Alex Coplan <[email protected]> wrote:
> >>>
> >>> External email: Use caution opening links or attachments
> >>>
> >>>
> >>> On 26/08/2025 13:39, Jennifer Schmitz wrote:
> >>>> The function gimple_folder::fold_pfalse that was introduced in
> >>>> g5289540ed58e42ae66255e31f22afe4ca0a6e15e
> >>>> inappropriately folds
> >>>> 1) svbrka/b with _m if the inactive operand is pfalse
> >>>> 2) svpmov_lane with _m if the non-governing predicate operand is
> pfalse.
> >>>> To fix this, we added extra checks in the folding logic excluding certain
> >>>> function shapes.
> >>>>
> >>>> The patch was bootstrapped and tested on aarch64-linux-gnu, no
> regression.
> >>>> OK for trunk?
> >>>>
> >>>> Signed-off-by: Jennifer Schmitz <[email protected]>
> >>>>
> >>>> gcc/
> >>>> PR target/121604
> >>>> * config/aarch64/aarch64-sve-builtins.cc
> >>>> (gimple_folder::fold_pfalse): Add checks for function shape.
> >>>>
> >>>> gcc/testsuite/
> >>>> PR target/121604
> >>>> * gcc.target/aarch64/sve/pr121604_brk.c: New test.
> >>>> * gcc.target/aarch64/sve2/pr121604_pmov.c: Likewise.
> >>>> ---
> >>>> gcc/config/aarch64/aarch64-sve-builtins.cc | 6 +++--
> >>>> .../gcc.target/aarch64/sve/pr121604_brk.c | 25
> +++++++++++++++++++
> >>>> .../gcc.target/aarch64/sve2/pr121604_pmov.c | 16 ++++++++++++
> >>>> 3 files changed, 45 insertions(+), 2 deletions(-)
> >>>> create mode 100644
> gcc/testsuite/gcc.target/aarch64/sve/pr121604_brk.c
> >>>> create mode 100644
> gcc/testsuite/gcc.target/aarch64/sve2/pr121604_pmov.c
> >>>>
> >>>> diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc
> b/gcc/config/aarch64/aarch64-sve-builtins.cc
> >>>> index 1764cf8f7e8..41b328d96fc 100644
> >>>> --- a/gcc/config/aarch64/aarch64-sve-builtins.cc
> >>>> +++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
> >>>> @@ -3641,9 +3641,11 @@ gimple_folder::fold_pfalse ()
> >>>> inactive vector (arg0), while other function shapes are folded
> >>>> to op1 (arg1). */
> >>>> tree arg1 = gimple_call_arg (call, 1);
> >>>> - if (is_pfalse (arg1))
> >>>> + if (is_pfalse (arg1)
> >>>> + && shape != aarch64_sve::shapes::pmov_to_vector_lane)
> >>>> return fold_call_to (arg0);
> >>>> - if (is_pfalse (arg0))
> >>>> + if (is_pfalse (arg0)
> >>>> + && shape != aarch64_sve::shapes::unary)
> >>>> return fold_call_to (arg1);
> >>>> return nullptr;
> >>>> }
> >>>
> >>> This fix feels very fragile, like we could easily have the same bug
> >>> introduced
> >>> again for other intrinsics that have non-governing predicate operands;
> have you
> >>> verified there are no other such SVE intrinsics? Either way, it's
> >>> possible that more such intrinsics will get introduced in the future,
> >>> and we'll have the exact same bug for those.
> >>>
> >>> Relying on is_pfalse () to infer the position of the GP is just too
> >>> fragile IMO.
> >>>
> >>> As I mentioned in the PR, I think it would be better if we explicitly
> >>> determined:
> >>> (1) the position of the governing predicate, if any, and
> >>> (2) the position of the inactive argument (to fold to on pfalse), if any,
> >>> and used that information to do the folding.
> >>>
> >>> Either we can do that by dispatching on the shapes (i.e. positively
> >>> classify the shapes into groups for the two cases that we know we can
> >>> safely optimize), or perhaps that information can be encoded in some
> >>> other way.
> >> Hi Alex,
> >> I agree that it would be more robust if we positively checked for the
> intrinsics to which the fold can be safely applied.
> >> We could probably do this using the function shape. But I think that the
> >> list
> of shapes where the fold can be applied is much longer than the cases that
> need to be excluded. In fact, svbrka/b and svpmov_lane with _m predication
> seem to be exceptional cases (which does not say that there will not be more
> of those in future).
> >> I looked around for ways to access the position of the governing predicate.
> It seems to me that there is no direct access to the position but rather it is
> inferred using functions like function_shape::has_merge_argument_p or like in
> function_resolver::check_gp_argument. But I think for svpmov_lane_m it
> would still assume that the predicate is a governing predicate, wouldn’t it?
> >> What do you think is the best way forward here?
Yeah svpmov_lane_m is an annoying one as it's a completely synthetic intrinsic.
As you say these two are an exception to the rule rather than the norm. So how
about
this solution.
As you say, the code assumes that governing predicates are at arg0 for unary
functions,
and in other cases at op1.
I think we should make this overridable,
That is in function_base add two virtual functions
virtual int get_gp_arg_index () const
{
return 0;
}
virtual int get_fold_to_arg_index () const
{
return 1;
}
In gimple_folder::fold_pfalse () we can then use these calls,
If int argi0 = base->get_gp_arg_index (); returns -1 for argi0 we can
then stop folding.
We then set in pmov_to_vector_lane the get_gp_arg_index to return -1,
int get_gp_arg_index () const override
{
return -1;
}
and for svbrk_unary_impl we adjust the arguments appropriately.
That should be easily backportable too.
Thanks,
Tamar
> >> Thanks for any advice.
> >> Jennifer
> > Gentle ping.
> > Thanks,
> > Jennifer
> Gentle ping.
> Thanks,
> Jennifer
> >>>
> >>> Thanks,
> >>> Alex
> >>>
> >>>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr121604_brk.c
> b/gcc/testsuite/gcc.target/aarch64/sve/pr121604_brk.c
> >>>> new file mode 100644
> >>>> index 00000000000..a474a20554d
> >>>> --- /dev/null
> >>>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/pr121604_brk.c
> >>>> @@ -0,0 +1,25 @@
> >>>> +/* { dg-do compile } */
> >>>> +/* { dg-options "-O2" } */
> >>>> +/* { dg-final { check-function-bodies "**" "" "" } } */
> >>>> +
> >>>> +#include <arm_sve.h>
> >>>> +
> >>>> +/*
> >>>> +** foo:
> >>>> +** ptrue p0\.b, all
> >>>> +** brkb p0\.b, p0/z, p0\.b
> >>>> +** ret
> >>>> +*/
> >>>> +svbool_t foo () {
> >>>> + return svbrkb_b_m (svpfalse (), svptrue_b8 (), svptrue_b8 ());
> >>>> +}
> >>>> +
Since BRKB doesn't include the first active elements, this should have folded
to pfalse.
I guess if we want that one we need to handle it in svbrk_unary_impl::fold.
Thanks,
Tamar
> >>>> +/*
> >>>> +** bar:
> >>>> +** ptrue p0\.b, all
> >>>> +** brka p0\.b, p0/z, p0\.b
> >>>> +** ret
> >>>> +*/
> >>>> +svbool_t bar () {
> >>>> + return svbrka_b_m (svpfalse (), svptrue_b8 (), svptrue_b8 ());
> >>>> +}
> >>>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pr121604_pmov.c
> b/gcc/testsuite/gcc.target/aarch64/sve2/pr121604_pmov.c
> >>>> new file mode 100644
> >>>> index 00000000000..16844ee4add
> >>>> --- /dev/null
> >>>> +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pr121604_pmov.c
> >>>> @@ -0,0 +1,16 @@
> >>>> +/* { dg-do compile } */
> >>>> +/* { dg-options "-O2 -march=armv8.2-a+sve2p1" } */
> >>>> +/* { dg-final { check-function-bodies "**" "" "" } } */
> >>>> +
> >>>> +#include <arm_sve.h>
> >>>> +
> >>>> +/*
> >>>> +** f:
> >>>> +** pfalse p([0-7]+)\.b
> >>>> +** mov z0\.b, #-1
> >>>> +** pmov z0\[1\], p\1\.d
> >>>> +** ret
> >>>> +*/
> >>>> +svuint64_t f () {
> >>>> + return svpmov_lane_u64_m (svdup_u64 (~0UL), svpfalse (), 1);
> >>>> +}
> >>>> --
> >>>> 2.34.1
>