> On 8 Oct 2025, at 08:49, Tamar Christina <[email protected]> wrote: > > External email: Use caution opening links or attachments > > > Hi Jennifer, > >> -----Original Message----- >> From: Jennifer Schmitz <[email protected]> >> Sent: 24 September 2025 08:31 >> To: Alex Coplan <[email protected]> >> Cc: GCC Patches <[email protected]>; Richard Earnshaw >> <[email protected]>; Kyrylo Tkachov <[email protected]>; >> Tamar Christina <[email protected]>; Andrew Pinski >> <[email protected]> >> Subject: Re: [PATCH 1/2] [PR121604] aarch64: Fix folding of svbrka/b + >> svpmov_lane with pfalse arguments >> >> >> >>> On 15 Sep 2025, at 09:30, Jennifer Schmitz <[email protected]> wrote: >>> >>> External email: Use caution opening links or attachments >>> >>> >>>> On 27 Aug 2025, at 09:14, Jennifer Schmitz <[email protected]> wrote: >>>> >>>> External email: Use caution opening links or attachments >>>> >>>> >>>>> On 26 Aug 2025, at 16:02, Alex Coplan <[email protected]> wrote: >>>>> >>>>> External email: Use caution opening links or attachments >>>>> >>>>> >>>>> On 26/08/2025 13:39, Jennifer Schmitz wrote: >>>>>> The function gimple_folder::fold_pfalse that was introduced in >>>>>> g5289540ed58e42ae66255e31f22afe4ca0a6e15e >>>>>> inappropriately folds >>>>>> 1) svbrka/b with _m if the inactive operand is pfalse >>>>>> 2) svpmov_lane with _m if the non-governing predicate operand is >> pfalse. >>>>>> To fix this, we added extra checks in the folding logic excluding certain >>>>>> function shapes. >>>>>> >>>>>> The patch was bootstrapped and tested on aarch64-linux-gnu, no >> regression. >>>>>> OK for trunk? >>>>>> >>>>>> Signed-off-by: Jennifer Schmitz <[email protected]> >>>>>> >>>>>> gcc/ >>>>>> PR target/121604 >>>>>> * config/aarch64/aarch64-sve-builtins.cc >>>>>> (gimple_folder::fold_pfalse): Add checks for function shape. >>>>>> >>>>>> gcc/testsuite/ >>>>>> PR target/121604 >>>>>> * gcc.target/aarch64/sve/pr121604_brk.c: New test. >>>>>> * gcc.target/aarch64/sve2/pr121604_pmov.c: Likewise. >>>>>> --- >>>>>> gcc/config/aarch64/aarch64-sve-builtins.cc | 6 +++-- >>>>>> .../gcc.target/aarch64/sve/pr121604_brk.c | 25 >> +++++++++++++++++++ >>>>>> .../gcc.target/aarch64/sve2/pr121604_pmov.c | 16 ++++++++++++ >>>>>> 3 files changed, 45 insertions(+), 2 deletions(-) >>>>>> create mode 100644 >> gcc/testsuite/gcc.target/aarch64/sve/pr121604_brk.c >>>>>> create mode 100644 >> gcc/testsuite/gcc.target/aarch64/sve2/pr121604_pmov.c >>>>>> >>>>>> diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc >> b/gcc/config/aarch64/aarch64-sve-builtins.cc >>>>>> index 1764cf8f7e8..41b328d96fc 100644 >>>>>> --- a/gcc/config/aarch64/aarch64-sve-builtins.cc >>>>>> +++ b/gcc/config/aarch64/aarch64-sve-builtins.cc >>>>>> @@ -3641,9 +3641,11 @@ gimple_folder::fold_pfalse () >>>>>> inactive vector (arg0), while other function shapes are folded >>>>>> to op1 (arg1). */ >>>>>> tree arg1 = gimple_call_arg (call, 1); >>>>>> - if (is_pfalse (arg1)) >>>>>> + if (is_pfalse (arg1) >>>>>> + && shape != aarch64_sve::shapes::pmov_to_vector_lane) >>>>>> return fold_call_to (arg0); >>>>>> - if (is_pfalse (arg0)) >>>>>> + if (is_pfalse (arg0) >>>>>> + && shape != aarch64_sve::shapes::unary) >>>>>> return fold_call_to (arg1); >>>>>> return nullptr; >>>>>> } >>>>> >>>>> This fix feels very fragile, like we could easily have the same bug >>>>> introduced >>>>> again for other intrinsics that have non-governing predicate operands; >> have you >>>>> verified there are no other such SVE intrinsics? Either way, it's >>>>> possible that more such intrinsics will get introduced in the future, >>>>> and we'll have the exact same bug for those. >>>>> >>>>> Relying on is_pfalse () to infer the position of the GP is just too >>>>> fragile IMO. >>>>> >>>>> As I mentioned in the PR, I think it would be better if we explicitly >>>>> determined: >>>>> (1) the position of the governing predicate, if any, and >>>>> (2) the position of the inactive argument (to fold to on pfalse), if any, >>>>> and used that information to do the folding. >>>>> >>>>> Either we can do that by dispatching on the shapes (i.e. positively >>>>> classify the shapes into groups for the two cases that we know we can >>>>> safely optimize), or perhaps that information can be encoded in some >>>>> other way. >>>> Hi Alex, >>>> I agree that it would be more robust if we positively checked for the >> intrinsics to which the fold can be safely applied. >>>> We could probably do this using the function shape. But I think that the >>>> list >> of shapes where the fold can be applied is much longer than the cases that >> need to be excluded. In fact, svbrka/b and svpmov_lane with _m predication >> seem to be exceptional cases (which does not say that there will not be more >> of those in future). >>>> I looked around for ways to access the position of the governing predicate. >> It seems to me that there is no direct access to the position but rather it >> is >> inferred using functions like function_shape::has_merge_argument_p or like in >> function_resolver::check_gp_argument. But I think for svpmov_lane_m it >> would still assume that the predicate is a governing predicate, wouldn’t it? >>>> What do you think is the best way forward here? > > Yeah svpmov_lane_m is an annoying one as it's a completely synthetic > intrinsic. > > As you say these two are an exception to the rule rather than the norm. So > how about > this solution. > > As you say, the code assumes that governing predicates are at arg0 for unary > functions, > and in other cases at op1. > > I think we should make this overridable, > > That is in function_base add two virtual functions > > virtual int get_gp_arg_index () const > { > return 0; > } > > virtual int get_fold_to_arg_index () const > { > return 1; > } > > In gimple_folder::fold_pfalse () we can then use these calls, > If int argi0 = base->get_gp_arg_index (); returns -1 for argi0 we can > then stop folding. > > We then set in pmov_to_vector_lane the get_gp_arg_index to return -1, > > int get_gp_arg_index () const override > { > return -1; > } > > and for svbrk_unary_impl we adjust the arguments appropriately. > > That should be easily backportable too. Hi Tamar, Thanks for the review. I adjusted the patch based on your solution, but changed the two new functions to be specific to the case of m predication. The reason is that if get_gp_arg_index returns the “normal” position of the governing predicate (i.e. for x or z predication), we still have to adjust the position in case of m predication for a few unary shapes. And that brings us back to needing case distinctions for specific shapes in gimple_folder::fold_pfalse, which Alex wanted to avoid (if I understood correctly). What do you think? I’m attaching both patches of the series to this thread. Thanks, Jennifer
>
> Thanks,
> Tamar
>
>>>> Thanks for any advice.
>>>> Jennifer
>>> Gentle ping.
>>> Thanks,
>>> Jennifer
>> Gentle ping.
>> Thanks,
>> Jennifer
>>>>>
>>>>> Thanks,
>>>>> Alex
>>>>>
>>>>>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr121604_brk.c
>> b/gcc/testsuite/gcc.target/aarch64/sve/pr121604_brk.c
>>>>>> new file mode 100644
>>>>>> index 00000000000..a474a20554d
>>>>>> --- /dev/null
>>>>>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/pr121604_brk.c
>>>>>> @@ -0,0 +1,25 @@
>>>>>> +/* { dg-do compile } */
>>>>>> +/* { dg-options "-O2" } */
>>>>>> +/* { dg-final { check-function-bodies "**" "" "" } } */
>>>>>> +
>>>>>> +#include <arm_sve.h>
>>>>>> +
>>>>>> +/*
>>>>>> +** foo:
>>>>>> +** ptrue p0\.b, all
>>>>>> +** brkb p0\.b, p0/z, p0\.b
>>>>>> +** ret
>>>>>> +*/
>>>>>> +svbool_t foo () {
>>>>>> + return svbrkb_b_m (svpfalse (), svptrue_b8 (), svptrue_b8 ());
>>>>>> +}
>>>>>> +
>
> Since BRKB doesn't include the first active elements, this should have folded
> to pfalse.
>
> I guess if we want that one we need to handle it in svbrk_unary_impl::fold.
This is handled in PATCH 2/2 of the series.
>
> Thanks,
> Tamar
>
>>>>>> +/*
>>>>>> +** bar:
>>>>>> +** ptrue p0\.b, all
>>>>>> +** brka p0\.b, p0/z, p0\.b
>>>>>> +** ret
>>>>>> +*/
>>>>>> +svbool_t bar () {
>>>>>> + return svbrka_b_m (svpfalse (), svptrue_b8 (), svptrue_b8 ());
>>>>>> +}
>>>>>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pr121604_pmov.c
>> b/gcc/testsuite/gcc.target/aarch64/sve2/pr121604_pmov.c
>>>>>> new file mode 100644
>>>>>> index 00000000000..16844ee4add
>>>>>> --- /dev/null
>>>>>> +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pr121604_pmov.c
>>>>>> @@ -0,0 +1,16 @@
>>>>>> +/* { dg-do compile } */
>>>>>> +/* { dg-options "-O2 -march=armv8.2-a+sve2p1" } */
>>>>>> +/* { dg-final { check-function-bodies "**" "" "" } } */
>>>>>> +
>>>>>> +#include <arm_sve.h>
>>>>>> +
>>>>>> +/*
>>>>>> +** f:
>>>>>> +** pfalse p([0-7]+)\.b
>>>>>> +** mov z0\.b, #-1
>>>>>> +** pmov z0\[1\], p\1\.d
>>>>>> +** ret
>>>>>> +*/
>>>>>> +svuint64_t f () {
>>>>>> + return svpmov_lane_u64_m (svdup_u64 (~0UL), svpfalse (), 1);
>>>>>> +}
>>>>>> --
>>>>>> 2.34.1
V2_0002-PR121604-aarch64-Fold-svbrka-b-with-ptrue-predicate-.patch
Description: V2_0002-PR121604-aarch64-Fold-svbrka-b-with-ptrue-predicate-.patch
V2_0001-PR121604-aarch64-Fix-folding-of-svbrka-b-svpmov_lane.patch
Description: V2_0001-PR121604-aarch64-Fix-folding-of-svbrka-b-svpmov_lane.patch
