> On 15 Sep 2025, at 09:30, Jennifer Schmitz <[email protected]> wrote:
>
> External email: Use caution opening links or attachments
>
>
>> On 27 Aug 2025, at 09:14, Jennifer Schmitz <[email protected]> wrote:
>>
>> External email: Use caution opening links or attachments
>>
>>
>>> On 26 Aug 2025, at 16:02, Alex Coplan <[email protected]> wrote:
>>>
>>> External email: Use caution opening links or attachments
>>>
>>>
>>> On 26/08/2025 13:39, Jennifer Schmitz wrote:
>>>> The function gimple_folder::fold_pfalse that was introduced in
>>>> g5289540ed58e42ae66255e31f22afe4ca0a6e15e
>>>> inappropriately folds
>>>> 1) svbrka/b with _m if the inactive operand is pfalse
>>>> 2) svpmov_lane with _m if the non-governing predicate operand is pfalse.
>>>> To fix this, we added extra checks in the folding logic excluding certain
>>>> function shapes.
>>>>
>>>> The patch was bootstrapped and tested on aarch64-linux-gnu, no regression.
>>>> OK for trunk?
>>>>
>>>> Signed-off-by: Jennifer Schmitz <[email protected]>
>>>>
>>>> gcc/
>>>> PR target/121604
>>>> * config/aarch64/aarch64-sve-builtins.cc
>>>> (gimple_folder::fold_pfalse): Add checks for function shape.
>>>>
>>>> gcc/testsuite/
>>>> PR target/121604
>>>> * gcc.target/aarch64/sve/pr121604_brk.c: New test.
>>>> * gcc.target/aarch64/sve2/pr121604_pmov.c: Likewise.
>>>> ---
>>>> gcc/config/aarch64/aarch64-sve-builtins.cc | 6 +++--
>>>> .../gcc.target/aarch64/sve/pr121604_brk.c | 25 +++++++++++++++++++
>>>> .../gcc.target/aarch64/sve2/pr121604_pmov.c | 16 ++++++++++++
>>>> 3 files changed, 45 insertions(+), 2 deletions(-)
>>>> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pr121604_brk.c
>>>> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/pr121604_pmov.c
>>>>
>>>> diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc
>>>> b/gcc/config/aarch64/aarch64-sve-builtins.cc
>>>> index 1764cf8f7e8..41b328d96fc 100644
>>>> --- a/gcc/config/aarch64/aarch64-sve-builtins.cc
>>>> +++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
>>>> @@ -3641,9 +3641,11 @@ gimple_folder::fold_pfalse ()
>>>> inactive vector (arg0), while other function shapes are folded
>>>> to op1 (arg1). */
>>>> tree arg1 = gimple_call_arg (call, 1);
>>>> - if (is_pfalse (arg1))
>>>> + if (is_pfalse (arg1)
>>>> + && shape != aarch64_sve::shapes::pmov_to_vector_lane)
>>>> return fold_call_to (arg0);
>>>> - if (is_pfalse (arg0))
>>>> + if (is_pfalse (arg0)
>>>> + && shape != aarch64_sve::shapes::unary)
>>>> return fold_call_to (arg1);
>>>> return nullptr;
>>>> }
>>>
>>> This fix feels very fragile, like we could easily have the same bug
>>> introduced
>>> again for other intrinsics that have non-governing predicate operands; have
>>> you
>>> verified there are no other such SVE intrinsics? Either way, it's
>>> possible that more such intrinsics will get introduced in the future,
>>> and we'll have the exact same bug for those.
>>>
>>> Relying on is_pfalse () to infer the position of the GP is just too
>>> fragile IMO.
>>>
>>> As I mentioned in the PR, I think it would be better if we explicitly
>>> determined:
>>> (1) the position of the governing predicate, if any, and
>>> (2) the position of the inactive argument (to fold to on pfalse), if any,
>>> and used that information to do the folding.
>>>
>>> Either we can do that by dispatching on the shapes (i.e. positively
>>> classify the shapes into groups for the two cases that we know we can
>>> safely optimize), or perhaps that information can be encoded in some
>>> other way.
>> Hi Alex,
>> I agree that it would be more robust if we positively checked for the
>> intrinsics to which the fold can be safely applied.
>> We could probably do this using the function shape. But I think that the
>> list of shapes where the fold can be applied is much longer than the cases
>> that need to be excluded. In fact, svbrka/b and svpmov_lane with _m
>> predication seem to be exceptional cases (which does not say that there will
>> not be more of those in future).
>> I looked around for ways to access the position of the governing predicate.
>> It seems to me that there is no direct access to the position but rather it
>> is inferred using functions like function_shape::has_merge_argument_p or
>> like in function_resolver::check_gp_argument. But I think for svpmov_lane_m
>> it would still assume that the predicate is a governing predicate, wouldn’t
>> it?
>> What do you think is the best way forward here?
>> Thanks for any advice.
>> Jennifer
> Gentle ping.
> Thanks,
> Jennifer
Gentle ping.
Thanks,
Jennifer
>>>
>>> Thanks,
>>> Alex
>>>
>>>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr121604_brk.c
>>>> b/gcc/testsuite/gcc.target/aarch64/sve/pr121604_brk.c
>>>> new file mode 100644
>>>> index 00000000000..a474a20554d
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/pr121604_brk.c
>>>> @@ -0,0 +1,25 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-O2" } */
>>>> +/* { dg-final { check-function-bodies "**" "" "" } } */
>>>> +
>>>> +#include <arm_sve.h>
>>>> +
>>>> +/*
>>>> +** foo:
>>>> +** ptrue p0\.b, all
>>>> +** brkb p0\.b, p0/z, p0\.b
>>>> +** ret
>>>> +*/
>>>> +svbool_t foo () {
>>>> + return svbrkb_b_m (svpfalse (), svptrue_b8 (), svptrue_b8 ());
>>>> +}
>>>> +
>>>> +/*
>>>> +** bar:
>>>> +** ptrue p0\.b, all
>>>> +** brka p0\.b, p0/z, p0\.b
>>>> +** ret
>>>> +*/
>>>> +svbool_t bar () {
>>>> + return svbrka_b_m (svpfalse (), svptrue_b8 (), svptrue_b8 ());
>>>> +}
>>>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pr121604_pmov.c
>>>> b/gcc/testsuite/gcc.target/aarch64/sve2/pr121604_pmov.c
>>>> new file mode 100644
>>>> index 00000000000..16844ee4add
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pr121604_pmov.c
>>>> @@ -0,0 +1,16 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-O2 -march=armv8.2-a+sve2p1" } */
>>>> +/* { dg-final { check-function-bodies "**" "" "" } } */
>>>> +
>>>> +#include <arm_sve.h>
>>>> +
>>>> +/*
>>>> +** f:
>>>> +** pfalse p([0-7]+)\.b
>>>> +** mov z0\.b, #-1
>>>> +** pmov z0\[1\], p\1\.d
>>>> +** ret
>>>> +*/
>>>> +svuint64_t f () {
>>>> + return svpmov_lane_u64_m (svdup_u64 (~0UL), svpfalse (), 1);
>>>> +}
>>>> --
>>>> 2.34.1