> On 9 Oct 2024, at 17:35, Richard Sandiford <richard.sandif...@arm.com> wrote:
>
> External email: Use caution opening links or attachments
>
>
> Tamar Christina <tamar.christ...@arm.com> writes:
>> Hi Richard,
>>
>>> -----Original Message-----
>>> From: Richard Sandiford <richard.sandif...@arm.com>
>>> Sent: Wednesday, October 9, 2024 12:58 PM
>>> To: gcc-patches@gcc.gnu.org
>>> Cc: ktkac...@nvidia.com; Richard Earnshaw <richard.earns...@arm.com>;
>>> Tamar Christina <tamar.christ...@arm.com>
>>> Subject: [PATCH] aarch64: Fix folding of degenerate svwhilele case
>>> [PR117045]
>>>
>>> The svwhilele folder mishandled the degenerate case in which
>>> the second argument is the maximum integer. In that case,
>>> the result is all-true regardless of the first parameter:
>>>
>>> If the second scalar operand is equal to the maximum signed integer
>>> value then a condition which includes an equality test can never fail
>>> and the result will be an all-true predicate.
>>>
>>> This is because the conceptual "increment the first operand
>>> by 1 after each element" is done modulo the range of the operand.
>>> The GCC code was instead treating it as infinite precision.
>>> whilele_5.c even had a test for the incorrect behaviour.
>>>
>>> The easiest fix seemed to be to handle that case specially before
>>> doing constant folding. This also copes with variable first operands.
>>>
>>> Tested on aarch64-linux-gnu. I'll push on Friday if there are no
>>> comments before then. Since it's a wrong-code bug, I'd also like
>>> to backport to release branches.
>>>
>>> Thanks,
>>> Richard
>>>
>>>
>>> gcc/
>>> PR target/116999
>>> PR target/117045
>>> * config/aarch64/aarch64-sve-builtins-base.cc
>>> (svwhilelx_impl::fold): Check for WHILELTs of the minimum value
>>> and WHILELEs of the maximum value. Fold them to all-false and
>>> all-true respectively.
>>>
>>> gcc/testsuite/
>>> PR target/116999
>>> PR target/117045
>>> * gcc.target/aarch64/sve/acle/general/whilele_5.c: Fix bogus
>>> expected result.
>>> * gcc.target/aarch64/sve/acle/general/whilele_11.c: New test.
>>> * gcc.target/aarch64/sve/acle/general/whilele_12.c: Likewise.
>>> ---
>>> .../aarch64/aarch64-sve-builtins-base.cc | 11 +++++-
>>> .../aarch64/sve/acle/general/whilele_11.c | 31 +++++++++++++++++
>>> .../aarch64/sve/acle/general/whilele_12.c | 34 +++++++++++++++++++
>>> .../aarch64/sve/acle/general/whilele_5.c | 2 +-
>>> 4 files changed, 76 insertions(+), 2 deletions(-)
>>> create mode 100644
>>> gcc/testsuite/gcc.target/aarch64/sve/acle/general/whilele_11.c
>>> create mode 100644
>>> gcc/testsuite/gcc.target/aarch64/sve/acle/general/whilele_12.c
>>>
>>> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>>> b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>>> index 4b33585d981..3d0975e4294 100644
>>> --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>>> +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>>> @@ -2945,7 +2945,9 @@ public:
>>> : while_comparison (unspec_for_sint, unspec_for_uint), m_eq_p (eq_p)
>>> {}
>>>
>>> - /* Try to fold a call by treating its arguments as constants of type T.
>>> */
>>> + /* Try to fold a call by treating its arguments as constants of type T.
>>> + We have already filtered out the degenerate cases of X .LT. MIN
>>> + and X .LE. MAX. */
>>> template<typename T>
>>> gimple *
>>> fold_type (gimple_folder &f) const
>>> @@ -3001,6 +3003,13 @@ public:
>>> if (f.vectors_per_tuple () > 1)
>>> return nullptr;
>>>
>>> + /* Filter out cases where the condition is always true or always
>>> false. */
>>> + tree arg1 = gimple_call_arg (f.call, 1);
>>> + if (!m_eq_p && operand_equal_p (arg1, TYPE_MIN_VALUE (TREE_TYPE
>>> (arg1))))
>>> + return f.fold_to_pfalse ();
>>
>> Just a quick question for my own understanding, I assume the reason MIN
>> is handled here is because fold_type will decrement the value at some point?
>>
>> Otherwise wouldn't MIN + 1 still fit inside the type's precision?
>>
>> FWIW patch looks good to me, just wondering why the MIN case is needed :)
>
> I admit it probably isn't needed to fix the bug. I just though it would
> look strange if we handled the arg1 extremity for m_eq_p without also
> handling it for !m_eq_p.
The patch LGTM.
Thanks,
Kyrill
>
> Thanks,
> Richard
>
>>
>> Cheers,
>> Tamar
>>
>>> + if (m_eq_p && operand_equal_p (arg1, TYPE_MAX_VALUE (TREE_TYPE
>>> (arg1))))
>>> + return f.fold_to_ptrue ();
>>> +
>>> if (f.type_suffix (1).unsigned_p)
>>> return fold_type<poly_uint64> (f);
>>> else
>>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/whilele_11.c
>>> b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/whilele_11.c
>>> new file mode 100644
>>> index 00000000000..2be9dc5c534
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/whilele_11.c
>>> @@ -0,0 +1,31 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-O2" } */
>>> +
>>> +#include <arm_sve.h>
>>> +#include <limits.h>
>>> +
>>> +svbool_t
>>> +f1 (volatile int32_t *ptr)
>>> +{
>>> + return svwhilelt_b8_s32 (*ptr, INT32_MIN);
>>> +}
>>> +
>>> +svbool_t
>>> +f2 (volatile uint32_t *ptr)
>>> +{
>>> + return svwhilelt_b16_u32 (*ptr, 0);
>>> +}
>>> +
>>> +svbool_t
>>> +f3 (volatile int64_t *ptr)
>>> +{
>>> + return svwhilelt_b32_s64 (*ptr, INT64_MIN);
>>> +}
>>> +
>>> +svbool_t
>>> +f4 (volatile uint64_t *ptr)
>>> +{
>>> + return svwhilelt_b64_u64 (*ptr, 0);
>>> +}
>>> +
>>> +/* { dg-final { scan-assembler-times {\tpfalse\tp[0-9]+\.b\n} 4 } } */
>>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/whilele_12.c
>>> b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/whilele_12.c
>>> new file mode 100644
>>> index 00000000000..713065c3145
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/whilele_12.c
>>> @@ -0,0 +1,34 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-O2" } */
>>> +
>>> +#include <arm_sve.h>
>>> +#include <limits.h>
>>> +
>>> +svbool_t
>>> +f1 (volatile int32_t *ptr)
>>> +{
>>> + return svwhilele_b8_s32 (*ptr, INT32_MAX);
>>> +}
>>> +
>>> +svbool_t
>>> +f2 (volatile uint32_t *ptr)
>>> +{
>>> + return svwhilele_b16_u32 (*ptr, UINT32_MAX);
>>> +}
>>> +
>>> +svbool_t
>>> +f3 (volatile int64_t *ptr)
>>> +{
>>> + return svwhilele_b32_s64 (*ptr, INT64_MAX);
>>> +}
>>> +
>>> +svbool_t
>>> +f4 (volatile uint64_t *ptr)
>>> +{
>>> + return svwhilele_b64_u64 (*ptr, UINT64_MAX);
>>> +}
>>> +
>>> +/* { dg-final { scan-assembler {\tptrue\tp[0-9]+\.b(?:, all)\n} } } */
>>> +/* { dg-final { scan-assembler {\tptrue\tp[0-9]+\.h(?:, all)\n} } } */
>>> +/* { dg-final { scan-assembler {\tptrue\tp[0-9]+\.s(?:, all)\n} } } */
>>> +/* { dg-final { scan-assembler {\tptrue\tp[0-9]+\.d(?:, all)\n} } } */
>>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/whilele_5.c
>>> b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/whilele_5.c
>>> index 7c73aa5926b..79f0e64b2ae 100644
>>> --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/whilele_5.c
>>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/whilele_5.c
>>> @@ -28,7 +28,7 @@ test3 (svbool_t *ptr)
>>> *ptr = svwhilele_b16_s32 (0x7ffffffb, 0x7fffffff);
>>> }
>>>
>>> -/* { dg-final { scan-assembler {\tptrue\tp[0-9]+\.h, vl5\n} } } */
>>> +/* { dg-final { scan-assembler {\tptrue\tp[0-9]+\.h(?:, all)\n} } } */
>>>
>>> void
>>> test4 (svbool_t *ptr)
>>> --
>>> 2.25.1