Re: [PATCH 1/2] SVE intrinsics: Fold constant operands for svdiv

Jennifer Schmitz Wed, 21 Aug 2024 00:23:56 -0700


> On 19 Aug 2024, at 17:03, Richard Sandiford <richard.sandif...@arm.com> wrote:
> 
> External email: Use caution opening links or attachments
> 
> 
> Kyrylo Tkachov <ktkac...@nvidia.com <mailto:ktkac...@nvidia.com>> writes:
>> Hi Richard,
>> 
>>> On 19 Aug 2024, at 14:57, Richard Sandiford <richard.sandif...@arm.com> 
>>> wrote:
>>> 
>>> External email: Use caution opening links or attachments
>>> 
>>> 
>>> Jennifer Schmitz <jschm...@nvidia.com> writes:
>>>> This patch implements constant folding for svdiv. A new gimple_folder
>>>> method was added that uses const_binop to fold binary operations using a
>>>> given tree_code. For svdiv, this method is used to fold constant
>>>> operands.
>>>> Additionally, if at least one of the operands is a zero vector, svdiv is
>>>> folded to a zero vector (in case of ptrue, _x, or _z).
>>>> Tests were added to check the produced assembly for different
>>>> predicates and signed and unsigned integers.
>>>> Currently, constant folding is only implemented for integers and binary
>>>> operations, but extending it to float types and other operations is
>>>> planned for a future follow-up.
>>>> 
>>>> The patch was bootstrapped and regtested on aarch64-linux-gnu, no 
>>>> regression.
>>>> OK for mainline?
>>>> 
>>>> Signed-off-by: Jennifer Schmitz <jschm...@nvidia.com>
>>>> 
>>>> gcc/
>>>> 
>>>>     * config/aarch64/aarch64-sve-builtins-base.cc
>>>>     (svdiv_impl::fold): Add constant folding.
>>>>     * config/aarch64/aarch64-sve-builtins.cc
>>>>     (gimple_folder::const_fold): New method.
>>>>     * config/aarch64/aarch64-sve-builtins.h
>>>>     (gimple_folder::const_fold): Add function declaration.
>>>> 
>>>> gcc/testsuite/
>>>> 
>>>>     * gcc.target/aarch64/sve/const_fold_div_1.c: New test.
>>>>     * gcc.target/aarch64/sve/const_fold_div_zero.c: Likewise.
>>>> 
>>>> From 79355d876503558f661b46ebbeaa11c74ce176cb Mon Sep 17 00:00:00 2001
>>>> From: Jennifer Schmitz <jschm...@nvidia.com>
>>>> Date: Thu, 15 Aug 2024 05:42:06 -0700
>>>> Subject: [PATCH 1/2] SVE intrinsics: Fold constant operands for svdiv
>>>> 
>>>> This patch implements constant folding for svdiv. A new gimple_folder
>>>> method was added that uses const_binop to fold binary operations using a
>>>> given tree_code. For svdiv, this method is used to fold constant
>>>> operands.
>>>> Additionally, if at least one of the operands is a zero vector, svdiv is
>>>> folded to a zero vector (in case of ptrue, _x, or _z).
>>>> Tests were added to check the produced assembly for different
>>>> predicates and signed and unsigned integers.
>>>> Currently, constant folding is only implemented for integers and binary
>>>> operations, but extending it to float types and other operations is
>>>> planned for a future follow-up.
>>>> 
>>>> The patch was bootstrapped and regtested on aarch64-linux-gnu, no 
>>>> regression.
>>>> OK for mainline?
>>>> 
>>>> Signed-off-by: Jennifer Schmitz <jschm...@nvidia.com>
>>>> 
>>>> gcc/
>>>> 
>>>>     * config/aarch64/aarch64-sve-builtins-base.cc
>>>>     (svdiv_impl::fold): Add constant folding.
>>>>     * config/aarch64/aarch64-sve-builtins.cc
>>>>     (gimple_folder::const_fold): New method.
>>>>     * config/aarch64/aarch64-sve-builtins.h
>>>>     (gimple_folder::const_fold): Add function declaration.
>>>> 
>>>> gcc/testsuite/
>>>> 
>>>>     * gcc.target/aarch64/sve/const_fold_div_1.c: New test.
>>>>     * gcc.target/aarch64/sve/const_fold_div_zero.c: Likewise.
>>>> ---
>>>> .../aarch64/aarch64-sve-builtins-base.cc      |  30 ++-
>>>> gcc/config/aarch64/aarch64-sve-builtins.cc    |  25 +++
>>>> gcc/config/aarch64/aarch64-sve-builtins.h     |   1 +
>>>> .../gcc.target/aarch64/sve/const_fold_div_1.c | 128 ++++++++++++
>>>> .../aarch64/sve/const_fold_div_zero.c         | 186 ++++++++++++++++++
>>>> .../aarch64/sve/const_fold_mul_zero.c         |  95 +++++++++
>>>> 6 files changed, 462 insertions(+), 3 deletions(-)
>>>> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/const_fold_div_1.c
>>>> create mode 100644 
>>>> gcc/testsuite/gcc.target/aarch64/sve/const_fold_div_zero.c
>>>> create mode 100644 
>>>> gcc/testsuite/gcc.target/aarch64/sve/const_fold_mul_zero.c
>>>> 
>>>> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc 
>>>> b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>>>> index d55bee0b72f..7f948ecc0c7 100644
>>>> --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>>>> +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>>>> @@ -755,8 +755,32 @@ public:
>>>>  gimple *
>>>>  fold (gimple_folder &f) const override
>>>>  {
>>>> -    tree divisor = gimple_call_arg (f.call, 2);
>>>> -    tree divisor_cst = uniform_integer_cst_p (divisor);
>>>> +    tree pg = gimple_call_arg (f.call, 0);
>>>> +    tree op1 = gimple_call_arg (f.call, 1);
>>>> +    tree op2 = gimple_call_arg (f.call, 2);
>>>> +
>>>> +    /* For integer division, if the dividend or divisor are all zeros,
>>>> +       fold to zero vector.  */
>>>> +    int step = f.type_suffix (0).element_bytes;
>>>> +    if (f.pred != PRED_m || is_ptrue (pg, step))
>>>> +      {
>>>> +     if (vector_cst_all_same (op1, step)
>>>> +         && integer_zerop (VECTOR_CST_ENCODED_ELT (op1, 0)))
>>>> +       return gimple_build_assign (f.lhs, op1);
>>>> +     if (vector_cst_all_same (op2, step)
>>>> +         && integer_zerop (VECTOR_CST_ENCODED_ELT (op2, 0)))
>>>> +       return gimple_build_assign (f.lhs, op2);
>>>> +      }
>>> 
>>> Rather than handle all-zeros as a special case here, I think we should
>>> try to do it elementwise in the const_binop.  More below.
>> 
>> Does that mean we’d want const_binop (well, the proposed vector_const_binop) 
>> to handle the cases where one of the arguments is a non-const? This case is 
>> here to handle svdiv (x, 0) and svdiv (0, x) which the compiler should be 
>> able to fold to 0, in the svdiv (x, 0) case differently from the generic 
>> GIMPLE div semantics.
> 
> Can we handle the non-const case as a separate patch?  I think it would
> make sense for that case to be integrated more with the current code,
> using the result of the existing uniform_integer_cst_p call for the
> divisor, and to handle /1 as well.
Dear Richard,
thank you for the feedback. I would like to summarize what I understand from 
your suggestions before I start revising to make sure we are on the same page:


1. The new setup for constant folding of SVE intrinsics for binary operations 
where both operands are constant vectors looks like this:

In gcc/fold-const.cc:
NEW: vector_const_binop: Handles vector part of const_binop element-wise
const_binop: For vector arguments, calls vector_const_binop with const_binop as 
callback
poly_int_binop: Is now public and -if necessary- we can implement missing codes 
(e.g. TRUNC_DIV_EXPR)

In aarch64 backend:
NEW: aarch64_vector_const_binop: adapted from int_const_binop, but calls 
poly_int_binop
intrinsic_impl::fold: calls vector_const_binop with aarch64_vector_const_binop 
as callback

2. Folding where only one operand is constant (0/x, x/0, 0*x etc.) can be 
handled individually in intrinsic_impl, but in separate patches. If there is 
already code to check for uniform vectors (e.g. in the svdiv->svasrd case), we 
try to share code.

Does that cover what you proposed? Otherwise, please feel free to correct any 
misunderstandings.
Best, Jennifer
> 
>> Or are you proposing that...
>> 
>> 
>>> 
>>>> +
>>>> +    /* Try to fold constant operands.  */
>>>> +    tree_code m_code = f.type_suffix (0).integer_p ? TRUNC_DIV_EXPR
>>>> +                     : RDIV_EXPR;
>>>> +    if (gimple *new_stmt = f.const_fold (m_code))
>>>> +      return new_stmt;
>>>> +
>>>> +    /* If the divisor is a uniform power of 2, fold to a shift
>>>> +       instruction.  */
>>>> +    tree divisor_cst = uniform_integer_cst_p (op2);
>>>> 
>>>>    if (!divisor_cst || !integer_pow2p (divisor_cst))
>>>>      return NULL;
>>>> @@ -770,7 +794,7 @@ public:
>>>>                                 shapes::binary_uint_opt_n, MODE_n,
>>>>                                 f.type_suffix_ids, GROUP_none, f.pred);
>>>>     call = f.redirect_call (instance);
>>>> -     tree d = INTEGRAL_TYPE_P (TREE_TYPE (divisor)) ? divisor : 
>>>> divisor_cst;
>>>> +     tree d = INTEGRAL_TYPE_P (TREE_TYPE (op2)) ? op2 : divisor_cst;
>>>>     new_divisor = wide_int_to_tree (TREE_TYPE (d), tree_log2 (d));
>>>>      }
>>>>    else
>>>> diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc 
>>>> b/gcc/config/aarch64/aarch64-sve-builtins.cc
>>>> index 0a560eaedca..0f69c586464 100644
>>>> --- a/gcc/config/aarch64/aarch64-sve-builtins.cc
>>>> +++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
>>>> @@ -3691,6 +3691,31 @@ gimple_folder::fold_to_vl_pred (unsigned int vl)
>>>>  return gimple_build_assign (lhs, builder.build ());
>>>> }
>>>> 
>>>> +/* If the predicate is svptrue or PRED_x, try to perform constant folding
>>>> +   on the call using the given tree_code.
>>>> +   Return the new statement on success, otherwise return null.  */
>>>> +gimple *
>>>> +gimple_folder::const_fold (tree_code code)
>>>> +{
>>>> +  tree pg = gimple_call_arg (call, 0);
>>>> +  if (type_suffix (0).integer_p
>>>> +      && (is_ptrue (pg, type_suffix (0).element_bytes)
>>>> +       || pred == PRED_x))
>>>> +    {
>>>> +      if (TREE_CODE_CLASS (code) == tcc_binary)
>>>> +     {
>>>> +       gcc_assert (gimple_call_num_args (call) == 3);
>>>> +       tree op1 = gimple_call_arg (call, 1);
>>>> +       tree op2 = gimple_call_arg (call, 2);
>>>> +       if (TREE_TYPE (op1) != TREE_TYPE (op2))
>>>> +         return NULL;
>>> 
>>> I assume this is rejecting the svdiv_n_* case, is that right?
>>> I think we should instead try to handle that too, since the _n
>>> variants are specifically provided as a convenience for uniform divisors.
>>> 
>>> It looks like const_binop should just work for that case too, thanks to
>>> the shift handling.  (AFAICT, the handling is not explicitly restricted
>>> to shifts.)  But if it doesn't, I think it would be a reasonable extension.
>>> 
>>>> +       if (tree res = const_binop (code, TREE_TYPE (lhs), op1, op2))
>>>> +         return gimple_build_assign (lhs, res);
>>> 
>>> Going back to the comment above about handling /0 elementwise:
>>> how about splitting the vector part of const_binop out into a
>>> new public function with the following interface:
>>> 
>>> tree vector_const_binop (tree_code code, tree arg1, tree arg2,
>>>                        tree (*elt_const_binop) (code, tree, tree))
>>> 
>>> where "the vector part" is everything in the function after:
>>> 
>>> if (TREE_CODE (arg1) == VECTOR_CST
>>>     && TREE_CODE (arg2) == VECTOR_CST
>>> ...
>>> 
>>> Then const_binop itself can just use:
>>> 
>>> return vector_const_binop (code, arg1, arg2, const_binop);
>>> 
>>> whereas aarch64 code can pass its own wrapper that handles the extra
>>> defined cases.  +Richi in case he has any thoughts on this.
>>> 
>>> I think the starting point for the aarch64 implementation should be
>>> something like:
>>> 
>> 
>> … each intrinsic _impl class has its own special-case folding function that 
>> handles these non-const-but-foldable-argument cases and passes this down 
>> here to vector_const_binop?
> 
> For this interface, where we're using the tree code to identify the
> operation, I think it would make sense to have a single callback that
> also keys off the tree code.  const_binop uses the tree code when
> determining how to handle stepped constants, for example. so it is
> already an important part of the operation, even with the callback
> handling the actual arithmetic.
> 
> Thanks,
> Richard
> 
>> Thanks,
>> Kyrill
>> 
>> 
>>> if (poly_int_tree_p (arg1) && poly_int_tree_p (arg2))
>>>   {
>>>     poly_wide_int poly_res;
>>>     tree type = TREE_TYPE (arg1);
>>>     signop sign = TYPE_SIGN (type);
>>>     wi::overflow_type overflow = wi::OVF_NONE;
>>> 
>>>     ...if chain of special cases...
>>>     else if (!poly_int_binop (poly_res, code, arg1, arg2, sign, &overflow))
>>>       return NULL_TREE;
>>>     return force_fit_type (type, poly_res, false,
>>>                            TREE_OVERFLOW (arg1) | TREE_OVERFLOW (arg2));
>>>   }
>>> return NULL_TREE;
>>> 
>>> which is adapted from int_const_binop, and would need poly_int_binop
>>> to become a public function.  The key thing here is that we completely
>>> ignore overflow in the calculation, because the semantics of the intrinsics
>>> are that language-level overflow does not happen.
>>> 
>>> Thanks,
>>> Richard
>>> 
>>> 
>>>> +     }
>>>> +    }
>>>> +  return NULL;
>>>> +}
>>>> +
>>>> /* Try to fold the call.  Return the new statement on success and null
>>>>   on failure.  */
>>>> gimple *
>>>> diff --git a/gcc/config/aarch64/aarch64-sve-builtins.h 
>>>> b/gcc/config/aarch64/aarch64-sve-builtins.h
>>>> index 9ab6f202c30..db30225a008 100644
>>>> --- a/gcc/config/aarch64/aarch64-sve-builtins.h
>>>> +++ b/gcc/config/aarch64/aarch64-sve-builtins.h
>>>> @@ -636,6 +636,7 @@ public:
>>>>  gimple *fold_to_pfalse ();
>>>>  gimple *fold_to_ptrue ();
>>>>  gimple *fold_to_vl_pred (unsigned int);
>>>> +  gimple *const_fold (tree_code);
>>>> 
>>>>  gimple *fold ();
>>>> 
>>>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/const_fold_div_1.c 
>>>> b/gcc/testsuite/gcc.target/aarch64/sve/const_fold_div_1.c
>>>> new file mode 100644
>>>> index 00000000000..d8460a4d336
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/const_fold_div_1.c
>>>> @@ -0,0 +1,128 @@
>>>> +/* { dg-final { check-function-bodies "**" "" } } */
>>>> +/* { dg-options "-O2" } */
>>>> +
>>>> +#include "arm_sve.h"
>>>> +
>>>> +/*
>>>> +** s64_x_pg:
>>>> +**   mov     z[0-9]+\.d, #1
>>>> +**   ret
>>>> +*/
>>>> +svint64_t s64_x_pg (svbool_t pg)
>>>> +{
>>>> +  return svdiv_x (pg, svdup_s64 (5), svdup_s64 (3));
>>>> +}
>>>> +
>>>> +/*
>>>> +** s64_z_pg:
>>>> +**   mov     z[0-9]+\.d, p[0-7]/z, #1
>>>> +**   ret
>>>> +*/
>>>> +svint64_t s64_z_pg (svbool_t pg)
>>>> +{
>>>> +  return svdiv_z (pg, svdup_s64 (5), svdup_s64 (3));
>>>> +}
>>>> +
>>>> +/*
>>>> +** s64_m_pg:
>>>> +**   mov     (z[0-9]+\.d), #3
>>>> +**   mov     (z[0-9]+\.d), #5
>>>> +**   sdiv    \2, p[0-7]/m, \2, \1
>>>> +**   ret
>>>> +*/
>>>> +svint64_t s64_m_pg (svbool_t pg)
>>>> +{
>>>> +  return svdiv_m (pg, svdup_s64 (5), svdup_s64 (3));
>>>> +}
>>>> +
>>>> +/*
>>>> +** s64_x_ptrue:
>>>> +**   mov     z[0-9]+\.d, #1
>>>> +**   ret
>>>> +*/
>>>> +svint64_t s64_x_ptrue ()
>>>> +{
>>>> +  return svdiv_x (svptrue_b64 (), svdup_s64 (5), svdup_s64 (3));
>>>> +}
>>>> +
>>>> +/*
>>>> +** s64_z_ptrue:
>>>> +**   mov     z[0-9]+\.d, #1
>>>> +**   ret
>>>> +*/
>>>> +svint64_t s64_z_ptrue ()
>>>> +{
>>>> +  return svdiv_z (svptrue_b64 (), svdup_s64 (5), svdup_s64 (3));
>>>> +}
>>>> +
>>>> +/*
>>>> +** s64_m_ptrue:
>>>> +**   mov     z[0-9]+\.d, #1
>>>> +**   ret
>>>> +*/
>>>> +svint64_t s64_m_ptrue ()
>>>> +{
>>>> +  return svdiv_m (svptrue_b64 (), svdup_s64 (5), svdup_s64 (3));
>>>> +}
>>>> +
>>>> +/*
>>>> +** u64_x_pg:
>>>> +**   mov     z[0-9]+\.d, #1
>>>> +**   ret
>>>> +*/
>>>> +svuint64_t u64_x_pg (svbool_t pg)
>>>> +{
>>>> +  return svdiv_x (pg, svdup_u64 (5), svdup_u64 (3));
>>>> +}
>>>> +
>>>> +/*
>>>> +** u64_z_pg:
>>>> +**   mov     z[0-9]+\.d, p[0-7]/z, #1
>>>> +**   ret
>>>> +*/
>>>> +svuint64_t u64_z_pg (svbool_t pg)
>>>> +{
>>>> +  return svdiv_z (pg, svdup_u64 (5), svdup_u64 (3));
>>>> +}
>>>> +
>>>> +/*
>>>> +** u64_m_pg:
>>>> +**   mov     (z[0-9]+\.d), #3
>>>> +**   mov     (z[0-9]+\.d), #5
>>>> +**   udiv    \2, p[0-7]/m, \2, \1
>>>> +**   ret
>>>> +*/
>>>> +svuint64_t u64_m_pg (svbool_t pg)
>>>> +{
>>>> +  return svdiv_m (pg, svdup_u64 (5), svdup_u64 (3));
>>>> +}
>>>> +
>>>> +/*
>>>> +** u64_x_ptrue:
>>>> +**   mov     z[0-9]+\.d, #1
>>>> +**   ret
>>>> +*/
>>>> +svuint64_t u64_x_ptrue ()
>>>> +{
>>>> +  return svdiv_x (svptrue_b64 (), svdup_u64 (5), svdup_u64 (3));
>>>> +}
>>>> +
>>>> +/*
>>>> +** u64_z_ptrue:
>>>> +**   mov     z[0-9]+\.d, #1
>>>> +**   ret
>>>> +*/
>>>> +svuint64_t u64_z_ptrue ()
>>>> +{
>>>> +  return svdiv_z (svptrue_b64 (), svdup_u64 (5), svdup_u64 (3));
>>>> +}
>>>> +
>>>> +/*
>>>> +** u64_m_ptrue:
>>>> +**   mov     z[0-9]+\.d, #1
>>>> +**   ret
>>>> +*/
>>>> +svuint64_t u64_m_ptrue ()
>>>> +{
>>>> +  return svdiv_m (svptrue_b64 (), svdup_u64 (5), svdup_u64 (3));
>>>> +}
>>>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/const_fold_div_zero.c 
>>>> b/gcc/testsuite/gcc.target/aarch64/sve/const_fold_div_zero.c
>>>> new file mode 100644
>>>> index 00000000000..00d14a46ced
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/const_fold_div_zero.c
>>>> @@ -0,0 +1,186 @@
>>>> +/* { dg-final { check-function-bodies "**" "" } } */
>>>> +/* { dg-options "-O2" } */
>>>> +
>>>> +#include "arm_sve.h"
>>>> +
>>>> +/*
>>>> +** s64_x_pg_op1:
>>>> +**   mov     z[0-9]+\.b, #0
>>>> +**   ret
>>>> +*/
>>>> +svint64_t s64_x_pg_op1 (svbool_t pg, svint64_t op2)
>>>> +{
>>>> +  return svdiv_x (pg, svdup_s64 (0), op2);
>>>> +}
>>>> +
>>>> +/*
>>>> +** s64_z_pg_op1:
>>>> +**   mov     z[0-9]+\.b, #0
>>>> +**      ret
>>>> +*/
>>>> +svint64_t s64_z_pg_op1 (svbool_t pg, svint64_t op2)
>>>> +{
>>>> +  return svdiv_z (pg, svdup_s64 (0), op2);
>>>> +}
>>>> +
>>>> +/*
>>>> +** s64_m_pg_op1:
>>>> +**   mov     z[0-9]+\.d, p[0-7]/z, #0
>>>> +**      ret
>>>> +*/
>>>> +svint64_t s64_m_pg_op1 (svbool_t pg, svint64_t op2)
>>>> +{
>>>> +  return svdiv_m (pg, svdup_s64 (0), op2);
>>>> +}
>>>> +
>>>> +/*
>>>> +** s64_x_pg_op2:
>>>> +**   mov     z[0-9]+\.b, #0
>>>> +**   ret
>>>> +*/
>>>> +svint64_t s64_x_pg_op2 (svbool_t pg, svint64_t op1)
>>>> +{
>>>> +  return svdiv_x (pg, op1, svdup_s64 (0));
>>>> +}
>>>> +
>>>> +/*
>>>> +** s64_z_pg_op2:
>>>> +**   mov     z[0-9]+\.b, #0
>>>> +**      ret
>>>> +*/
>>>> +svint64_t s64_z_pg_op2 (svbool_t pg, svint64_t op1)
>>>> +{
>>>> +  return svdiv_z (pg, op1, svdup_s64 (0));
>>>> +}
>>>> +
>>>> +/*
>>>> +** s64_m_pg_op2:
>>>> +**   mov     (z[0-9]+)\.b, #0
>>>> +**   sdiv    (z[0-9]+\.d), p[0-7]/m, \2, \1\.d
>>>> +**      ret
>>>> +*/
>>>> +svint64_t s64_m_pg_op2 (svbool_t pg, svint64_t op1)
>>>> +{
>>>> +  return svdiv_m (pg, op1, svdup_s64 (0));
>>>> +}
>>>> +
>>>> +/*
>>>> +** s64_m_ptrue_op1:
>>>> +**   mov     z[0-9]+\.b, #0
>>>> +**      ret
>>>> +*/
>>>> +svint64_t s64_m_ptrue_op1 (svint64_t op2)
>>>> +{
>>>> +  return svdiv_m (svptrue_b64 (), svdup_s64 (0), op2);
>>>> +}
>>>> +
>>>> +/*
>>>> +** s64_m_ptrue_op2:
>>>> +**   mov     z[0-9]+\.b, #0
>>>> +**      ret
>>>> +*/
>>>> +svint64_t s64_m_ptrue_op2 (svint64_t op1)
>>>> +{
>>>> +  return svdiv_m (svptrue_b64 (), op1, svdup_s64 (0));
>>>> +}
>>>> +
>>>> +/*
>>>> +** s64_m_ptrue_op1_op2:
>>>> +**   mov     z[0-9]+\.b, #0
>>>> +**      ret
>>>> +*/
>>>> +svint64_t s64_m_ptrue_op1_op2 ()
>>>> +{
>>>> +  return svdiv_m (svptrue_b64 (), svdup_s64 (0), svdup_s64 (0));
>>>> +}
>>>> +
>>>> +/*
>>>> +** u64_x_pg_op1:
>>>> +**   mov     z[0-9]+\.b, #0
>>>> +**   ret
>>>> +*/
>>>> +svuint64_t u64_x_pg_op1 (svbool_t pg, svuint64_t op2)
>>>> +{
>>>> +  return svdiv_x (pg, svdup_u64 (0), op2);
>>>> +}
>>>> +
>>>> +/*
>>>> +** u64_z_pg_op1:
>>>> +**   mov     z[0-9]+\.b, #0
>>>> +**      ret
>>>> +*/
>>>> +svuint64_t u64_z_pg_op1 (svbool_t pg, svuint64_t op2)
>>>> +{
>>>> +  return svdiv_z (pg, svdup_u64 (0), op2);
>>>> +}
>>>> +
>>>> +/*
>>>> +** u64_m_pg_op1:
>>>> +**   mov     z[0-9]+\.d, p[0-7]/z, #0
>>>> +**      ret
>>>> +*/
>>>> +svuint64_t u64_m_pg_op1 (svbool_t pg, svuint64_t op2)
>>>> +{
>>>> +  return svdiv_m (pg, svdup_u64 (0), op2);
>>>> +}
>>>> +
>>>> +/*
>>>> +** u64_x_pg_op2:
>>>> +**   mov     z[0-9]+\.b, #0
>>>> +**   ret
>>>> +*/
>>>> +svuint64_t u64_x_pg_op2 (svbool_t pg, svuint64_t op1)
>>>> +{
>>>> +  return svdiv_x (pg, op1, svdup_u64 (0));
>>>> +}
>>>> +
>>>> +/*
>>>> +** u64_z_pg_op2:
>>>> +**   mov     z[0-9]+\.b, #0
>>>> +**      ret
>>>> +*/
>>>> +svuint64_t u64_z_pg_op2 (svbool_t pg, svuint64_t op1)
>>>> +{
>>>> +  return svdiv_z (pg, op1, svdup_u64 (0));
>>>> +}
>>>> +
>>>> +/*
>>>> +** u64_m_pg_op2:
>>>> +**   mov     (z[0-9]+)\.b, #0
>>>> +**   udiv    (z[0-9]+\.d), p[0-7]/m, \2, \1\.d
>>>> +**      ret
>>>> +*/
>>>> +svuint64_t u64_m_pg_op2 (svbool_t pg, svuint64_t op1)
>>>> +{
>>>> +  return svdiv_m (pg, op1, svdup_u64 (0));
>>>> +}
>>>> +
>>>> +/*
>>>> +** u64_m_ptrue_op1:
>>>> +**   mov     z[0-9]+\.b, #0
>>>> +**      ret
>>>> +*/
>>>> +svuint64_t u64_m_ptrue_op1 (svuint64_t op2)
>>>> +{
>>>> +  return svdiv_m (svptrue_b64 (), svdup_u64 (0), op2);
>>>> +}
>>>> +
>>>> +/*
>>>> +** u64_m_ptrue_op2:
>>>> +**   mov     z[0-9]+\.b, #0
>>>> +**      ret
>>>> +*/
>>>> +svuint64_t u64_m_ptrue_op2 (svuint64_t op1)
>>>> +{
>>>> +  return svdiv_m (svptrue_b64 (), op1, svdup_u64 (0));
>>>> +}
>>>> +
>>>> +/*
>>>> +** u64_m_ptrue_op1_op2:
>>>> +**   mov     z[0-9]+\.b, #0
>>>> +**      ret
>>>> +*/
>>>> +svuint64_t u64_m_ptrue_op1_op2 ()
>>>> +{
>>>> +  return svdiv_m (svptrue_b64 (), svdup_u64 (0), svdup_u64 (0));
>>>> +}
>>>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/const_fold_mul_zero.c 
>>>> b/gcc/testsuite/gcc.target/aarch64/sve/const_fold_mul_zero.c
>>>> new file mode 100644
>>>> index 00000000000..793291449c1
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/const_fold_mul_zero.c
>>>> @@ -0,0 +1,95 @@
>>>> +/* { dg-final { check-function-bodies "**" "" } } */
>>>> +/* { dg-options "-O2" } */
>>>> +
>>>> +#include "arm_sve.h"
>>>> +
>>>> +/*
>>>> +** s64_x_pg_op1:
>>>> +**   mov     z[0-9]+\.b, #0
>>>> +**   ret
>>>> +*/
>>>> +svint64_t s64_x_pg_op1 (svbool_t pg, svint64_t op2)
>>>> +{
>>>> +  return svmul_x (pg, svdup_s64 (0), op2);
>>>> +}
>>>> +
>>>> +/*
>>>> +** s64_z_pg_op1:
>>>> +**   mov     z[0-9]+\.b, #0
>>>> +**      ret
>>>> +*/
>>>> +svint64_t s64_z_pg_op1 (svbool_t pg, svint64_t op2)
>>>> +{
>>>> +  return svdiv_z (pg, svdup_s64 (0), op2);
>>>> +}
>>>> +
>>>> +/*
>>>> +** s64_m_pg_op1:
>>>> +**   mov     z[0-9]+\.d, p[0-7]/z, #0
>>>> +**      ret
>>>> +*/
>>>> +svint64_t s64_m_pg_op1 (svbool_t pg, svint64_t op2)
>>>> +{
>>>> +  return svdiv_m (pg, svdup_s64 (0), op2);
>>>> +}
>>>> +
>>>> +/*
>>>> +** s64_x_pg_op2:
>>>> +**   mov     z[0-9]+\.b, #0
>>>> +**   ret
>>>> +*/
>>>> +svint64_t s64_x_pg_op2 (svbool_t pg, svint64_t op1)
>>>> +{
>>>> +  return svdiv_x (pg, op1, svdup_s64 (0));
>>>> +}
>>>> +
>>>> +/*
>>>> +** s64_z_pg_op2:
>>>> +**   mov     z[0-9]+\.b, #0
>>>> +**      ret
>>>> +*/
>>>> +svint64_t s64_z_pg_op2 (svbool_t pg, svint64_t op1)
>>>> +{
>>>> +  return svdiv_z (pg, op1, svdup_s64 (0));
>>>> +}
>>>> +
>>>> +/*
>>>> +** s64_m_pg_op2:
>>>> +**   mov     (z[0-9]+)\.b, #0
>>>> +**   mul     (z[0-9]+\.d), p[0-7]+/m, \2, \1\.d
>>>> +**      ret
>>>> +*/
>>>> +svint64_t s64_m_pg_op2 (svbool_t pg, svint64_t op1)
>>>> +{
>>>> +  return svdiv_m (pg, op1, svdup_s64 (0));
>>>> +}
>>>> +
>>>> +/*
>>>> +** s64_m_ptrue_op1:
>>>> +**   mov     z[0-9]+\.b, #0
>>>> +**      ret
>>>> +*/
>>>> +svint64_t s64_m_ptrue_op1 (svint64_t op2)
>>>> +{
>>>> +  return svdiv_m (svptrue_b64 (), svdup_s64 (0), op2);
>>>> +}
>>>> +
>>>> +/*
>>>> +** s64_m_ptrue_op2:
>>>> +**   mov     z[0-9]+\.b, #0
>>>> +**      ret
>>>> +*/
>>>> +svint64_t s64_m_ptrue_op2 (svint64_t op1)
>>>> +{
>>>> +  return svdiv_m (svptrue_b64 (), op1, svdup_s64 (0));
>>>> +}
>>>> +
>>>> +/*
>>>> +** s64_m_ptrue_op1_op2:
>>>> +**   mov     z[0-9]+\.b, #0
>>>> +**      ret
>>>> +*/
>>>> +svint64_t s64_m_ptrue_op1_op2 ()
>>>> +{
>>>> +  return svdiv_m (svptrue_b64 (), svdup_s64 (0), svdup_s64 (0));
>>>> +}

smime.p7s
Description: S/MIME cryptographic signature

Re: [PATCH 1/2] SVE intrinsics: Fold constant operands for svdiv

Reply via email to