Jennifer Schmitz <jschm...@nvidia.com> writes:
>> On 13 Nov 2024, at 12:54, Richard Sandiford <richard.sandif...@arm.com> 
>> wrote:
>> 
>> Jennifer Schmitz <jschm...@nvidia.com> writes:
>>> As follow-up to
>>> https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665472.html,
>>> this patch implements folding of svmul and svdiv by -1 to svneg for
>>> unsigned SVE vector types. The key idea is to reuse the existing code that
>>> does this fold for signed types and feed it as a callback to a helper function
>>> that adds the necessary type conversions.
>> 
>> I only meant that we should do this for multiplication (since the sign
>> isn't relevant for N-bit x N-bit -> N-bit multiplication).  It wouldn't
>> be right for unsigned division, since unsigned division by the maximum
>> value is instead equivalent to x == MAX ? 1 : 0.
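>> 
>> As a scalar analogy (plain C, just to illustrate the two cases; not
>> part of the patch):
>> 
>>   unsigned int mul_m1 (unsigned int x)
>>   {
>>     return x * (unsigned int) -1;  /* same bits as -x: sign irrelevant */
>>   }
>> 
>>   unsigned int div_m1 (unsigned int x)
>>   {
>>     return x / (unsigned int) -1;  /* 1 if x == UINT_MAX, else 0 */
>>   }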
>> 
>> Some comments on the multiplication bit below:
>> 
>>> 
>>> For example, for the test case
>>> svuint64_t foo (svuint64_t x, svbool_t pg)
>>> {
>>>  return svmul_n_u64_x (pg, x, -1);
>>> }
>>> 
>>> the following gimple sequence is emitted (-O2 -mcpu=grace):
>>> svuint64_t foo (svuint64_t x, svbool_t pg)
>>> {
>>>  svuint64_t D.12921;
>>>  svint64_t D.12920;
>>>  svuint64_t D.12919;
>>> 
>>>  D.12920 = VIEW_CONVERT_EXPR<svint64_t>(x);
>>>  D.12921 = svneg_s64_x (pg, D.12920);
>>>  D.12919 = VIEW_CONVERT_EXPR<svuint64_t>(D.12921);
>>>  goto <D.12922>;
>>>  <D.12922>:
>>>  return D.12919;
>>> }
>>> 
>>> In general, the new helper gimple_folder::convert_and_fold
>>> - takes a target type and a function pointer,
>>> - converts all non-boolean vector types to the target type,
>>> - replaces the converted arguments in the function call,
>>> - calls the callback function,
>>> - adds the necessary view converts to the gimple sequence,
>>> - and returns the new call.
>>> 
>>> Because all arguments are converted to the same target type, the helper
>>> function is only suitable for folding calls whose arguments all have the
>>> same type. If necessary, this could be extended to convert individual
>>> arguments to different types.
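>>> 
>>> In gimple terms, each conversion is a VIEW_CONVERT_EXPR assignment,
>>> emitted roughly like this (a sketch of the mechanism rather than the
>>> patch's exact code, where F is the gimple_folder, TYPE the target type,
>>> and OP one operand):
>>> 
>>>   tree tmp = create_tmp_var (type);
>>>   gimple *conv
>>>     = gimple_build_assign (tmp, build1 (VIEW_CONVERT_EXPR, type, op));
>>>   gsi_insert_before (f.gsi, conv, GSI_SAME_STMT);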
>>> 
>>> The patch was bootstrapped and tested on aarch64-linux-gnu, no regression.
>>> OK for mainline?
>>> 
>>> Signed-off-by: Jennifer Schmitz <jschm...@nvidia.com>
>>> 
>>> gcc/ChangeLog:
>>> 
>>>      * config/aarch64/aarch64-sve-builtins-base.cc
>>>      (svmul_impl::fold): Wrap code for folding to svneg in lambda
>>>      function and pass to gimple_folder::convert_and_fold to enable
>>>      the transform for unsigned types.
>>>      (svdiv_impl::fold): Likewise.
>>>      * config/aarch64/aarch64-sve-builtins.cc
>>>      (gimple_folder::convert_and_fold): New function that converts
>>>      operands to target type before calling callback function, adding the
>>>      necessary conversion statements.
>>>      * config/aarch64/aarch64-sve-builtins.h
>>>      (gimple_folder::convert_and_fold): Declare function.
>>>      (signed_type_suffix_index): Return type_suffix_index of signed
>>>      vector type for given width.
>>>      (function_instance::signed_type): Return signed vector type for
>>>      given width.
>>> 
>>> gcc/testsuite/ChangeLog:
>>> 
>>>      * gcc.target/aarch64/sve/acle/asm/div_u32.c: Adjust expected
>>>      outcome.
>>>      * gcc.target/aarch64/sve/acle/asm/div_u64.c: Likewise.
>>>      * gcc.target/aarch64/sve/acle/asm/mul_u8.c: Likewise.
>>>      * gcc.target/aarch64/sve/acle/asm/mul_u16.c: Likewise.
>>>      * gcc.target/aarch64/sve/acle/asm/mul_u32.c: Likewise.
>>>      * gcc.target/aarch64/sve/acle/asm/mul_u64.c: Add new test and
>>>      adjust expected outcome.
>>> ---
>>> .../aarch64/aarch64-sve-builtins-base.cc      | 99 ++++++++++++-------
>>> gcc/config/aarch64/aarch64-sve-builtins.cc    | 40 ++++++++
>>> gcc/config/aarch64/aarch64-sve-builtins.h     | 30 ++++++
>>> .../gcc.target/aarch64/sve/acle/asm/div_u32.c |  9 ++
>>> .../gcc.target/aarch64/sve/acle/asm/div_u64.c |  9 ++
>>> .../gcc.target/aarch64/sve/acle/asm/mul_u16.c |  5 +-
>>> .../gcc.target/aarch64/sve/acle/asm/mul_u32.c |  5 +-
>>> .../gcc.target/aarch64/sve/acle/asm/mul_u64.c | 26 ++++-
>>> .../gcc.target/aarch64/sve/acle/asm/mul_u8.c  |  7 +-
>>> 9 files changed, 180 insertions(+), 50 deletions(-)
>>> 
>>> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>>> index 1c9f515a52c..6df14a8f4c4 100644
>>> --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>>> +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>>> [...]
>>> @@ -2082,33 +2091,49 @@ public:
>>>       return f.fold_active_lanes_to (build_zero_cst (TREE_TYPE (f.lhs)));
>>> 
>>>     /* If one of the operands is all integer -1, fold to svneg.  */
>>> -    tree pg = gimple_call_arg (f.call, 0);
>>> -    tree negated_op = NULL;
>>> -    if (integer_minus_onep (op2))
>>> -      negated_op = op1;
>>> -    else if (integer_minus_onep (op1))
>>> -      negated_op = op2;
>>> -    if (!f.type_suffix (0).unsigned_p && negated_op)
>>> +     if (integer_minus_onep (op1) || integer_minus_onep (op2))
>> 
>> Formatting nit, sorry, but: indentation looks off.
>> 
>>>       {
>>> -     function_instance instance ("svneg", functions::svneg,
>>> -                                 shapes::unary, MODE_none,
>>> -                                 f.type_suffix_ids, GROUP_none, f.pred);
>>> -     gcall *call = f.redirect_call (instance);
>>> -     unsigned offset_index = 0;
>>> -     if (f.pred == PRED_m)
>>> +     auto mul_by_m1 = [](gimple_folder &f) -> gcall *
>>>        {
>>> -         offset_index = 1;
>>> -         gimple_call_set_arg (call, 0, op1);
>>> -       }
>>> -     else
>>> -       gimple_set_num_ops (call, 5);
>>> -     gimple_call_set_arg (call, offset_index, pg);
>>> -     gimple_call_set_arg (call, offset_index + 1, negated_op);
>>> -     return call;
>> 
>> Rather than having convert_and_fold modify the call in-place (which
>> would create an incorrectly typed call if we leave the function itself
>> unchanged), could we make it pass back a "tree" that contains the
>> (possibly converted) lhs and a "vec<tree> &" that contains the
>> (possibly converted) arguments?
> Dear Richard,
> I agree that it's preferable to not have convert_and_fold modify the call 
> in-place and wanted to ask for clarification about which sort of design you 
> had in mind:
> Option 1:
> tree gimple_folder::convert_and_fold (tree type, vec<tree> &args_conv), such 
> that the function takes a target type TYPE to which all non-boolean vector 
> operands are converted and a reference for a vec<tree> that it fills with 
> converted argument trees. It returns a tree with the (possibly converted) 
> lhs. It also adds the view-convert statements, but makes no changes to the 
> call itself.
> The caller of the function uses the converted arguments and lhs to assemble 
> the new gcall.
>
> Option 2:
> gcall *gimple_folder::convert_and_fold (tree type, gcall *(*fp) (tree, 
> vec<tree>, gimple_folder &)), where the function converts the lhs and 
> arguments to TYPE and assigns them to the newly created tree lhs_conv and 
> vec<tree> args_conv that are passed to the function pointer FP. The callback 
> assembles the new call and returns it to convert_and_fold, which adds the 
> necessary conversion statements before returning the new call to the caller. 
> So, in this case it would be the callback modifying the call.

Yeah, I meant option 2, but with vec<tree> & rather than plain vec<tree>.
I suppose the callback should return a gimple * rather than a gcall *
though, in case the callback wants to create a gassign instead.
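
Something like this, in other words (untested, just to pin down the shape):

  gimple *
  gimple_folder::convert_and_fold (tree type,
                                   gimple *(*fp) (tree, vec<tree> &,
                                                  gimple_folder &));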

(Thanks for checking btw)

Richard
