Jennifer Schmitz <jschm...@nvidia.com> writes:
>> On 13 Nov 2024, at 12:54, Richard Sandiford <richard.sandif...@arm.com>
>> wrote:
>>
>> Jennifer Schmitz <jschm...@nvidia.com> writes:
>>> As follow-up to
>>> https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665472.html,
>>> this patch implements folding of svmul and svdiv by -1 to svneg for
>>> unsigned SVE vector types.  The key idea is to reuse the existing code
>>> that does this fold for signed types and feed it as a callback to a
>>> helper function that adds the necessary type conversions.
>>
>> I only meant that we should do this for multiplication (since the sign
>> isn't relevant for N-bit x N-bit -> N-bit multiplication).  It wouldn't
>> be right for unsigned division, since unsigned division by the maximum
>> value is instead equivalent to x == MAX ? 1 : 0.
>>
>> Some comments on the multiplication bit below:
>>
>>>
>>> For example, for the test case
>>>
>>> svuint64_t foo (svuint64_t x, svbool_t pg)
>>> {
>>>   return svmul_n_u64_x (pg, x, -1);
>>> }
>>>
>>> the following gimple sequence is emitted (-O2 -mcpu=grace):
>>>
>>> svuint64_t foo (svuint64_t x, svbool_t pg)
>>> {
>>>   svuint64_t D.12921;
>>>   svint64_t D.12920;
>>>   svuint64_t D.12919;
>>>
>>>   D.12920 = VIEW_CONVERT_EXPR<svint64_t>(x);
>>>   D.12921 = svneg_s64_x (pg, D.12920);
>>>   D.12919 = VIEW_CONVERT_EXPR<svuint64_t>(D.12921);
>>>   goto <D.12922>;
>>>   <D.12922>:
>>>   return D.12919;
>>> }
>>>
>>> In general, the new helper gimple_folder::convert_and_fold
>>> - takes a target type and a function pointer,
>>> - converts all non-boolean vector types to the target type,
>>> - replaces the converted arguments in the function call,
>>> - calls the callback function,
>>> - adds the necessary view converts to the gimple sequence,
>>> - and returns the new call.
>>>
>>> Because all arguments are converted to the same target type, the helper
>>> function is only suitable for folding calls whose arguments are all of
>>> the same type.  If necessary, this could be extended to convert
>>> different arguments to different types.
>>>
>>> The patch was bootstrapped and tested on aarch64-linux-gnu, no
>>> regression.
>>> OK for mainline?
>>>
>>> Signed-off-by: Jennifer Schmitz <jschm...@nvidia.com>
>>>
>>> gcc/ChangeLog:
>>>
>>>     * config/aarch64/aarch64-sve-builtins-base.cc
>>>     (svmul_impl::fold): Wrap code for folding to svneg in lambda
>>>     function and pass to gimple_folder::convert_and_fold to enable
>>>     the transform for unsigned types.
>>>     (svdiv_impl::fold): Likewise.
>>>     * config/aarch64/aarch64-sve-builtins.cc
>>>     (gimple_folder::convert_and_fold): New function that converts
>>>     operands to the target type before calling the callback function,
>>>     adding the necessary conversion statements.
>>>     * config/aarch64/aarch64-sve-builtins.h
>>>     (gimple_folder::convert_and_fold): Declare function.
>>>     (signed_type_suffix_index): Return type_suffix_index of signed
>>>     vector type for given width.
>>>     (function_instance::signed_type): Return signed vector type for
>>>     given width.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>>     * gcc.target/aarch64/sve/acle/asm/div_u32.c: Adjust expected
>>>     outcome.
>>>     * gcc.target/aarch64/sve/acle/asm/div_u64.c: Likewise.
>>>     * gcc.target/aarch64/sve/acle/asm/mul_u8.c: Likewise.
>>>     * gcc.target/aarch64/sve/acle/asm/mul_u16.c: Likewise.
>>>     * gcc.target/aarch64/sve/acle/asm/mul_u32.c: Likewise.
>>>     * gcc.target/aarch64/sve/acle/asm/mul_u64.c: New test and adjust
>>>     expected outcome.
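To make the multiplication/division distinction concrete, a minimal scalar sketch; uint8_t is chosen arbitrarily for illustration, and each lane of the corresponding SVE operations behaves the same way:

#include <stdint.h>

/* Multiplication: the low 8 bits of x * -1 equal the low 8 bits of -x
   regardless of signedness, so folding a multiplication by -1 to a
   negation is safe for unsigned types too.  Note (uint8_t) -1 == 255.  */
uint8_t mul_m1 (uint8_t x) { return x * (uint8_t) -1; }  /* == (uint8_t) -x */

/* Division: dividing an unsigned value by the maximum value is not a
   negation; the quotient is 1 when x == 255 and 0 otherwise.  */
uint8_t div_m1 (uint8_t x) { return x / (uint8_t) -1; }  /* x == 255 ? 1 : 0 */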
>>> ---
>>>  .../aarch64/aarch64-sve-builtins-base.cc      | 99 ++++++++++++-------
>>>  gcc/config/aarch64/aarch64-sve-builtins.cc    | 40 ++++++++
>>>  gcc/config/aarch64/aarch64-sve-builtins.h     | 30 ++++++
>>>  .../gcc.target/aarch64/sve/acle/asm/div_u32.c |  9 ++
>>>  .../gcc.target/aarch64/sve/acle/asm/div_u64.c |  9 ++
>>>  .../gcc.target/aarch64/sve/acle/asm/mul_u16.c |  5 +-
>>>  .../gcc.target/aarch64/sve/acle/asm/mul_u32.c |  5 +-
>>>  .../gcc.target/aarch64/sve/acle/asm/mul_u64.c | 26 ++++-
>>>  .../gcc.target/aarch64/sve/acle/asm/mul_u8.c  |  7 +-
>>>  9 files changed, 180 insertions(+), 50 deletions(-)
>>>
>>> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>>> b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>>> index 1c9f515a52c..6df14a8f4c4 100644
>>> --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>>> +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>>> [...]
>>> @@ -2082,33 +2091,49 @@ public:
>>>       return f.fold_active_lanes_to (build_zero_cst (TREE_TYPE (f.lhs)));
>>>
>>>     /* If one of the operands is all integer -1, fold to svneg.  */
>>> -    tree pg = gimple_call_arg (f.call, 0);
>>> -    tree negated_op = NULL;
>>> -    if (integer_minus_onep (op2))
>>> -      negated_op = op1;
>>> -    else if (integer_minus_onep (op1))
>>> -      negated_op = op2;
>>> -    if (!f.type_suffix (0).unsigned_p && negated_op)
>>> +    if (integer_minus_onep (op1) || integer_minus_onep (op2))
>>
>> Formatting nit, sorry, but: indentation looks off.
>>
>>>       {
>>> -        function_instance instance ("svneg", functions::svneg,
>>> -                                    shapes::unary, MODE_none,
>>> -                                    f.type_suffix_ids, GROUP_none, f.pred);
>>> -        gcall *call = f.redirect_call (instance);
>>> -        unsigned offset_index = 0;
>>> -        if (f.pred == PRED_m)
>>> +        auto mul_by_m1 = [](gimple_folder &f) -> gcall *
>>>           {
>>> -            offset_index = 1;
>>> -            gimple_call_set_arg (call, 0, op1);
>>> -          }
>>> -        else
>>> -          gimple_set_num_ops (call, 5);
>>> -        gimple_call_set_arg (call, offset_index, pg);
>>> -        gimple_call_set_arg (call, offset_index + 1, negated_op);
>>> -        return call;
>>
>> Rather than having convert_and_fold modify the call in-place (which
>> would create an incorrectly typed call if we leave the function itself
>> unchanged), could we make it pass back a "tree" that contains the
>> (possibly converted) lhs and a "vec<tree> &" that contains the
>> (possibly converted) arguments?

> Dear Richard,
> I agree that it's preferable not to have convert_and_fold modify the
> call in-place and wanted to ask for clarification about which sort of
> design you had in mind:
>
> Option 1:
> tree gimple_folder::convert_and_fold (tree type, vec<tree> &args_conv),
> such that the function takes a target type TYPE to which all non-boolean
> vector operands are converted and a reference to a vec<tree> that it
> fills with the converted argument trees.  It returns a tree with the
> (possibly converted) lhs.  It also adds the view-convert statements, but
> makes no changes to the call itself.  The caller of the function uses
> the converted arguments and lhs to assemble the new gcall.
>
> Option 2:
> gcall *gimple_folder::convert_and_fold (tree type,
> gcall *(*fp) (tree, vec<tree>, gimple_folder &)), where the function
> converts the lhs and arguments to TYPE and assigns them to the newly
> created tree lhs_conv and vec<tree> args_conv that are passed to the
> function pointer FP.  The callback assembles the new call and returns
> it to convert_and_fold, which adds the necessary conversion statements
> before returning the new call to the caller.
> So, in this case it would be the callback modifying the call.
Yeah, I meant option 2, but with vec<tree> & rather than plain vec<tree>.
I suppose the callback should return a gimple * rather than a gcall *
though, in case the callback wants to create a gassign instead.

(Thanks for checking btw)

Richard
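For reference, the two candidate signatures with those adjustments applied to option 2.  This is a sketch of the declarations only, pieced together from the descriptions above; the stand-in type declarations and the callback's exact parameter order are assumptions, not the committed code:

/* Stand-ins for the GCC types, only to keep the sketch self-contained.  */
struct gimple;
typedef union tree_node *tree;
template<typename T> struct vec;

class gimple_folder
{
public:
  /* Option 1: convert the operands to TYPE, fill ARGS_CONV with the
     converted arguments, and return the (possibly converted) lhs; the
     caller assembles the new call itself.  */
  tree convert_and_fold (tree type, vec<tree> &args_conv);

  /* Option 2, as agreed: FP builds the new statement from the converted
     lhs and arguments, and convert_and_fold wraps it in the necessary
     VIEW_CONVERT_EXPR statements.  The vec is passed by reference, and
     FP returns a gimple * rather than a gcall * so that a callback can
     also build a gassign.  */
  gimple *convert_and_fold (tree type,
                            gimple *(*fp) (tree, vec<tree> &,
                                           gimple_folder &));
};

Under option 2, a caller such as svmul_impl::fold only has to build the new gcall (or gassign) from the converted operands; convert_and_fold owns the view-convert plumbing on both sides.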