On Thu, 2 Dec 2021 at 23:11, Richard Sandiford <richard.sandif...@arm.com> wrote:
>
> Prathamesh Kulkarni <prathamesh.kulka...@linaro.org> writes:
> > Hi Richard,
> > I have attached a WIP untested patch for PR96463. IIUC, the PR
> > suggests transforming
> >
> >   lhs = svld1rq ({-1, -1, ...}, &v[0])
> >
> > into:
> >
> >   lhs = vec_perm_expr<v, v, {0, 0, ...}>
> >
> > if v is a vector of 4 elements, each 32 bits wide, on a little-endian
> > target?
> >
> > I am sorry if this sounds like a silly question, but I am not sure how
> > to convert a vector of type int32x4_t into svint32_t. In the patch, I
> > simply used NOP_EXPR (which I expected to fail), and got a type error
> > during gimple verification.
>
> It should be possible in principle to have a VEC_PERM_EXPR in which
> the operands are Advanced SIMD vectors and the result is an SVE vector.
>
> E.g., the dup in the PR would be something like this:
>
>   foo (int32x4_t a)
>   {
>     svint32_t _2;
>
>     _2 = VEC_PERM_EXPR <x_1(D), x_1(D), { 0, 1, 2, 3, 0, 1, 2, 3, ... }>;
>     return _2;
>   }
>
> where the final operand can be built using:
>
>   int source_nelts = TYPE_VECTOR_SUBPARTS (…rhs type…).to_constant ();
>   vec_perm_builder sel (TYPE_VECTOR_SUBPARTS (…lhs type…), source_nelts, 1);
>   for (int i = 0; i < source_nelts; ++i)
>     sel.quick_push (i);
>
> I'm not sure how well-tested that combination is though. It might need
> changes to target-independent code.

Hi Richard,
Thanks for the suggestions. I tried the above approach in the attached
patch, but it still results in an ICE due to a type mismatch:
pr96463.c: In function ‘foo’:
pr96463.c:8:1: error: type mismatch in ‘vec_perm_expr’
    8 | }
      | ^
svint32_t
int32x4_t
int32x4_t
svint32_t
_3 = VEC_PERM_EXPR <x_4(D), x_4(D), { 0, 1, 2, 3, ... }>;
during GIMPLE pass: ccp
dump file: pr96463.c.032t.ccp1
pr96463.c:8:1: internal compiler error: verify_gimple failed

Should we perhaps add another tree code that "extends" a fixed-width
vector into its VLA equivalent?

Thanks,
Prathamesh

>
> Thanks,
> Richard
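(For reference, the lane semantics of the intended VEC_PERM_EXPR dup can be modelled in a few lines of Python. This is purely illustrative: the `vec_perm` helper and the 16-lane result width are assumptions made up for the example, standing in for one possible SVE vector length; they are not GCC code or part of the patch.)

```python
def vec_perm(a, b, sel):
    # Model of VEC_PERM_EXPR <a, b, sel>: the operands are concatenated
    # and each selector index picks one lane of the concatenation
    # (indices are reduced modulo the total number of lanes).
    src = list(a) + list(b)
    return [src[i % len(src)] for i in sel]

# The svld1rq dup: repeat a 4-lane Advanced SIMD vector across a wider
# (here hypothetically 16-lane) SVE result, mirroring the selector
# { 0, 1, 2, 3, 0, 1, 2, 3, ... } built by vec_perm_builder above.
v = [10, 20, 30, 40]
source_nelts = len(v)
lhs_nelts = 16
sel = [i % source_nelts for i in range(lhs_nelts)]
result = vec_perm(v, v, sel)
# result is v repeated four times: [10, 20, 30, 40, 10, 20, 30, 40, ...]
```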
diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
index 02e42a71e5e..b38c4641535 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
@@ -44,6 +44,8 @@
 #include "aarch64-sve-builtins-shapes.h"
 #include "aarch64-sve-builtins-base.h"
 #include "aarch64-sve-builtins-functions.h"
+#include "print-tree.h"
+#include "gimple-pretty-print.h"
 
 using namespace aarch64_sve;
 
@@ -1207,6 +1209,57 @@ public:
     insn_code icode = code_for_aarch64_sve_ld1rq (e.vector_mode (0));
     return e.use_contiguous_load_insn (icode);
   }
+
+  gimple *
+  fold (gimple_folder &f) const OVERRIDE
+  {
+    tree arg0 = gimple_call_arg (f.call, 0);
+    tree arg1 = gimple_call_arg (f.call, 1);
+
+    /* Transform:
+	 lhs = svld1rq ({-1, -1, ... }, &v[0])
+       into:
+	 tmp = vec_perm_expr<v, v, {0, 0, ...}>
+	 lhs = nop_expr tmp
+       on little-endian targets.  */
+
+    if (!BYTES_BIG_ENDIAN
+	&& integer_all_onesp (arg0)
+	&& TREE_CODE (arg1) == ADDR_EXPR)
+      {
+	tree t = TREE_OPERAND (arg1, 0);
+	if (TREE_CODE (t) == ARRAY_REF)
+	  {
+	    tree index = TREE_OPERAND (t, 1);
+	    t = TREE_OPERAND (t, 0);
+	    if (integer_zerop (index) && TREE_CODE (t) == VIEW_CONVERT_EXPR)
+	      {
+		t = TREE_OPERAND (t, 0);
+		tree vectype = TREE_TYPE (t);
+		if (VECTOR_TYPE_P (vectype)
+		    && known_eq (TYPE_VECTOR_SUBPARTS (vectype), 4u)
+		    && wi::to_wide (TYPE_SIZE (vectype)) == 128)
+		  {
+		    tree lhs = gimple_call_lhs (f.call);
+		    tree lhs_type = TREE_TYPE (lhs);
+		    int source_nelts
+		      = TYPE_VECTOR_SUBPARTS (vectype).to_constant ();
+		    vec_perm_builder sel (TYPE_VECTOR_SUBPARTS (lhs_type),
+					  source_nelts, 1);
+		    for (int i = 0; i < source_nelts; i++)
+		      sel.quick_push (i);
+
+		    vec_perm_indices indices (sel, 1, source_nelts);
+		    if (!can_vec_perm_const_p (TYPE_MODE (lhs_type), indices))
+		      return NULL;
+
+		    tree mask = vec_perm_indices_to_tree (lhs_type, indices);
+		    return gimple_build_assign (lhs, VEC_PERM_EXPR, t, t, mask);
+		  }
+	      }
+	  }
+      }
+
+    return NULL;
+  }
 };
 
class svld1ro_impl : public load_replicate