On Thu, 2 Dec 2021 at 23:11, Richard Sandiford <richard.sandif...@arm.com> wrote:
>
> Prathamesh Kulkarni <prathamesh.kulka...@linaro.org> writes:
> > Hi Richard,
> > I have attached a WIP untested patch for PR96463. IIUC, the PR
> > suggests transforming
> >
> >   lhs = svld1rq ({-1, -1, ...}, &v[0])
> >
> > into:
> >
> >   lhs = vec_perm_expr<v, v, {0, 0, ...}>
> >
> > if v is a vector of 4 elements, each 32 bits wide, on a little-endian
> > target?
> >
> > I am sorry if this sounds like a silly question, but I am not sure how
> > to convert a vector of type int32x4_t into svint32_t. In the patch, I
> > simply used NOP_EXPR (which I expected to fail), and got a type error
> > during gimple verification.
>
> It should be possible in principle to have a VEC_PERM_EXPR in which
> the operands are Advanced SIMD vectors and the result is an SVE vector.
>
> E.g., the dup in the PR would be something like this:
>
>   foo (int32x4_t a)
>   {
>     svint32_t _2;
>
>     _2 = VEC_PERM_EXPR <x_1(D), x_1(D), { 0, 1, 2, 3, 0, 1, 2, 3, ... }>;
>     return _2;
>   }
>
> where the final operand can be built using:
>
>   int source_nelts = TYPE_VECTOR_SUBPARTS (…rhs type…).to_constant ();
>   vec_perm_builder sel (TYPE_VECTOR_SUBPARTS (…lhs type…), source_nelts, 1);
>   for (int i = 0; i < source_nelts; ++i)
>     sel.quick_push (i);
>
> I'm not sure how well-tested that combination is though. It might need
> changes to target-independent code.

Hi Richard,
Thanks for the suggestions. I tried the above approach in the attached
patch, but it still results in an ICE due to a type mismatch:
pr96463.c: In function ‘foo’:
pr96463.c:8:1: error: type mismatch in ‘vec_perm_expr’
    8 | }
      | ^
svint32_t
int32x4_t
int32x4_t
svint32_t
_3 = VEC_PERM_EXPR <x_4(D), x_4(D), { 0, 1, 2, 3, ... }>;
during GIMPLE pass: ccp
dump file: pr96463.c.032t.ccp1
pr96463.c:8:1: internal compiler error: verify_gimple failed

Should we perhaps add another tree code that "extends" a fixed-width
vector into its VLA equivalent?

Thanks,
Prathamesh

>
> Thanks,
> Richard
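(For reference, the lane semantics of the intended VEC_PERM_EXPR dup can be modelled in a few lines of Python. This is purely illustrative: the `vec_perm` helper and the 16-lane result width are assumptions made up for the example, standing in for one possible SVE vector length; they are not GCC code or part of the patch.)

```python
def vec_perm(a, b, sel):
    # Model of VEC_PERM_EXPR <a, b, sel>: the operands are concatenated
    # and each selector index picks one lane of the concatenation
    # (indices are reduced modulo the total number of lanes).
    src = list(a) + list(b)
    return [src[i % len(src)] for i in sel]

# The svld1rq dup: repeat a 4-lane Advanced SIMD vector across a wider
# (here hypothetically 16-lane) SVE result, mirroring the selector
# { 0, 1, 2, 3, 0, 1, 2, 3, ... } built by vec_perm_builder above.
v = [10, 20, 30, 40]
source_nelts = len(v)
lhs_nelts = 16
sel = [i % source_nelts for i in range(lhs_nelts)]
result = vec_perm(v, v, sel)
# result is v repeated four times: [10, 20, 30, 40, 10, 20, 30, 40, ...]
```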
diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
index 02e42a71e5e..b38c4641535 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
@@ -44,6 +44,8 @@
 #include "aarch64-sve-builtins-shapes.h"
 #include "aarch64-sve-builtins-base.h"
 #include "aarch64-sve-builtins-functions.h"
+#include "print-tree.h"
+#include "gimple-pretty-print.h"
 
 using namespace aarch64_sve;
 
@@ -1207,6 +1209,57 @@ public:
     insn_code icode = code_for_aarch64_sve_ld1rq (e.vector_mode (0));
     return e.use_contiguous_load_insn (icode);
   }
+
+  gimple *
+  fold (gimple_folder &f) const OVERRIDE
+  {
+    tree arg0 = gimple_call_arg (f.call, 0);
+    tree arg1 = gimple_call_arg (f.call, 1);
+
+    /* Transform:
+	 lhs = svld1rq ({-1, -1, ... }, &v[0])
+       into:
+	 tmp = vec_perm_expr<v, v, {0, 0, ...}>
+	 lhs = nop_expr tmp
+       on little-endian targets.  */
+
+    if (!BYTES_BIG_ENDIAN
+	&& integer_all_onesp (arg0)
+	&& TREE_CODE (arg1) == ADDR_EXPR)
+      {
+	tree t = TREE_OPERAND (arg1, 0);
+	if (TREE_CODE (t) == ARRAY_REF)
+	  {
+	    tree index = TREE_OPERAND (t, 1);
+	    t = TREE_OPERAND (t, 0);
+	    if (integer_zerop (index) && TREE_CODE (t) == VIEW_CONVERT_EXPR)
+	      {
+		t = TREE_OPERAND (t, 0);
+		tree vectype = TREE_TYPE (t);
+		if (VECTOR_TYPE_P (vectype)
+		    && known_eq (TYPE_VECTOR_SUBPARTS (vectype), 4u)
+		    && wi::to_wide (TYPE_SIZE (vectype)) == 128)
+		  {
+		    tree lhs = gimple_call_lhs (f.call);
+		    tree lhs_type = TREE_TYPE (lhs);
+		    int source_nelts
+		      = TYPE_VECTOR_SUBPARTS (vectype).to_constant ();
+		    vec_perm_builder sel (TYPE_VECTOR_SUBPARTS (lhs_type),
+					  source_nelts, 1);
+		    for (int i = 0; i < source_nelts; i++)
+		      sel.quick_push (i);
+
+		    vec_perm_indices indices (sel, 1, source_nelts);
+		    if (!can_vec_perm_const_p (TYPE_MODE (lhs_type), indices))
+		      return NULL;
+
+		    tree mask = vec_perm_indices_to_tree (lhs_type, indices);
+		    return gimple_build_assign (lhs, VEC_PERM_EXPR, t, t, mask);
+		  }
+	      }
+	  }
+      }
+
+    return NULL;
+  }
 };
 
class svld1ro_impl : public load_replicate