On Tue, 11 Apr 2023 at 14:17, Richard Biener <[email protected]> wrote:
>
> On Wed, Apr 5, 2023 at 10:39 AM Prathamesh Kulkarni via Gcc-patches
> <[email protected]> wrote:
> >
> > Hi,
> > For the following test:
> >
> > svint32_t f(svint32_t v)
> > {
> > return svrev_s32 (svrev_s32 (v));
> > }
> >
> > We generate 2 rev instructions instead of nop:
> > f:
> > rev z0.s, z0.s
> > rev z0.s, z0.s
> > ret
> >
> > The attached patch tries to fix that by trying to recognize the following
> > pattern in match.pd:
> > v1 = VEC_PERM_EXPR (v0, v0, mask)
> > v2 = VEC_PERM_EXPR (v1, v1, mask)
> > -->
> > v2 = v0
> > if mask is { nelts - 1, nelts - 2, nelts - 3, ... }
> >
> > Code-gen with patch:
> > f:
> > ret
> >
> > Bootstrap+test passes on aarch64-linux-gnu, and SVE bootstrap in progress.
> > Does it look OK for stage-1 ?
>
> I didn't look at the patch but tree-ssa-forwprop.cc:simplify_permutation
> should
> handle two consecutive permutes with the is_combined_permutation_identity
> which might need tweaking for VLA vectors
Hi Richard,
Thanks for the suggestions. The attached patch modifies
is_combined_permutation_identity
to recognize the above pattern.
Does it look OK ?
Bootstrap+test in progress on aarch64-linux-gnu and x86_64-linux-gnu.
Thanks,
Prathamesh
>
> Richard.
>
> >
> > Thanks,
> > Prathamesh
gcc/ChangeLog:
* tree-ssa-forwprop.cc (is_combined_permutation_identity):
New parameter def_stmt.
Try to simplify two successive VEC_PERM_EXPRs with single operand
and same mask, where mask chooses elements in reverse order.
gcc/testesuite/ChangeLog:
* gcc.target/aarch64/sve/acle/general/rev-1.c: New test.
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/rev-1.c
b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/rev-1.c
new file mode 100644
index 00000000000..e57ee67d716
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/rev-1.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-optimized" } */
+
+#include <arm_sve.h>
+
+svint32_t f(svint32_t v)
+{
+ return svrev_s32 (svrev_s32 (v));
+}
+
+/* { dg-final { scan-tree-dump "return v_1\\(D\\)" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "VEC_PERM_EXPR" "optimized" } } */
diff --git a/gcc/tree-ssa-forwprop.cc b/gcc/tree-ssa-forwprop.cc
index 9b567440ba4..5cbee077d89 100644
--- a/gcc/tree-ssa-forwprop.cc
+++ b/gcc/tree-ssa-forwprop.cc
@@ -2532,7 +2532,7 @@ simplify_bitfield_ref (gimple_stmt_iterator *gsi)
gives back one of the input. */
static int
-is_combined_permutation_identity (tree mask1, tree mask2)
+is_combined_permutation_identity (tree mask1, tree mask2, gimple *def_stmt =
NULL)
{
tree mask;
unsigned HOST_WIDE_INT nelts, i, j;
@@ -2541,7 +2541,36 @@ is_combined_permutation_identity (tree mask1, tree mask2)
gcc_checking_assert (TREE_CODE (mask1) == VECTOR_CST
&& TREE_CODE (mask2) == VECTOR_CST);
+ if (def_stmt)
+ gcc_checking_assert (is_gimple_assign (def_stmt)
+ && gimple_assign_rhs_code (def_stmt) == VEC_PERM_EXPR);
+
mask = fold_ternary (VEC_PERM_EXPR, TREE_TYPE (mask1), mask1, mask1, mask2);
+
+ /* For VLA masks, check for the following pattern:
+ v1 = VEC_PERM_EXPR (v0, v0, mask)
+ v2 = VEC_PERM_EXPR (v1, v1, mask)
+ -->
+ v2 = v0
+ if mask is {nelts - 1, nelts - 2, ...}. */
+
+ if (operand_equal_p (mask1, mask2, 0)
+ && !VECTOR_CST_NELTS (mask1).is_constant ()
+ && def_stmt
+ && operand_equal_p (gimple_assign_rhs1 (def_stmt),
+ gimple_assign_rhs2 (def_stmt), 0))
+ {
+ vec_perm_builder builder;
+ if (tree_to_vec_perm_builder (&builder, mask1))
+ {
+ poly_uint64 nelts = TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask1));
+ vec_perm_indices sel (builder, 1, nelts);
+ if (sel.series_p (0, 1, nelts - 1, -1))
+ return 1;
+ }
+ return 0;
+ }
+
if (mask == NULL_TREE || TREE_CODE (mask) != VECTOR_CST)
return 0;
@@ -2629,7 +2658,7 @@ simplify_permutation (gimple_stmt_iterator *gsi)
op3 = gimple_assign_rhs3 (def_stmt);
if (TREE_CODE (op3) != VECTOR_CST)
return 0;
- ident = is_combined_permutation_identity (op3, op2);
+ ident = is_combined_permutation_identity (op3, op2, def_stmt);
if (!ident)
return 0;
orig = (ident == 1) ? gimple_assign_rhs1 (def_stmt)