When two_operator SLP nodes are built, the VEC_PERM_EXPR that merges the result selects a lane only based on the operator found. If the input nodes have duplicate elements, there may be more than one way to choose. This commit changes the policy to reuse an existing lane with the result that we can possibly free up lanes entirely.
For example, given two vectors with duplicates: A = {a1, a1, a2, a2} B = {b1, b1, b3, b2} A two_operator node with operators +, -, +, - is currently built as: RES = VEC_PERM_EXPR<A, B>(0, 5, 2, 7) With this patch, the permutation becomes: RES = VEC_PERM_EXPR<A, B>(0, 4, 2, 6) Lanes 0 and 2 are reused and lanes 1 and 3 are not utilized anymore. The direct effect of this change can be seen in the AArch64 test case, where the simpler permutation allows to lower to a TRN1 instead of an expensive TBL. Bootstrapped and reg-tested on AArch64 (C, C++, Fortran). Manolis Tsamis was the patch's initial author before I took it over. gcc/ChangeLog: * tree-vect-slp.cc: Reduce lane utilization in VEC_PERM_EXPRs for two_operators gcc/testsuite/ChangeLog: * gcc.dg/vect/slp-perm-13.c: New test. * gcc.target/aarch64/sve/slp-perm-13.c: New test. Signed-off-by: Christoph Müllner <christoph.muell...@vrull.eu> --- gcc/testsuite/gcc.dg/vect/slp-perm-13.c | 29 +++++++++++++++++++ .../gcc.target/aarch64/sve/slp-perm-13.c | 4 +++ gcc/tree-vect-slp.cc | 21 +++++++++++++- 3 files changed, 53 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.dg/vect/slp-perm-13.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/slp-perm-13.c diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-13.c b/gcc/testsuite/gcc.dg/vect/slp-perm-13.c new file mode 100644 index 00000000000..08639e72fb0 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/slp-perm-13.c @@ -0,0 +1,29 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-O3 -fdump-tree-slp2-details" } */ + +#define LOAD_VEC(e0, e1, e2, e3, p) \ + int e0 = p[0]; \ + int e1 = p[1]; \ + int e2 = p[2]; \ + int e3 = p[3]; + +#define STORE_VEC(p, e0, e1, e2, e3) \ + p[0] = e0; \ + p[1] = e1; \ + p[2] = e2; \ + p[3] = e3; + +void +foo (int *p) +{ + LOAD_VEC(s0, s1, s2, s3, p); + + int t0 = s0 + s1; + int t1 = s0 - s1; + int t2 = s2 + s3; + int t3 = s2 - s3; + + STORE_VEC(p, t0, t1, t2, t3); +} + +/* { dg-final { scan-tree-dump "VEC_PERM_EXPR.*{ 0, 4, 2, 6 }" "slp2" } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/slp-perm-13.c b/gcc/testsuite/gcc.target/aarch64/sve/slp-perm-13.c new file mode 100644 index 00000000000..f5839f273e5 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/slp-perm-13.c @@ -0,0 +1,4 @@ +#include "../../../gcc.dg/vect/slp-perm-13.c" + +/* { dg-final { scan-assembler-not {\ttbl\t} } } */ + diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc index 16332e0b6d7..8794c94ef90 100644 --- a/gcc/tree-vect-slp.cc +++ b/gcc/tree-vect-slp.cc @@ -2921,7 +2921,26 @@ fail: gassign *ostmt = as_a <gassign *> (ostmt_info->stmt); if (gimple_assign_rhs_code (ostmt) != code0) { - SLP_TREE_LANE_PERMUTATION (node).safe_push (std::make_pair (1, i)); + /* If the current element can be found in another lane that has + been used previously then use that one instead. This can + happen when the ONE and TWO contain duplicate elements and + reduces the number of 'active' lanes. */ + int idx = i; + for (int alt_idx = (int) i - 1; alt_idx >= 0; alt_idx--) + { + gassign *alt_stmt = as_a <gassign *> (stmts[alt_idx]->stmt); + if (gimple_assign_rhs_code (alt_stmt) == code0 + && gimple_assign_rhs1 (ostmt) + == gimple_assign_rhs1 (alt_stmt) + && gimple_assign_rhs2 (ostmt) + == gimple_assign_rhs2 (alt_stmt)) + { + idx = alt_idx; + break; + } + } + SLP_TREE_LANE_PERMUTATION (node) + .safe_push (std::make_pair (1, idx)); ocode = gimple_assign_rhs_code (ostmt); j = i; } -- 2.46.0