Hi Christoph,

> -----Original Message-----
> From: Christoph Müllner <christoph.muell...@vrull.eu>
> Sent: Tuesday, October 15, 2024 3:57 PM
> To: gcc-patches@gcc.gnu.org; Philipp Tomsich <philipp.toms...@vrull.eu>; Tamar
> Christina <tamar.christ...@arm.com>; Richard Biener <rguent...@suse.de>
> Cc: Jeff Law <jeffreya...@gmail.com>; Robin Dapp <rd...@ventanamicro.com>;
> Christoph Müllner <christoph.muell...@vrull.eu>
> Subject: [PATCH 1/2] Reduce lane utilization in VEC_PERM_EXPRs for
> two_operator nodes
> 
> When two_operator SLP nodes are built, the VEC_PERM_EXPR that merges the
> result
> selects a lane only based on the operator found. If the input nodes have
> duplicate elements, there may be more than one way to choose. This commit
> changes the policy to reuse an existing lane with the result that we can
> possibly free up lanes entirely.
> 
> For example, given two vectors with duplicates:
>   A = {a1, a1, a2, a2}
>   B = {b1, b1, b3, b2}
> 
> A two_operator node with operators +, -, +, - is currently built as:
>   RES = VEC_PERM_EXPR<A, B>(0, 5, 2, 7)
> With this patch, the permutation becomes:
>   RES = VEC_PERM_EXPR<A, B>(0, 4, 2, 6)
> Lanes 0 and 2 are reused and lanes 1 and 3 are not utilized anymore.
> 
> The direct effect of this change can be seen in the AArch64 test case,
> where the simpler permutation allows to lower to a TRN1 instead of an
> expensive TBL.

That makes sense,  So this is trying to make the permutes EVEN/EVEN or
ODD/ODD if possible.  Just wondering if....

> 
> Bootstrapped and reg-tested on AArch64 (C, C++, Fortran).
> 
> Manolis Tsamis was the patch's initial author before I took it over.
> 
> gcc/ChangeLog:
> 
>       * tree-vect-slp.cc: Reduce lane utilization in VEC_PERM_EXPRs for
> two_operators
> 
> gcc/testsuite/ChangeLog:
> 
>       * gcc.dg/vect/slp-perm-13.c: New test.
>       * gcc.target/aarch64/sve/slp-perm-13.c: New test.
> 
> Signed-off-by: Christoph Müllner <christoph.muell...@vrull.eu>
> ---
>  gcc/testsuite/gcc.dg/vect/slp-perm-13.c       | 29 +++++++++++++++++++
>  .../gcc.target/aarch64/sve/slp-perm-13.c      |  4 +++
>  gcc/tree-vect-slp.cc                          | 21 +++++++++++++-
>  3 files changed, 53 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/slp-perm-13.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/slp-perm-13.c
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-13.c
> b/gcc/testsuite/gcc.dg/vect/slp-perm-13.c
> new file mode 100644
> index 00000000000..08639e72fb0
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/slp-perm-13.c
> @@ -0,0 +1,29 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-O3 -fdump-tree-slp2-details" } */
> +
> +#define LOAD_VEC(e0, e1, e2, e3, p) \
> +    int e0 = p[0]; \
> +    int e1 = p[1]; \
> +    int e2 = p[2]; \
> +    int e3 = p[3];
> +
> +#define STORE_VEC(p, e0, e1, e2, e3) \
> +    p[0] = e0; \
> +    p[1] = e1; \
> +    p[2] = e2; \
> +    p[3] = e3;
> +
> +void
> +foo (int *p)
> +{
> +  LOAD_VEC(s0, s1, s2, s3, p);
> +
> +  int t0 = s0 + s1;
> +  int t1 = s0 - s1;
> +  int t2 = s2 + s3;
> +  int t3 = s2 - s3;
> +
> +  STORE_VEC(p, t0, t1, t2, t3);
> +}
> +
> +/* { dg-final { scan-tree-dump "VEC_PERM_EXPR.*{ 0, 4, 2, 6 }" "slp2" } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/slp-perm-13.c
> b/gcc/testsuite/gcc.target/aarch64/sve/slp-perm-13.c
> new file mode 100644
> index 00000000000..f5839f273e5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/slp-perm-13.c
> @@ -0,0 +1,4 @@
> +#include "../../../gcc.dg/vect/slp-perm-13.c"
> +
> +/* { dg-final { scan-assembler-not {\ttbl\t} } } */
> +
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index 16332e0b6d7..8794c94ef90 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -2921,7 +2921,26 @@ fail:
>         gassign *ostmt = as_a <gassign *> (ostmt_info->stmt);
>         if (gimple_assign_rhs_code (ostmt) != code0)
>           {
> -           SLP_TREE_LANE_PERMUTATION (node).safe_push (std::make_pair (1,
> i));
> +           /* If the current element can be found in another lane that has
> +              been used previously then use that one instead.  This can
> +              happen when the ONE and TWO contain duplicate elements and
> +              reduces the number of 'active' lanes.  */
> +           int idx = i;
> +           for (int alt_idx = (int) i - 1; alt_idx >= 0; alt_idx--)
> +             {
> +               gassign *alt_stmt = as_a <gassign *> (stmts[alt_idx]->stmt);
> +               if (gimple_assign_rhs_code (alt_stmt) == code0
> +                   && gimple_assign_rhs1 (ostmt)
> +                     == gimple_assign_rhs1 (alt_stmt)
> +                   && gimple_assign_rhs2 (ostmt)
> +                     == gimple_assign_rhs2 (alt_stmt))
> +                 {

.. we may want operand_equal_p (ostmt, alt_stmt, 0) etc here.

This looks good to me though but can't approve. 

Lets see what Richi thinks.

Cheers,
Tamar

> +                   idx = alt_idx;
> +                   break;
> +                 }
> +             }
> +           SLP_TREE_LANE_PERMUTATION (node)
> +             .safe_push (std::make_pair (1, idx));
>             ocode = gimple_assign_rhs_code (ostmt);
>             j = i;
>           }
> --
> 2.46.0

Reply via email to