Not really my area, but FWIW...

Prathamesh Kulkarni <prathamesh.kulka...@linaro.org> writes:
> Hi,
> The attached patch tries to fix PR91166.
> Does it look OK ?
> Bootstrap+test in progress on aarch64-linux-gnu and x86_64-unknown-linux-gnu.
>
> Thanks,
> Prathamesh
>
> 2019-07-17  Prathamesh Kulkarni  <prathamesh.kulka...@linaro.org>
>
>       PR middle-end/91166
>       * match.pd (vec_perm_expr(v, v, mask) -> v): New pattern.
>       (define_predicates): Add entry for uniform_vector_p.
>
> testsuite/
>       * gcc.target/aarch64/sve/pr91166.c: New test.
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 4a7aa0185d8..2ad98c28fd8 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -36,7 +36,8 @@ along with GCC; see the file COPYING3.  If not see
>     integer_valued_real_p
>     integer_pow2p
>     uniform_integer_cst_p
> -   HONOR_NANS)
> +   HONOR_NANS
> +   uniform_vector_p)
>  
>  /* Operator lists.  */
>  (define_operator_list tcc_comparison
> @@ -5568,3 +5569,12 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>           { bitsize_int (at * tree_to_uhwi (TYPE_SIZE (TREE_TYPE (type)))); })
>         (if (changed)
>          (vec_perm { op0; } { op1; } { op2; }))))))))))
> +
> +/* VEC_PERM_EXPR (v, v, mask) -> v where v contains same element.  */
> +(simplify
> + (vec_perm (vec_duplicate@0 @1) @0 @2)
> + { @0; })
> +
> +(simplify
> + (vec_perm uniform_vector_p@0 @0 @1)
> + { @0; }) 

No need for the curly braces here; you can use "@0" directly as the
target of the simplification.
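
E.g. the second pattern could be written as below (untested, and the
first pattern would change in the same way):

```
/* The result is just the capture, so no C expression block is needed.  */
(simplify
 (vec_perm uniform_vector_p@0 @0 @1)
 @0)
```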

It'd probably be worth using (match ...) to define a new predicate
that handles (vec_duplicate ...), VECTOR_CST and CONSTRUCTOR,
calling into uniform_vector_p for the latter two.
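
Something like the following is roughly what I have in mind.  This is
only an untested sketch: vec_same_elem_p is a made-up name, and the
CONSTRUCTOR case in particular may need extra care on GIMPLE, where
@0 may be an SSA name rather than the CONSTRUCTOR itself.

```
/* Hypothetical predicate: matches a vector whose elements are all
   the same value.  */
(match vec_same_elem_p
 (vec_duplicate @0))

(match vec_same_elem_p
 VECTOR_CST@0
 (if (uniform_vector_p (@0))))

(match vec_same_elem_p
 CONSTRUCTOR@0
 (if (uniform_vector_p (@0))))

/* VEC_PERM_EXPR (v, v, mask) -> v where v contains same element.  */
(simplify
 (vec_perm vec_same_elem_p@0 @0 @1)
 @0)
```

That way a single simplify covers all three forms of uniform vector.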

Thanks,
Richard

> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr91166.c b/gcc/testsuite/gcc.target/aarch64/sve/pr91166.c
> new file mode 100644
> index 00000000000..42654be3b31
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/pr91166.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -march=armv8.2-a+sve -fdump-tree-optimized" } */
> +
> +void
> +f1 (double x[][4]) 
> +{
> +  for (int i = 0; i < 4; ++i)
> +    for (int j = 0; j < 4; ++j)
> +      x[i][j] = 0;
> +}
> +
> +void
> +f2 (double x[][4], double y)
> +{
> +  for (int i = 0; i < 4; ++i)
> +    for (int j = 0; j < 4; ++j)
> +      x[i][j] = y;
> +}
> +
> +/* { dg-final { scan-tree-dump-not "VEC_PERM_EXPR" "optimized"} } */
