Hi Richard,

I've implemented some of your suggested changes, but I'm not entirely sure there's an elegant way to handle the second one:

> "So here you'd want to verify we can do LT_EXPR for the types involved, and
> the cases which simplify to constant_boolean_node do not need any such check.
> Possibly the same issue applies to the cases below; I did not verify."

Most of those expressions are selected from `code1` or `code2` in the switch statements, while the rest, like the example you mentioned, are their folded or simplified forms (e.g., `NE` and `LE` becoming `LT`). How can I determine those expressions at the start of the simplification? Or are you suggesting that `expand_vec_cmp_expr_p` should be distributed within the functions, that is, inside the conditionals that decide which expression to return? So we would end up with something like:

```
(if (code1 == NE_EXPR && code2 == LE_EXPR && cmp == 0
     && (allbits
         || (VECTOR_BOOLEAN_TYPE_P (type)
             && expand_vec_cmp_expr_p (TREE_TYPE (@1), type, LT_EXPR))))
 (lt @c0 (convert @1)))
```

...applied across all expressions?
Kind Regards,
Icen

________________________________
From: Richard Biener <rguent...@suse.de>
Sent: 27 May 2025 13:47
To: Icen Zeyada <icen.zeya...@arm.com>
Cc: gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org>; jeffreya...@gmail.com <jeffreya...@gmail.com>; i...@airs.com <i...@airs.com>; Richard Earnshaw <richard.earns...@arm.com>; pins...@gmail.com <pins...@gmail.com>; Victor Do Nascimento <victor.donascime...@arm.com>; Tamar Christina <tamar.christ...@arm.com>
Subject: Re: [PATCH v3 2/2] gimple-fold: extend vector simplification to match scalar bitwise optimizations [PR119196]

On Wed, 21 May 2025, Icen Zeyada wrote:

> Generalize existing scalar gimple_fold rules to apply the same
> bitwise comparison simplifications to vector types.  Previously, an
> expression like
>
>   (x < y) && (x > y)
>
> would fold to `false` if x and y are scalars, but equivalent vector
> comparisons were left untouched.  This patch enables folding of
> patterns of the form
>
>   (cmp x y) bit_and (cmp x y)
>   (cmp x y) bit_ior (cmp x y)
>   (cmp x y) bit_xor (cmp x y)
>
> for vector operands as well, ensuring consistent optimization across
> all data types.
>
> PR tree-optimization/119196
>
> gcc/ChangeLog:
>
> 	* match.pd: Allow scalar optimizations with bitwise AND/OR/XOR to apply
> 	to vectors.
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.target/aarch64/vector-compare-5.c: Add new test for vector
> 	compare simplification.
>
> Signed-off-by: Icen Zeyada <icen.zeya...@arm.com>
> ---
>  gcc/match.pd                                  | 16 ++++-
>  .../gcc.target/aarch64/vector-compare-5.c     | 67 +++++++++++++++++++
>  2 files changed, 81 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/vector-compare-5.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 611f05ef9f9c..7a7df6aeb6c5 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3635,6 +3635,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>    (if ((TREE_CODE (@1) == INTEGER_CST
>          && TREE_CODE (@2) == INTEGER_CST)
>         || ((INTEGRAL_TYPE_P (TREE_TYPE (@1))
> +           || (VECTOR_TYPE_P (TREE_TYPE (@1))

Note this does not verify we are doing a vector compare, our IL allows
vector ==/!= vector to scalar bool compares.  The appropriate test
should be VECTOR_BOOLEAN_TYPE_P (type) to check for a vector compare
(to gate an expand_vec_cmp_expr_p check), and for the bitwise_equal_p
guard your change looks OK.

> +               && expand_vec_cmp_expr_p (TREE_TYPE (@1), type, code2))

The expand_vec_cmp_expr_p is misplaced - we generate not 'code2' but
a comparison code depending on it, like for

  (if (code1 == NE_EXPR && code2 == LE_EXPR && cmp == 0 && allbits)
   (lt @c0 (convert @1)))

so here you'd want to verify we can do LT_EXPR for the types involved,
and the cases which simplify to constant_boolean_node do not need any
such check.  Possibly the same issue applies to the cases below, I did
not verify.

Thanks,
Richard.
>            || POINTER_TYPE_P (TREE_TYPE (@1)))
>        && bitwise_equal_p (@1, @2)))
>     (with
> @@ -3712,6 +3714,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>    (if ((TREE_CODE (@1) == INTEGER_CST
>          && TREE_CODE (@2) == INTEGER_CST)
>         || ((INTEGRAL_TYPE_P (TREE_TYPE (@1))
> +           || (VECTOR_TYPE_P (TREE_TYPE (@1))
> +               && expand_vec_cmp_expr_p (TREE_TYPE (@1), type, code2))
>            || POINTER_TYPE_P (TREE_TYPE (@1)))
>        && operand_equal_p (@1, @2)))
>     (with
> @@ -3762,6 +3766,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>    (if ((TREE_CODE (@1) == INTEGER_CST
>          && TREE_CODE (@2) == INTEGER_CST)
>         || ((INTEGRAL_TYPE_P (TREE_TYPE (@1))
> +           || (VECTOR_TYPE_P (TREE_TYPE (@1))
> +               && expand_vec_cmp_expr_p (TREE_TYPE (@1), type, code2))
>            || POINTER_TYPE_P (TREE_TYPE (@1)))
>        && bitwise_equal_p (@1, @2)))
>     (with
> @@ -3885,7 +3891,10 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>     rcmp (ne le gt ne lt ge)
>     (simplify
>      (op:c (cmp1:c @0 @1) (cmp2 @0 @1))
> -    (if (INTEGRAL_TYPE_P (TREE_TYPE (@0)) || POINTER_TYPE_P (TREE_TYPE (@0)))
> +    (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
> +         || POINTER_TYPE_P (TREE_TYPE (@0))
> +         || (VECTOR_TYPE_P (TREE_TYPE (@1))
> +             && expand_vec_cmp_expr_p (TREE_TYPE (@0), type, rcmp)))
>      (rcmp @0 @1)))))
>
>  /* Optimize (a CMP b) == (a CMP b) */
> @@ -3894,7 +3903,10 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>     rcmp (eq gt le eq ge lt)
>     (simplify
>      (eq:c (cmp1:c @0 @1) (cmp2 @0 @1))
> -    (if (INTEGRAL_TYPE_P (TREE_TYPE (@0)) || POINTER_TYPE_P (TREE_TYPE (@0)))
> +    (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
> +         || POINTER_TYPE_P (TREE_TYPE (@0))
> +         || (VECTOR_TYPE_P (TREE_TYPE (@0))
> +             && expand_vec_cmp_expr_p (TREE_TYPE (@0), type, rcmp)))
>      (rcmp @0 @1))))
>
>  /* (type)([0,1]@a != 0) -> (type)a
> diff --git a/gcc/testsuite/gcc.target/aarch64/vector-compare-5.c b/gcc/testsuite/gcc.target/aarch64/vector-compare-5.c
> new file mode 100644
> index 000000000000..a1a601dc1958
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/vector-compare-5.c
> @@ -0,0 +1,67 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +/* { dg-additional-options "-fdump-tree-original-all" } */
> +
> +typedef int v4i __attribute__((vector_size(4*sizeof(int))));
> +
> +/* Ensure we can simplify `VEC_COND_EXPR(a OP1 b) OP2 VEC_COND_EXPR(a OP3 b)`
> + * into `VEC_COND_EXPR(a OP4 b)`
> + */
> +
> +void use (v4i const *z);
> +
> +void
> +g (v4i *x, v4i const *y, v4i *z, v4i *t)
> +{
> +  *z = *x > *y | *x == *y;   // expect >=
> +  *t = *x > *y | *x <= *y;   // expect true
> +}
> +
> +void
> +h (v4i *x, v4i const *y, v4i *z, v4i *t)
> +{
> +  *z = *x <= *y & *x >= *y;  // expect x == y
> +  *t = *x <= *y & *x != *y;  // expect x < y
> +}
> +
> +void
> +i (v4i *x, v4i const *y, v4i *z, v4i *t)
> +{
> +  *z = *x == *y | *x != *y;  // expect true
> +  *t = *x == *y & *x != *y;  // expect false
> +}
> +
> +void
> +k (v4i *x, v4i const *y, v4i *z, v4i *t)
> +{
> +  *z = *x < *y | *x == *y;   // expect x <= y
> +  *t = *x < *y & *x > *y;    // expect false
> +}
> +
> +void
> +m (v4i *x, v4i const *y, v4i *z, v4i *t)
> +{
> +  *z = *x <= *y ^ *x >= *y;  /* expect x != y */
> +  *t = *x <= *y ^ *x != *y;  /* expect x <= y */
> +}
> +
> +void
> +n (v4i *x, v4i const *y, v4i *z, v4i *t)
> +{
> +  *z = *x == *y ^ *x != *y;  /* expect true */
> +  *t = *x == *y ^ *x == *y;  /* expect false */
> +}
> +
> +
> +/* { dg-final { scan-tree-dump ".*\\*zD\\.\\d+\\s*=\\s*VEC_COND_EXPR\\s*<\\s*\\*xD\\.\\d+\\s*>=\\s*VIEW_CONVERT_EXPR<v4iD\\.\\d+>\\(\\*yD\\.\\d+\\)\\s*,\\s*\\{\\s*-1(,\\s*-1){3}\\s*\\}\\s*,\\s*\\{\\s*0(,\\s*0){3}\\s*\\}\\s*>\\s*;" "original" } } */
> +/* { dg-final { scan-tree-dump ".*\\*tD\\.\\d+\\s*=\\s*\\{\\s*-1(,\\s*-1){3}\\s*\\}\\s*;" "original" } } */
> +/* { dg-final { scan-tree-dump ".*\\*zD\\.\\d+\\s*=\\s*VEC_COND_EXPR\\s*<\\s*\\*xD\\.\\d+\\s*==\\s*VIEW_CONVERT_EXPR<v4iD\\.\\d+>\\(\\*yD\\.\\d+\\)\\s*,\\s*\\{\\s*-1(,\\s*-1){3}\\s*\\}\\s*,\\s*\\{\\s*0(,\\s*0){3}\\s*\\}\\s*>\\s*;" "original" } } */
> +/* { dg-final { scan-tree-dump ".*\\*tD\\.\\d+\\s*=\\s*VEC_COND_EXPR\\s*<\\s*\\*xD\\.\\d+\\s*<\\s*VIEW_CONVERT_EXPR<v4iD\\.\\d+>\\(\\*yD\\.\\d+\\)\\s*,\\s*\\{\\s*-1(,\\s*-1){3}\\s*\\}\\s*,\\s*\\{\\s*0(,\\s*0){3}\\s*\\}\\s*>\\s*;" "original" } } */
> +/* { dg-final { scan-tree-dump ".*\\*zD\\.\\d+\\s*=\\s*\\{\\s*-1(,\\s*-1){3}\\s*\\}\\s*;" "original" } } */
> +/* { dg-final { scan-tree-dump ".*\\*tD\\.\\d+\\s*=\\s*\\{\\s*0(,\\s*0){3}\\s*\\}\\s*;" "original" } } */
> +/* { dg-final { scan-tree-dump ".*\\*zD\\.\\d+\\s*=\\s*VEC_COND_EXPR\\s*<\\s*\\*xD\\.\\d+\\s*<=\\s*VIEW_CONVERT_EXPR<v4iD\\.\\d+>\\(\\*yD\\.\\d+\\)\\s*,\\s*\\{\\s*-1(,\\s*-1){3}\\s*\\}\\s*,\\s*\\{\\s*0(,\\s*0){3}\\s*\\}\\s*>\\s*;" "original" } } */
> +/* { dg-final { scan-tree-dump ".*\\*tD\\.\\d+\\s*=\\s*\\{\\s*0(,\\s*0){3}\\s*\\}\\s*;" "original" } } */
> +/* { dg-final { scan-tree-dump ".*\\*zD\\.\\d+\\s*=\\s*VEC_COND_EXPR\\s*<\\s*\\*xD\\.\\d+\\s*!=\\s*VIEW_CONVERT_EXPR<v4iD\\.\\d+>\\(\\*yD\\.\\d+\\)\\s*,\\s*\\{\\s*-1(,\\s*-1){3}\\s*\\}\\s*,\\s*\\{\\s*0(,\\s*0){3}\\s*\\}\\s*>\\s*;" "original" } } */
> +/* { dg-final { scan-tree-dump ".*\\*tD\\.\\d+\\s*=\\s*VEC_COND_EXPR\\s*<\\s*\\*xD\\.\\d+\\s*>=\\s*VIEW_CONVERT_EXPR<v4iD\\.\\d+>\\(\\*yD\\.\\d+\\)\\s*,\\s*\\{\\s*-1(,\\s*-1){3}\\s*\\}\\s*,\\s*\\{\\s*0(,\\s*0){3}\\s*\\}\\s*>\\s*;" "original" } } */
> +/* { dg-final { scan-tree-dump ".*\\*zD\\.\\d+\\s*=\\s*\\{\\s*-1(,\\s*-1){3}\\s*\\}\\s*;" "original" } } */
> +/* { dg-final { scan-tree-dump ".*\\*tD\\.\\d+\\s*=\\s*\\{\\s*0(,\\s*0){3}\\s*\\}\\s*;" "original" } } */

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)