On Fri, 22 Nov 2024, Jakub Jelinek wrote:

> Hi!
> 
> The following testcase shows wrong-code caused by incorrect use
> of with_possible_nonzero_bits2.
> That matcher is defined as
> /* Slightly extended version, do not make it recursive to keep it cheap.  */
> (match (with_possible_nonzero_bits2 @0)
>  with_possible_nonzero_bits@0)
> (match (with_possible_nonzero_bits2 @0)
>  (bit_and:c with_possible_nonzero_bits@0 @2))
> and because with_possible_nonzero_bits includes the SSA_NAME case with
> an integral/pointer argument, both forms can actually match when an SSA_NAME
> with integral/pointer type has a def stmt which is a BIT_AND_EXPR
> assignment with, say, an SSA_NAME with integral/pointer type as one of its
> operands (or an INTEGER_CST, another with_possible_nonzero_bits case).
> And in match.pd the latter actually wins if both match, and so when using
> (with_possible_nonzero_bits2 @0), the @0 will actually be one of the
> BIT_AND_EXPR operands if that form is matched.
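To illustrate how both forms can apply (the SSA names here are made up): for
GIMPLE like

  _2 = x_1 & 0xc;
  _3 = _2 >> 1;

the operand _2 satisfies the first form because it is an SSA_NAME with
integral type, and it also satisfies the second form through its BIT_AND_EXPR
def stmt, in which case @0 binds to one of the AND operands (x_1 or 0xc)
rather than to _2 itself.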

Hmm, the (match ...) cases should ideally be tried in the order they appear
in the match.pd file, so maybe we should simply swap the two, making the
fallback to with_possible_nonzero_bits match last?

> Now, with_possible_nonzero_bits2 and with_certain_nonzero_bits2 were added
> for the
> /* X == C (or X & Z == Y | C) is impossible if ~nonzero(X) & C != 0.  */
> (for cmp (eq ne)
>  (simplify
>   (cmp:c (with_possible_nonzero_bits2 @0) (with_certain_nonzero_bits2 @1))
>   (if (wi::bit_and_not (wi::to_wide (@1), get_nonzero_bits (@0)) != 0)
>    { constant_boolean_node (cmp == NE_EXPR, type); })))         
> simplifier, but even for that one I think they do not do a good job; they
> might actually pessimize stuff rather than optimize it, though at least that
> does not result in wrong-code, because the operands are solely tested with
> wi::to_wide or get_nonzero_bits, not actually used in the
> simplification.  The reason why it can pessimize stuff is that if we have, say,
>   # RANGE [irange] int ... MASK 0xb VALUE 0x0
>   x_1 = ...;
>   # RANGE [irange] int ... MASK 0x8 VALUE 0x0
>   _2 = x_1 & 0xc;
>   _3 = _2 == 2;
> then if it used just with_possible_nonzero_bits@0, @0 would be _2 with
> get_nonzero_bits (@0) 0x8 and (2 & ~0x8) != 0, so we could fold it into
>   _3 = 0;
> But as it uses (with_possible_nonzero_bits2 @0), @0 is x_1 rather
> than _2 and get_nonzero_bits (@0) is unnecessarily conservative,
> 0xb rather than 0x8, and (2 & ~0xb) == 0, so we don't optimize.
> Now, with_possible_nonzero_bits2 can actually improve stuff as well in that
> pattern: if say value ranges aren't fully computed yet, or the BIT_AND_EXPR
> assignment has been added later and its lhs doesn't have a range computed
> yet, get_nonzero_bits on the BIT_AND_EXPR lhs will be all bits set, while
> on the BIT_AND_EXPR operand it might actually succeed.
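As a plain C sanity check of that bit arithmetic (no GCC internals involved,
the variable names below are made up):

  #include <assert.h>

  int
  main (void)
  {
    unsigned nz_lhs = 0x8;          /* nonzero bits recorded for _2 */
    unsigned nz_op = 0xb;           /* nonzero bits recorded for x_1 */
    unsigned c = 0x2;               /* the constant _2 is compared against */
    assert ((c & ~nz_lhs) != 0);    /* enough to prove _2 == 2 is false */
    assert ((c & ~nz_op) == 0);     /* looking at x_1 proves nothing */
    return 0;
  }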
> 
> I believe better would be to either modify get_nonzero_bits so that it
> special cases an SSA_NAME with a BIT_AND_EXPR def_stmt (but one level
> deep only like with_possible_nonzero_bits2, no recursion), in that case
> returning the bitwise and of get_nonzero_bits (non-recursive) of the lhs
> and of both operands, and possibly handle BIT_AND_EXPR itself, e.g. for
> GENERIC matching, by returning the bitwise and of both operands.
> Then with_possible_nonzero_bits2 would only be needed for the GENERIC case,
> perhaps by making the second match #if GENERIC, but changed so that the @N
> operand is always the whole thing rather than its operand, which is
> error-prone.  Or add a get_nonzero_bits wrapper with a different name
> which would do that.
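A rough sketch of what that special casing could look like for the GIMPLE
case (this is not part of the patch; the helper name is made up and
untested):

  /* Hypothetical one-level-deep variant, no recursion.  */
  static wide_int
  get_nonzero_bits_with_bit_and (tree t)
  {
    wide_int nz = get_nonzero_bits (t);
    if (TREE_CODE (t) == SSA_NAME)
      {
        gimple *g = SSA_NAME_DEF_STMT (t);
        if (is_gimple_assign (g)
            && gimple_assign_rhs_code (g) == BIT_AND_EXPR)
          nz = nz & (get_nonzero_bits (gimple_assign_rhs1 (g))
                     & get_nonzero_bits (gimple_assign_rhs2 (g)));
      }
    return nz;
  }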
> 
> with_certain_nonzero_bits2 could be changed similarly; these days
> we can test known non-zero bits rather than just possible non-zero bits on
> SSA_NAMEs too, as we record both mask and value.  Possible nonzero bits
> (aka. get_nonzero_bits) is mask () | value (), while known nonzero bits
> is value () & ~mask (), so a new function (get_known_nonzero_bits
> or get_certain_nonzero_bits etc.) could handle that.
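With a concrete, made-up bitmask, say MASK 0x9 VALUE 0x4 (bits 0 and 3
unknown, bit 2 known to be set, all other bits known to be clear):

  possible nonzero bits = mask () | value ()  = 0x9 | 0x4  = 0xd
  known nonzero bits    = value () & ~mask () = 0x4 & ~0x9 = 0x4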
> 
> Anyway, the following patch doesn't do what I wrote above just yet;
> for that single pattern it is just a missed optimization.
> But the with_possible_nonzero_bits2 uses in the 3 new simplifiers are
> just completely incorrect, because they don't only use the @0 operand
> in get_nonzero_bits (pessimizing stuff if value ranges are fully computed),
> but also use it in the replacement, so they act as if the BIT_AND_EXPR
> wasn't there at all.
> While we could use (with_possible_nonzero_bits2@3 @0), test
> get_nonzero_bits (@0) and use @3 in the replacement, that would still
> often be a pessimization, so I've just used with_possible_nonzero_bits@0.
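To make the wrong-code concrete for the X / (1 << C) pattern (a sketch with
made-up SSA names, roughly what the new testcase below boils down to): for

  _1 = ~b_2;
  _3 = _1 & 2;
  c_4 = _3 / 2;

(with_possible_nonzero_bits2 @0) matches _3 through its BIT_AND_EXPR def stmt
with @0 = _1, and since ranges prove the low bit of _1 clear (b_2 is 1 or -1
in the testcase, so _1 is -2 or 0), the multiple_of_p check passes and the
replacement becomes c_4 = _1 /[ex] 2, silently dropping the & 2 masking.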
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

> And what do you think about the above mentioned approach for the other
> with_possible_nonzero_bits2 using simplifier?

I find having both matches confusing most of the time; we definitely
need better documentation (in comments) on how they are supposed to be
used.

Improving things is appreciated; if what you suggest works out, I'm
fine with it.

Richard.

> 2024-11-22  Jakub Jelinek  <ja...@redhat.com>
> 
>       PR tree-optimization/117420
>       * match.pd ((X >> C1) << (C1 + C2) -> X << C2,
>       (X >> C1) * (C2 << C1) -> X * C2, X / (1 << C) -> X /[ex] (1 << C)):
>       Use with_possible_nonzero_bits@0 rather than
>       (with_possible_nonzero_bits2 @0).
> 
>       * gcc.dg/torture/pr117420.c: New test.
> 
> --- gcc/match.pd.jj   2024-11-18 12:21:10.449236948 +0100
> +++ gcc/match.pd      2024-11-21 16:12:46.346009032 +0100
> @@ -4952,7 +4952,7 @@ (define_operator_list SYNC_FETCH_AND_AND
>  #if GIMPLE
>  /* (X >> C1) << (C1 + C2) -> X << C2 if the low C1 bits of X are zero.  */
>  (simplify
> - (lshift (convert? (rshift (with_possible_nonzero_bits2 @0) INTEGER_CST@1))
> + (lshift (convert? (rshift with_possible_nonzero_bits@0 INTEGER_CST@1))
>           INTEGER_CST@2)
>   (if (INTEGRAL_TYPE_P (type)
>        && wi::ltu_p (wi::to_wide (@1), element_precision (type))
> @@ -4963,7 +4963,7 @@ (define_operator_list SYNC_FETCH_AND_AND
>  
>  /* (X >> C1) * (C2 << C1) -> X * C2 if the low C1 bits of X are zero.  */
>  (simplify
> - (mult (convert? (rshift (with_possible_nonzero_bits2 @0) INTEGER_CST@1))
> + (mult (convert? (rshift with_possible_nonzero_bits@0 INTEGER_CST@1))
>         poly_int_tree_p@2)
>   (with { poly_widest_int factor; }
>    (if (INTEGRAL_TYPE_P (type)
> @@ -5538,7 +5538,7 @@ (define_operator_list SYNC_FETCH_AND_AND
>  #if GIMPLE
>  /* X / (1 << C) -> X /[ex] (1 << C) if the low C bits of X are clear.  */
>  (simplify
> - (trunc_div (with_possible_nonzero_bits2 @0) integer_pow2p@1)
> + (trunc_div with_possible_nonzero_bits@0 integer_pow2p@1)
>   (if (INTEGRAL_TYPE_P (type)
>        && !TYPE_UNSIGNED (type)
>        && wi::multiple_of_p (get_nonzero_bits (@0), wi::to_wide (@1), SIGNED))
> --- gcc/testsuite/gcc.dg/torture/pr117420.c.jj        2024-11-21 16:11:09.525386201 +0100
> +++ gcc/testsuite/gcc.dg/torture/pr117420.c   2024-11-21 16:20:16.441607350 +0100
> @@ -0,0 +1,52 @@
> +/* PR tree-optimization/117420 */
> +/* { dg-do run } */
> +
> +int a;
> +
> +__attribute__((noipa)) int
> +foo ()
> +{
> +  int b = -(1 | -(a < 1));
> +  int c = (~b & 2) / 2;
> +  if (b != 1)
> +    __builtin_abort ();
> +}
> +
> +__attribute__((noipa)) int
> +bar ()
> +{
> +  int b = -(1 | -(a < 1));
> +  int c = (~b & 2) / 2;
> +  if (b != -1)
> +    __builtin_abort ();
> +}
> +
> +__attribute__((noipa)) int
> +baz ()
> +{
> +  int b = -(1 | -(a < 1));
> +  int c = (~b & 2) / 2;
> +  if (c != 1)
> +    __builtin_abort ();
> +}
> +
> +__attribute__((noipa)) int
> +qux ()
> +{
> +  int b = -(1 | -(a < 1));
> +  int c = (~b & 2) / 2;
> +  if (c != 0)
> +    __builtin_abort ();
> +}
> +
> +int
> +main ()
> +{
> +  foo ();
> +  a = 1;
> +  bar ();
> +  a = 0;
> +  baz ();
> +  a = 1;
> +  qux ();
> +}
> 
>       Jakub
> 
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
