sub(a, b))

Andrew Pinski Wed, 04 Jun 2025 10:45:16 -0700

On Wed, Jun 4, 2025 at 6:27 AM Richard Biener
<richard.guent...@gmail.com> wrote:
>
> On Thu, May 29, 2025 at 10:04 AM <dhr...@nvidia.com> wrote:
> >
> > From: Dhruv Chawla <dhr...@nvidia.com>
> >
> > This patch folds the following patterns:
> > - max (a, add (a, b)) -> [sum, ovf] = addo (a, b); !ovf ? sum : a
> > - max (a, sub (a, b)) -> [sum, ovf] = subo (a, b); !ovf ? a : sum
> > - min (a, add (a, b)) -> [sum, ovf] = addo (a, b); !ovf ? a : sum
> > - min (a, sub (a, b)) -> [sum, ovf] = addo (a, b); !ovf ? sum : a
>
> I wonder whether this is really beneficial without considering the
> target.  IMO a COND_EXPR is always less "canonical", the original
> form should better optimize with surrounding code.


This happens very late in the gimple optimization.

>
> I suppose you are after improved code generation for aarch64?  Can
> this not be achieved by RTL level simplification / instruction combining?

So the RTL level combine gets us:
```
(set (reg:SI 105 [ _5 ])
    (umax:SI (plus:SI (reg/v:SI 103 [ a ])
            (reg:SI 108 [ b ]))
        (reg/v:SI 103 [ a ])))
```
the aarch64 backend could match this I suspect but it looks like this
transformation also helps x86 and other targets which don't have umax
patterns/obtab but have add with overflow optabs.

In fact looking at the code gen between the two versions, with
aarch64's cssc, using umax might be better.
```
        add     w1, w0, w1        // c_3, a, b
        umax    w0, w1, w0      //, c_3, a
```
vs:
```
        adds    w8, w0, w1
        csel    w0, w0, w8, hs
```

Because we don't clobber CC/flags.

>
> Richard.
>
> > Where ovf is the overflow flag, addo and subo are overflowing addition and
> > subtraction, respectively. The folded patterns can normally be implemented 
> > as
> > an overflowing operation combined with a conditional move/select 
> > instruction.
> >
> > Explanation for the conditions handled in arith_overflow_check_p:
> >
> > Case 1/2: r = a + b; max/min (a, r) or max/min (r, a)
> >   lhs (r)
> >     if crhs1 (a) and crhs2 (r)
> >       => lhs (r) == crhs2 (r) &&
> >          (rhs1 (a or b) == crhs1 (a) || rhs2 (a or b) == crhs1 (a))
> >     if crhs1 (r) and crhs2 (a)
> >       => lhs (r) == crhs1 (r) &&
> >          (rhs1 (a or b) == crhs2 (a) || rhs2 (a or b) == crhs2 (a))
> >
> > Both rhs1 and rhs2 are checked in (rhs<n> == crhs<n>) as addition is
> > commutative.
> >
> > Case 3/4: r = a - b; max/min (a, r) or max/min (r, a)
> >   lhs (r)
> >     if crhs1 (a) and crhs2 (r)
> >       => lhs (r) == crhs2 (r) && rhs1 (a) == crhs1 (a)
> >     if crhs1 (r) and crhs2 (a)
> >       => lhs (r) == crhs1 (r) && rhs1 (a) == crhs2 (a)
> >
> > Bootstrapped and regtested on aarch64-unknown-linux-gnu.
> >
> > Signed-off-by: Dhruv Chawla <dhr...@nvidia.com>
> >
> > gcc/ChangeLog:
> >
> >         PR middle-end/116815
> >         * tree-ssa-math-opts.cc (arith_overflow_check_p): Match min/max
> >         patterns.
> >         (build_minmax_replacement_statements): New function.
> >         (match_arith_overflow): Update to handle min/max patterns.
> >
> > gcc/testsuite/ChangeLog:
> >
> >         * gcc.dg/tree-ssa/pr116815-1.c: New test.
> >         * gcc.dg/tree-ssa/pr116815-2.c: Likewise.
> >         * gcc.dg/tree-ssa/pr116815-3.c: Likewise.
> > ---
> >  gcc/testsuite/gcc.dg/tree-ssa/pr116815-1.c |  42 ++++++
> >  gcc/testsuite/gcc.dg/tree-ssa/pr116815-2.c |  93 +++++++++++++
> >  gcc/testsuite/gcc.dg/tree-ssa/pr116815-3.c |  43 ++++++
> >  gcc/tree-ssa-math-opts.cc                  | 151 +++++++++++++++++++--
> >  4 files changed, 318 insertions(+), 11 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr116815-1.c
> >  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr116815-2.c
> >  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr116815-3.c
> >
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr116815-1.c 
> > b/gcc/testsuite/gcc.dg/tree-ssa/pr116815-1.c
> > new file mode 100644
> > index 00000000000..5d62843d63c
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr116815-1.c
> > @@ -0,0 +1,42 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -fdump-tree-optimized" } */
> > +
> > +/* PR middle-end/116815 */
> > +
> > +/* Single-use tests.  */
> > +
> > +static inline unsigned
> > +max (unsigned a, unsigned b)
> > +{
> > +  return a > b ? a : b;
> > +}
> > +
> > +static inline unsigned
> > +min (unsigned a, unsigned b)
> > +{
> > +  return a < b ? a : b;
> > +}
> > +
> > +#define OPERATION(op, type, N, exp1, exp2)                                 
> >     \
> > +  unsigned u##op##type##N (unsigned a, unsigned b) { return op (exp1, 
> > exp2); }
> > +
> > +OPERATION (max, add, 1, a, a + b)
> > +OPERATION (max, add, 2, a, b + a)
> > +OPERATION (max, add, 3, a + b, a)
> > +OPERATION (max, add, 4, b + a, a)
> > +
> > +OPERATION (min, add, 1, a, a + b)
> > +OPERATION (min, add, 2, a, b + a)
> > +OPERATION (min, add, 3, a + b, a)
> > +OPERATION (min, add, 4, b + a, a)
> > +
> > +OPERATION (max, sub, 1, a, a - b)
> > +OPERATION (max, sub, 2, a - b, a)
> > +
> > +OPERATION (min, sub, 1, a, a - b)
> > +OPERATION (min, sub, 2, a - b, a)
> > +
> > +/* { dg-final { scan-tree-dump-not "MAX_EXPR" "optimized" } } */
> > +/* { dg-final { scan-tree-dump-not "MIN_EXPR" "optimized" } } */
> > +/* { dg-final { scan-tree-dump-times "ADD_OVERFLOW" 8 "optimized" } } */
> > +/* { dg-final { scan-tree-dump-times "SUB_OVERFLOW" 4 "optimized" } } */
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr116815-2.c 
> > b/gcc/testsuite/gcc.dg/tree-ssa/pr116815-2.c
> > new file mode 100644
> > index 00000000000..56e8038ef82
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr116815-2.c
> > @@ -0,0 +1,93 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -fdump-tree-optimized" } */
> > +
> > +/* PR middle-end/116815 */
> > +
> > +/* Negative tests.  */
> > +
> > +static inline int
> > +smax (int a, int b)
> > +{
> > +  return a > b ? a : b;
> > +}
> > +
> > +static inline int
> > +smin (int a, int b)
> > +{
> > +  return a < b ? a : b;
> > +}
> > +
> > +static inline unsigned
> > +umax (unsigned a, unsigned b)
> > +{
> > +  return a > b ? a : b;
> > +}
> > +
> > +static inline unsigned
> > +umin (unsigned a, unsigned b)
> > +{
> > +  return a < b ? a : b;
> > +}
> > +
> > +#define ASSUME(cond) if (!(cond)) __builtin_unreachable ();
> > +
> > +/* This transformation does not trigger on signed types.  */
> > +
> > +int
> > +smax_add (int a, int b)
> > +{
> > +  ASSUME (b >= 0);
> > +  return smax (a, a + b);
> > +}
> > +
> > +int
> > +smin_add (int a, int b)
> > +{
> > +  ASSUME (b >= 0);
> > +  return smin (a, a + b);
> > +}
> > +
> > +int
> > +smax_sub (int a, int b)
> > +{
> > +  ASSUME (b >= 0);
> > +  return smax (a, a - b);
> > +}
> > +
> > +int
> > +smin_sub (int a, int b)
> > +{
> > +  ASSUME (b >= 0);
> > +  return smin (a, a - b);
> > +}
> > +
> > +/* Invalid patterns.  */
> > +
> > +/* This can potentially be matched, but the RHS gets factored to
> > +   (a + b) * b.  */
> > +unsigned
> > +umax_factored (unsigned a, unsigned b)
> > +{
> > +  return umax (a * b, a * b + b * b);
> > +}
> > +
> > +unsigned
> > +umin_mult (unsigned a, unsigned b)
> > +{
> > +  return umin (a, a * b);
> > +}
> > +
> > +unsigned
> > +umax_sub (unsigned a, unsigned b)
> > +{
> > +  return umax (a, b - a);
> > +}
> > +
> > +unsigned
> > +umin_sub (unsigned a, unsigned b)
> > +{
> > +  return umin (a, b - a);
> > +}
> > +
> > +/* { dg-final { scan-tree-dump-not "ADD_OVERFLOW" "optimized" } } */
> > +/* { dg-final { scan-tree-dump-not "SUB_OVERFLOW" "optimized" } } */
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr116815-3.c 
> > b/gcc/testsuite/gcc.dg/tree-ssa/pr116815-3.c
> > new file mode 100644
> > index 00000000000..af1fe18d50a
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr116815-3.c
> > @@ -0,0 +1,43 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -fdump-tree-optimized" } */
> > +
> > +/* PR middle-end/116815 */
> > +
> > +/* Multi-use tests.  */
> > +
> > +static inline unsigned
> > +max (unsigned a, unsigned b)
> > +{
> > +  return a > b ? a : b;
> > +}
> > +
> > +static inline unsigned
> > +min (unsigned a, unsigned b)
> > +{
> > +  return a < b ? a : b;
> > +}
> > +
> > +unsigned
> > +umax_add_umin_add (unsigned a, unsigned b)
> > +{
> > +  return max (a, a + b) + min (a + b, b);
> > +}
> > +
> > +unsigned
> > +umin_add_umax_add (unsigned a, unsigned b)
> > +{
> > +  return min (a, b + a) + max (b + a, b);
> > +}
> > +
> > +unsigned
> > +multiple_paths (unsigned a, unsigned b)
> > +{
> > +  if (a > 5)
> > +    return max (a, a + b);
> > +  else
> > +    return min (a, a + b);
> > +}
> > +
> > +/* { dg-final { scan-tree-dump-not "MAX_EXPR" "optimized" } } */
> > +/* { dg-final { scan-tree-dump-not "MIN_EXPR" "optimized" } } */
> > +/* { dg-final { scan-tree-dump-times "ADD_OVERFLOW" 3 "optimized" } } */
> > diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc
> > index 7e819f37446..f08cac68ca7 100644
> > --- a/gcc/tree-ssa-math-opts.cc
> > +++ b/gcc/tree-ssa-math-opts.cc
> > @@ -3981,11 +3981,26 @@ arith_overflow_check_p (gimple *stmt, gimple 
> > *cast_stmt, gimple *&use_stmt,
> >        return 1;
> >      }
> >
> > -  if (TREE_CODE_CLASS (ccode) != tcc_comparison)
> > +  if (TREE_CODE_CLASS (ccode) != tcc_comparison
> > +      && TREE_CODE_CLASS (ccode) != tcc_binary)
> >      return 0;
> >
> >    switch (ccode)
> >      {
> > +    case MAX_EXPR:
> > +    case MIN_EXPR:
> > +      /* 1. r = a + b; max (a, r) or max (r, a)
> > +        2. r = a + b; min (a, r) or min (r, a)
> > +        3. r = a - b; max (a, r) or max (r, a)
> > +        4. r = a - b; min (a, r) or min (r, a)  */
> > +      if ((code == PLUS_EXPR
> > +          && ((lhs == crhs1 && (rhs1 == crhs2 || rhs2 == crhs2))
> > +              || (lhs == crhs2 && (rhs1 == crhs1 || rhs2 == crhs2))))
> > +         || (code == MINUS_EXPR
> > +             && ((lhs == crhs1 && rhs1 == crhs2)
> > +                 || (lhs == crhs2 && rhs1 == crhs1))))
> > +       return 1;
> > +      break;
> >      case GT_EXPR:
> >      case LE_EXPR:
> >        if (maxval)
> > @@ -4339,6 +4354,73 @@ match_saturation_trunc (gimple_stmt_iterator *gsi, 
> > gphi *phi)
> >    return true;
> >  }
> >
> > +/* Assume that ovf = overflow_flag (add/sub (...)).
> > +   The replacement forms are:
> > +     max (add) -> ovf ? a : a + b
> > +     min (sub) -> ovf ? a : a - b
> > +     max (sub) -> ovf ? a - b : a
> > +     min (add) -> ovf ? a + b : a.  */
> > +
> > +static tree
> > +build_minmax_replacement_statements (gimple *def_stmt, tree ovf, tree 
> > new_lhs,
> > +                                    tree type, gimple *minmax_stmt)
> > +{
> > +  enum tree_code code = gimple_assign_rhs_code (def_stmt);
> > +  enum tree_code rhs_code = gimple_assign_rhs_code (minmax_stmt);
> > +  gcc_checking_assert (code == PLUS_EXPR || code == MINUS_EXPR);
> > +
> > +  tree lhs = gimple_assign_lhs (def_stmt);
> > +  tree rhs1 = gimple_assign_rhs1 (def_stmt);
> > +  tree rhs2 = gimple_assign_rhs2 (def_stmt);
> > +
> > +  tree use_rhs1 = gimple_assign_rhs1 (minmax_stmt);
> > +  tree use_rhs2 = gimple_assign_rhs2 (minmax_stmt);
> > +
> > +  /* First figure out which variable from def_stmt will be used in the
> > +     COND_EXPR.  */
> > +  tree minmax_var = NULL_TREE;
> > +  /* Either max/min (a, add/sub (a, b)) or
> > +           max/min (add/sub (a, b), a).  */
> > +  if ((lhs == use_rhs2 && use_rhs1 == rhs1)
> > +      || (lhs == use_rhs1 && use_rhs2 == rhs1))
> > +    minmax_var = rhs1;
> > +  /* Either max/min (a, add (b, a)) or
> > +           max/min (add (b, a), a).  */
> > +  else if (code == PLUS_EXPR)
> > +    minmax_var = rhs2;
> > +
> > +  /* The above should always match rhs1 for MINUS_EXPR.  */
> > +  gcc_checking_assert (
> > +    minmax_var != NULL_TREE
> > +    && (code == PLUS_EXPR || (use_rhs1 != rhs2 && use_rhs2 != rhs2)));
> > +
> > +  /* Figure out if we have to generate:
> > +       (ovf != 0 ? new_lhs : minmax_var) or
> > +       (ovf != 0 ? minmax_var : new_lhs) i.e. (ovf == 0 ? new_lhs : 
> > minmax_var).
> > +     The default case is assumed to be the first one.  */
> > +  bool flip = false;
> > +  if ((rhs_code == MIN_EXPR && code == PLUS_EXPR)
> > +      || (rhs_code == MAX_EXPR && code == MINUS_EXPR))
> > +    flip = true;
> > +
> > +  /* Generate the actual code.  */
> > +  tree minmax = make_ssa_name (type);
> > +  tree comparison_result = make_ssa_name (boolean_type_node);
> > +  tree comparison_expr = build2 (flip ? EQ_EXPR : NE_EXPR, 
> > boolean_type_node,
> > +                                ovf, build_int_cst (type, 0));
> > +  gimple *comparison_stmt
> > +    = gimple_build_assign (comparison_result, comparison_expr);
> > +
> > +  tree conditional
> > +    = build3 (COND_EXPR, type, comparison_result, minmax_var, new_lhs);
> > +  gimple *new_minmax_stmt = gimple_build_assign (minmax, conditional);
> > +  gimple_stmt_iterator gsi = gsi_for_stmt (minmax_stmt);
> > +  gsi_insert_before (&gsi, comparison_stmt, GSI_NEW_STMT);
> > +  gsi_insert_after (&gsi, new_minmax_stmt, GSI_NEW_STMT);
> > +
> > +  return minmax;
> > +}
> > +
> >  /* Recognize for unsigned x
> >     x = y - z;
> >     if (x > y)
> > @@ -4391,7 +4473,19 @@ match_saturation_trunc (gimple_stmt_iterator *gsi, 
> > gphi *phi)
> >     z = IMAGPART_EXPR <_7>;
> >     _8 = IMAGPART_EXPR <_7>;
> >     _9 = _8 != 0;
> > -   iftmp.0_3 = (int) _9;  */
> > +   iftmp.0_3 = (int) _9;
> > +
> > +   And also recognize:
> > +   c = max/min (a, add/sub (a, b))
> > +   and replace it with
> > +   _7 = .(ADD|SUB)_OVERFLOW (a, b);
> > +   _8 = REALPART_EXPR <_7>;
> > +   _9 = IMAGPART_EXPR <_7>;
> > +   _10 = _9 != 0; (or _9 == 0)
> > +   _11 = _10 ? _8 : a;
> > +   c = _11;
> > +   This can be optimized to a single conditional select instruction with an
> > +   overflowing arithmetic instruction.  */
> >
> >  static bool
> >  match_arith_overflow (gimple_stmt_iterator *gsi, gimple *stmt,
> > @@ -4425,6 +4519,7 @@ match_arith_overflow (gimple_stmt_iterator *gsi, 
> > gimple *stmt,
> >
> >    tree rhs1 = gimple_assign_rhs1 (stmt);
> >    tree rhs2 = gimple_assign_rhs2 (stmt);
> > +  bool minmax_use_seen = false;
> >    FOR_EACH_IMM_USE_FAST (use_p, iter, lhs)
> >      {
> >        use_stmt = USE_STMT (use_p);
> > @@ -4445,6 +4540,13 @@ match_arith_overflow (gimple_stmt_iterator *gsi, 
> > gimple *stmt,
> >                 return false;
> >               cond_stmt = use_stmt;
> >             }
> > +         if (gimple_code (use_stmt) == GIMPLE_ASSIGN
> > +             && gimple_assign_rhs_class (use_stmt) == GIMPLE_BINARY_RHS)
> > +           {
> > +             tree_code rhs_code = gimple_assign_rhs_code (use_stmt);
> > +             if (rhs_code == MAX_EXPR || rhs_code == MIN_EXPR)
> > +               minmax_use_seen = true;
> > +           }
> >           ovf_use_seen = true;
> >         }
> >        else
> > @@ -4494,7 +4596,10 @@ match_arith_overflow (gimple_stmt_iterator *gsi, 
> > gimple *stmt,
> >
> >    tree maxval = NULL_TREE;
> >    if (!ovf_use_seen
> > -      || (code != MULT_EXPR && (code == BIT_NOT_EXPR ? use_seen : 
> > !use_seen))
> > +      || (code != MULT_EXPR
> > +         && (code == BIT_NOT_EXPR
> > +               ? use_seen
> > +               : !minmax_use_seen && !use_seen))
> >        || (code == PLUS_EXPR
> >           && optab_handler (uaddv4_optab,
> >                             TYPE_MODE (type)) == CODE_FOR_nothing)
> > @@ -4758,6 +4863,7 @@ match_arith_overflow (gimple_stmt_iterator *gsi, 
> > gimple *stmt,
> >      gsi_insert_after (gsi, g2, GSI_NEW_STMT);
> >    else
> >      gsi_insert_before (gsi, g2, GSI_SAME_STMT);
> > +
> >    if (code == MULT_EXPR)
> >      mul_stmts.quick_push (g2);
> >
> > @@ -4786,15 +4892,25 @@ match_arith_overflow (gimple_stmt_iterator *gsi, 
> > gimple *stmt,
> >        if (gimple_code (use_stmt) == GIMPLE_COND)
> >         {
> >           gcond *cond_stmt = as_a <gcond *> (use_stmt);
> > -         gimple_cond_set_lhs (cond_stmt, ovf);
> > -         gimple_cond_set_rhs (cond_stmt, build_int_cst (type, 0));
> > -         gimple_cond_set_code (cond_stmt, ovf_use == 1 ? NE_EXPR : 
> > EQ_EXPR);
> > +         tree rhs = gimple_cond_rhs (cond_stmt);
> > +         if (TREE_CODE (rhs) == MIN_EXPR || TREE_CODE (rhs) == MAX_EXPR)
> > +           gimple_cond_set_rhs (cond_stmt,
> > +                                build_minmax_replacement_statements (
> > +                                  stmt, ovf, new_lhs, type, use_stmt));
> > +         else
> > +           {
> > +             gimple_cond_set_lhs (cond_stmt, ovf);
> > +             gimple_cond_set_rhs (cond_stmt, build_int_cst (type, 0));
> > +             gimple_cond_set_code (cond_stmt,
> > +                                   ovf_use == 1 ? NE_EXPR : EQ_EXPR);
> > +           }
> >         }
> >        else
> >         {
> >           gcc_checking_assert (is_gimple_assign (use_stmt));
> >           if (gimple_assign_rhs_class (use_stmt) == GIMPLE_BINARY_RHS)
> >             {
> > +             tree_code rhs_code = gimple_assign_rhs_code (use_stmt);
> >               if (gimple_assign_rhs_code (use_stmt) == RSHIFT_EXPR)
> >                 {
> >                   g2 = gimple_build_assign (make_ssa_name 
> > (boolean_type_node),
> > @@ -4843,6 +4959,14 @@ match_arith_overflow (gimple_stmt_iterator *gsi, 
> > gimple *stmt,
> >                   gsi_remove (&gsiu, true);
> >                   continue;
> >                 }
> > +             else if (rhs_code == MIN_EXPR || rhs_code == MAX_EXPR)
> > +               {
> > +                 gimple_stmt_iterator gsi = gsi_for_stmt (use_stmt);
> > +                 gimple_assign_set_rhs_from_tree (
> > +                   &gsi,
> > +                   build_minmax_replacement_statements (stmt, ovf, new_lhs,
> > +                                                        type, use_stmt));
> > +               }
> >               else
> >                 {
> >                   gimple_assign_set_rhs1 (use_stmt, ovf);
> > @@ -4854,11 +4978,16 @@ match_arith_overflow (gimple_stmt_iterator *gsi, 
> > gimple *stmt,
> >             }
> >           else
> >             {
> > -             gcc_checking_assert (gimple_assign_rhs_code (use_stmt)
> > -                                  == COND_EXPR);
> > -             tree cond = build2 (ovf_use == 1 ? NE_EXPR : EQ_EXPR,
> > -                                 boolean_type_node, ovf,
> > -                                 build_int_cst (type, 0));
> > +             tree_code rhs_code = gimple_assign_rhs_code (use_stmt);
> > +             gcc_checking_assert (rhs_code == COND_EXPR || rhs_code == 
> > MAX_EXPR
> > +                                  || rhs_code == MIN_EXPR);
> > +             tree cond = NULL_TREE;
> > +             if (rhs_code != COND_EXPR)
> > +               cond = build_minmax_replacement_statements (stmt, ovf, 
> > new_lhs,
> > +                                                           type, use_stmt);
> > +             else
> > +               cond = build2 (ovf_use == 1 ? NE_EXPR : EQ_EXPR,
> > +                              boolean_type_node, ovf, build_int_cst (type, 
> > 0));
> >               gimple_assign_set_rhs1 (use_stmt, cond);
> >             }
> >         }
> > --
> > 2.44.0
> >

Re: [PATCH] widening_mul: Make better use of overflowing operations in codegen of min/max(a, add/sub(a, b))

Reply via email to