On Wed, Jun 4, 2025 at 7:44 PM Andrew Pinski <pins...@gmail.com> wrote: > > On Wed, Jun 4, 2025 at 6:27 AM Richard Biener > <richard.guent...@gmail.com> wrote: > > > > On Thu, May 29, 2025 at 10:04 AM <dhr...@nvidia.com> wrote: > > > > > > From: Dhruv Chawla <dhr...@nvidia.com> > > > > > > This patch folds the following patterns: > > > - max (a, add (a, b)) -> [sum, ovf] = addo (a, b); !ovf ? sum : a > > > - max (a, sub (a, b)) -> [sum, ovf] = subo (a, b); !ovf ? a : sum > > > - min (a, add (a, b)) -> [sum, ovf] = addo (a, b); !ovf ? a : sum > > > - min (a, sub (a, b)) -> [sum, ovf] = addo (a, b); !ovf ? sum : a > > > > I wonder whether this is really beneficial without considering the > > target. IMO a COND_EXPR is always less "canonical", the original > > form should better optimize with surrounding code. > > This happens very late in the gimple optimization. > > > > > I suppose you are after improved code generation for aarch64? Can > > this not be achieved by RTL level simplification / instruction combining? > > So the RTL level combine gets us: > ``` > (set (reg:SI 105 [ _5 ]) > (umax:SI (plus:SI (reg/v:SI 103 [ a ]) > (reg:SI 108 [ b ])) > (reg/v:SI 103 [ a ]))) > ``` > the aarch64 backend could match this I suspect but it looks like this > transformation also helps x86 and other targets which don't have umax > patterns/obtab but have add with overflow optabs.
On x86 STV can decide to use SSE regs for this where min/max are available. Turning it into cmov on oflag would be premature. STV1 is before combine, STV2 after it. IMO the above shows it's perfect for a target specific combine? Richard. > In fact looking at the code gen between the two versions, with > aarch64's cssc, using umax might be better. > ``` > add w1, w0, w1 // c_3, a, b > umax w0, w1, w0 //, c_3, a > ``` > vs: > ``` > adds w8, w0, w1 > csel w0, w0, w8, hs > ``` > > Because we don't clobber CC/flags. > > > > > Richard. > > > > > Where ovf is the overflow flag, addo and subo are overflowing addition and > > > subtraction, respectively. The folded patterns can normally be > > > implemented as > > > an overflowing operation combined with a conditional move/select > > > instruction. > > > > > > Explanation for the conditions handled in arith_overflow_check_p: > > > > > > Case 1/2: r = a + b; max/min (a, r) or max/min (r, a) > > > lhs (r) > > > if crhs1 (a) and crhs2 (r) > > > => lhs (r) == crhs2 (r) && > > > (rhs1 (a or b) == crhs1 (a) || rhs2 (a or b) == crhs1 (a)) > > > if crhs1 (r) and crhs2 (a) > > > => lhs (r) == crhs1 (r) && > > > (rhs1 (a or b) == crhs2 (a) || rhs2 (a or b) == crhs2 (a)) > > > > > > Both rhs1 and rhs2 are checked in (rhs<n> == crhs<n>) as addition is > > > commutative. > > > > > > Case 3/4: r = a - b; max/min (a, r) or max/min (r, a) > > > lhs (r) > > > if crhs1 (a) and crhs2 (r) > > > => lhs (r) == crhs2 (r) && rhs1 (a) == crhs1 (a) > > > if crhs1 (r) and crhs2 (a) > > > => lhs (r) == crhs1 (r) && rhs1 (a) == crhs2 (a) > > > > > > Bootstrapped and regtested on aarch64-unknown-linux-gnu. > > > > > > Signed-off-by: Dhruv Chawla <dhr...@nvidia.com> > > > > > > gcc/ChangeLog: > > > > > > PR middle-end/116815 > > > * tree-ssa-math-opts.cc (arith_overflow_check_p): Match min/max > > > patterns. > > > (build_minmax_replacement_statements): New function. > > > (match_arith_overflow): Update to handle min/max patterns. > > > > > > gcc/testsuite/ChangeLog: > > > > > > * gcc.dg/tree-ssa/pr116815-1.c: New test. > > > * gcc.dg/tree-ssa/pr116815-2.c: Likewise. > > > * gcc.dg/tree-ssa/pr116815-3.c: Likewise. > > > --- > > > gcc/testsuite/gcc.dg/tree-ssa/pr116815-1.c | 42 ++++++ > > > gcc/testsuite/gcc.dg/tree-ssa/pr116815-2.c | 93 +++++++++++++ > > > gcc/testsuite/gcc.dg/tree-ssa/pr116815-3.c | 43 ++++++ > > > gcc/tree-ssa-math-opts.cc | 151 +++++++++++++++++++-- > > > 4 files changed, 318 insertions(+), 11 deletions(-) > > > create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr116815-1.c > > > create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr116815-2.c > > > create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr116815-3.c > > > > > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr116815-1.c > > > b/gcc/testsuite/gcc.dg/tree-ssa/pr116815-1.c > > > new file mode 100644 > > > index 00000000000..5d62843d63c > > > --- /dev/null > > > +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr116815-1.c > > > @@ -0,0 +1,42 @@ > > > +/* { dg-do compile } */ > > > +/* { dg-options "-O2 -fdump-tree-optimized" } */ > > > + > > > +/* PR middle-end/116815 */ > > > + > > > +/* Single-use tests. */ > > > + > > > +static inline unsigned > > > +max (unsigned a, unsigned b) > > > +{ > > > + return a > b ? a : b; > > > +} > > > + > > > +static inline unsigned > > > +min (unsigned a, unsigned b) > > > +{ > > > + return a < b ? a : b; > > > +} > > > + > > > +#define OPERATION(op, type, N, exp1, exp2) > > > \ > > > + unsigned u##op##type##N (unsigned a, unsigned b) { return op (exp1, > > > exp2); } > > > + > > > +OPERATION (max, add, 1, a, a + b) > > > +OPERATION (max, add, 2, a, b + a) > > > +OPERATION (max, add, 3, a + b, a) > > > +OPERATION (max, add, 4, b + a, a) > > > + > > > +OPERATION (min, add, 1, a, a + b) > > > +OPERATION (min, add, 2, a, b + a) > > > +OPERATION (min, add, 3, a + b, a) > > > +OPERATION (min, add, 4, b + a, a) > > > + > > > +OPERATION (max, sub, 1, a, a - b) > > > +OPERATION (max, sub, 2, a - b, a) > > > + > > > +OPERATION (min, sub, 1, a, a - b) > > > +OPERATION (min, sub, 2, a - b, a) > > > + > > > +/* { dg-final { scan-tree-dump-not "MAX_EXPR" "optimized" } } */ > > > +/* { dg-final { scan-tree-dump-not "MIN_EXPR" "optimized" } } */ > > > +/* { dg-final { scan-tree-dump-times "ADD_OVERFLOW" 8 "optimized" } } */ > > > +/* { dg-final { scan-tree-dump-times "SUB_OVERFLOW" 4 "optimized" } } */ > > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr116815-2.c > > > b/gcc/testsuite/gcc.dg/tree-ssa/pr116815-2.c > > > new file mode 100644 > > > index 00000000000..56e8038ef82 > > > --- /dev/null > > > +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr116815-2.c > > > @@ -0,0 +1,93 @@ > > > +/* { dg-do compile } */ > > > +/* { dg-options "-O2 -fdump-tree-optimized" } */ > > > + > > > +/* PR middle-end/116815 */ > > > + > > > +/* Negative tests. */ > > > + > > > +static inline int > > > +smax (int a, int b) > > > +{ > > > + return a > b ? a : b; > > > +} > > > + > > > +static inline int > > > +smin (int a, int b) > > > +{ > > > + return a < b ? a : b; > > > +} > > > + > > > +static inline unsigned > > > +umax (unsigned a, unsigned b) > > > +{ > > > + return a > b ? a : b; > > > +} > > > + > > > +static inline unsigned > > > +umin (unsigned a, unsigned b) > > > +{ > > > + return a < b ? a : b; > > > +} > > > + > > > +#define ASSUME(cond) if (!(cond)) __builtin_unreachable (); > > > + > > > +/* This transformation does not trigger on signed types. */ > > > + > > > +int > > > +smax_add (int a, int b) > > > +{ > > > + ASSUME (b >= 0); > > > + return smax (a, a + b); > > > +} > > > + > > > +int > > > +smin_add (int a, int b) > > > +{ > > > + ASSUME (b >= 0); > > > + return smin (a, a + b); > > > +} > > > + > > > +int > > > +smax_sub (int a, int b) > > > +{ > > > + ASSUME (b >= 0); > > > + return smax (a, a - b); > > > +} > > > + > > > +int > > > +smin_sub (int a, int b) > > > +{ > > > + ASSUME (b >= 0); > > > + return smin (a, a - b); > > > +} > > > + > > > +/* Invalid patterns. */ > > > + > > > +/* This can potentially be matched, but the RHS gets factored to > > > + (a + b) * b. */ > > > +unsigned > > > +umax_factored (unsigned a, unsigned b) > > > +{ > > > + return umax (a * b, a * b + b * b); > > > +} > > > + > > > +unsigned > > > +umin_mult (unsigned a, unsigned b) > > > +{ > > > + return umin (a, a * b); > > > +} > > > + > > > +unsigned > > > +umax_sub (unsigned a, unsigned b) > > > +{ > > > + return umax (a, b - a); > > > +} > > > + > > > +unsigned > > > +umin_sub (unsigned a, unsigned b) > > > +{ > > > + return umin (a, b - a); > > > +} > > > + > > > +/* { dg-final { scan-tree-dump-not "ADD_OVERFLOW" "optimized" } } */ > > > +/* { dg-final { scan-tree-dump-not "SUB_OVERFLOW" "optimized" } } */ > > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr116815-3.c > > > b/gcc/testsuite/gcc.dg/tree-ssa/pr116815-3.c > > > new file mode 100644 > > > index 00000000000..af1fe18d50a > > > --- /dev/null > > > +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr116815-3.c > > > @@ -0,0 +1,43 @@ > > > +/* { dg-do compile } */ > > > +/* { dg-options "-O2 -fdump-tree-optimized" } */ > > > + > > > +/* PR middle-end/116815 */ > > > + > > > +/* Multi-use tests. */ > > > + > > > +static inline unsigned > > > +max (unsigned a, unsigned b) > > > +{ > > > + return a > b ? a : b; > > > +} > > > + > > > +static inline unsigned > > > +min (unsigned a, unsigned b) > > > +{ > > > + return a < b ? a : b; > > > +} > > > + > > > +unsigned > > > +umax_add_umin_add (unsigned a, unsigned b) > > > +{ > > > + return max (a, a + b) + min (a + b, b); > > > +} > > > + > > > +unsigned > > > +umin_add_umax_add (unsigned a, unsigned b) > > > +{ > > > + return min (a, b + a) + max (b + a, b); > > > +} > > > + > > > +unsigned > > > +multiple_paths (unsigned a, unsigned b) > > > +{ > > > + if (a > 5) > > > + return max (a, a + b); > > > + else > > > + return min (a, a + b); > > > +} > > > + > > > +/* { dg-final { scan-tree-dump-not "MAX_EXPR" "optimized" } } */ > > > +/* { dg-final { scan-tree-dump-not "MIN_EXPR" "optimized" } } */ > > > +/* { dg-final { scan-tree-dump-times "ADD_OVERFLOW" 3 "optimized" } } */ > > > diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc > > > index 7e819f37446..f08cac68ca7 100644 > > > --- a/gcc/tree-ssa-math-opts.cc > > > +++ b/gcc/tree-ssa-math-opts.cc > > > @@ -3981,11 +3981,26 @@ arith_overflow_check_p (gimple *stmt, gimple > > > *cast_stmt, gimple *&use_stmt, > > > return 1; > > > } > > > > > > - if (TREE_CODE_CLASS (ccode) != tcc_comparison) > > > + if (TREE_CODE_CLASS (ccode) != tcc_comparison > > > + && TREE_CODE_CLASS (ccode) != tcc_binary) > > > return 0; > > > > > > switch (ccode) > > > { > > > + case MAX_EXPR: > > > + case MIN_EXPR: > > > + /* 1. r = a + b; max (a, r) or max (r, a) > > > + 2. r = a + b; min (a, r) or min (r, a) > > > + 3. r = a - b; max (a, r) or max (r, a) > > > + 4. r = a - b; min (a, r) or min (r, a) */ > > > + if ((code == PLUS_EXPR > > > + && ((lhs == crhs1 && (rhs1 == crhs2 || rhs2 == crhs2)) > > > + || (lhs == crhs2 && (rhs1 == crhs1 || rhs2 == crhs2)))) > > > + || (code == MINUS_EXPR > > > + && ((lhs == crhs1 && rhs1 == crhs2) > > > + || (lhs == crhs2 && rhs1 == crhs1)))) > > > + return 1; > > > + break; > > > case GT_EXPR: > > > case LE_EXPR: > > > if (maxval) > > > @@ -4339,6 +4354,73 @@ match_saturation_trunc (gimple_stmt_iterator *gsi, > > > gphi *phi) > > > return true; > > > } > > > > > > +/* Assume that ovf = overflow_flag (add/sub (...)). > > > + The replacement forms are: > > > + max (add) -> ovf ? a : a + b > > > + min (sub) -> ovf ? a : a - b > > > + max (sub) -> ovf ? a - b : a > > > + min (add) -> ovf ? a + b : a. */ > > > + > > > +static tree > > > +build_minmax_replacement_statements (gimple *def_stmt, tree ovf, tree > > > new_lhs, > > > + tree type, gimple *minmax_stmt) > > > +{ > > > + enum tree_code code = gimple_assign_rhs_code (def_stmt); > > > + enum tree_code rhs_code = gimple_assign_rhs_code (minmax_stmt); > > > + gcc_checking_assert (code == PLUS_EXPR || code == MINUS_EXPR); > > > + > > > + tree lhs = gimple_assign_lhs (def_stmt); > > > + tree rhs1 = gimple_assign_rhs1 (def_stmt); > > > + tree rhs2 = gimple_assign_rhs2 (def_stmt); > > > + > > > + tree use_rhs1 = gimple_assign_rhs1 (minmax_stmt); > > > + tree use_rhs2 = gimple_assign_rhs2 (minmax_stmt); > > > + > > > + /* First figure out which variable from def_stmt will be used in the > > > + COND_EXPR. */ > > > + tree minmax_var = NULL_TREE; > > > + /* Either max/min (a, add/sub (a, b)) or > > > + max/min (add/sub (a, b), a). */ > > > + if ((lhs == use_rhs2 && use_rhs1 == rhs1) > > > + || (lhs == use_rhs1 && use_rhs2 == rhs1)) > > > + minmax_var = rhs1; > > > + /* Either max/min (a, add (b, a)) or > > > + max/min (add (b, a), a). */ > > > + else if (code == PLUS_EXPR) > > > + minmax_var = rhs2; > > > + > > > + /* The above should always match rhs1 for MINUS_EXPR. */ > > > + gcc_checking_assert ( > > > + minmax_var != NULL_TREE > > > + && (code == PLUS_EXPR || (use_rhs1 != rhs2 && use_rhs2 != rhs2))); > > > + > > > + /* Figure out if we have to generate: > > > + (ovf != 0 ? new_lhs : minmax_var) or > > > + (ovf != 0 ? minmax_var : new_lhs) i.e. (ovf == 0 ? new_lhs : > > > minmax_var). > > > + The default case is assumed to be the first one. */ > > > + bool flip = false; > > > + if ((rhs_code == MIN_EXPR && code == PLUS_EXPR) > > > + || (rhs_code == MAX_EXPR && code == MINUS_EXPR)) > > > + flip = true; > > > + > > > + /* Generate the actual code. */ > > > + tree minmax = make_ssa_name (type); > > > + tree comparison_result = make_ssa_name (boolean_type_node); > > > + tree comparison_expr = build2 (flip ? EQ_EXPR : NE_EXPR, > > > boolean_type_node, > > > + ovf, build_int_cst (type, 0)); > > > + gimple *comparison_stmt > > > + = gimple_build_assign (comparison_result, comparison_expr); > > > + > > > + tree conditional > > > + = build3 (COND_EXPR, type, comparison_result, minmax_var, new_lhs); > > > + gimple *new_minmax_stmt = gimple_build_assign (minmax, conditional); > > > + gimple_stmt_iterator gsi = gsi_for_stmt (minmax_stmt); > > > + gsi_insert_before (&gsi, comparison_stmt, GSI_NEW_STMT); > > > + gsi_insert_after (&gsi, new_minmax_stmt, GSI_NEW_STMT); > > > + > > > + return minmax; > > > +} > > > + > > > /* Recognize for unsigned x > > > x = y - z; > > > if (x > y) > > > @@ -4391,7 +4473,19 @@ match_saturation_trunc (gimple_stmt_iterator *gsi, > > > gphi *phi) > > > z = IMAGPART_EXPR <_7>; > > > _8 = IMAGPART_EXPR <_7>; > > > _9 = _8 != 0; > > > - iftmp.0_3 = (int) _9; */ > > > + iftmp.0_3 = (int) _9; > > > + > > > + And also recognize: > > > + c = max/min (a, add/sub (a, b)) > > > + and replace it with > > > + _7 = .(ADD|SUB)_OVERFLOW (a, b); > > > + _8 = REALPART_EXPR <_7>; > > > + _9 = IMAGPART_EXPR <_7>; > > > + _10 = _9 != 0; (or _9 == 0) > > > + _11 = _10 ? _8 : a; > > > + c = _11; > > > + This can be optimized to a single conditional select instruction with > > > an > > > + overflowing arithmetic instruction. */ > > > > > > static bool > > > match_arith_overflow (gimple_stmt_iterator *gsi, gimple *stmt, > > > @@ -4425,6 +4519,7 @@ match_arith_overflow (gimple_stmt_iterator *gsi, > > > gimple *stmt, > > > > > > tree rhs1 = gimple_assign_rhs1 (stmt); > > > tree rhs2 = gimple_assign_rhs2 (stmt); > > > + bool minmax_use_seen = false; > > > FOR_EACH_IMM_USE_FAST (use_p, iter, lhs) > > > { > > > use_stmt = USE_STMT (use_p); > > > @@ -4445,6 +4540,13 @@ match_arith_overflow (gimple_stmt_iterator *gsi, > > > gimple *stmt, > > > return false; > > > cond_stmt = use_stmt; > > > } > > > + if (gimple_code (use_stmt) == GIMPLE_ASSIGN > > > + && gimple_assign_rhs_class (use_stmt) == GIMPLE_BINARY_RHS) > > > + { > > > + tree_code rhs_code = gimple_assign_rhs_code (use_stmt); > > > + if (rhs_code == MAX_EXPR || rhs_code == MIN_EXPR) > > > + minmax_use_seen = true; > > > + } > > > ovf_use_seen = true; > > > } > > > else > > > @@ -4494,7 +4596,10 @@ match_arith_overflow (gimple_stmt_iterator *gsi, > > > gimple *stmt, > > > > > > tree maxval = NULL_TREE; > > > if (!ovf_use_seen > > > - || (code != MULT_EXPR && (code == BIT_NOT_EXPR ? use_seen : > > > !use_seen)) > > > + || (code != MULT_EXPR > > > + && (code == BIT_NOT_EXPR > > > + ? use_seen > > > + : !minmax_use_seen && !use_seen)) > > > || (code == PLUS_EXPR > > > && optab_handler (uaddv4_optab, > > > TYPE_MODE (type)) == CODE_FOR_nothing) > > > @@ -4758,6 +4863,7 @@ match_arith_overflow (gimple_stmt_iterator *gsi, > > > gimple *stmt, > > > gsi_insert_after (gsi, g2, GSI_NEW_STMT); > > > else > > > gsi_insert_before (gsi, g2, GSI_SAME_STMT); > > > + > > > if (code == MULT_EXPR) > > > mul_stmts.quick_push (g2); > > > > > > @@ -4786,15 +4892,25 @@ match_arith_overflow (gimple_stmt_iterator *gsi, > > > gimple *stmt, > > > if (gimple_code (use_stmt) == GIMPLE_COND) > > > { > > > gcond *cond_stmt = as_a <gcond *> (use_stmt); > > > - gimple_cond_set_lhs (cond_stmt, ovf); > > > - gimple_cond_set_rhs (cond_stmt, build_int_cst (type, 0)); > > > - gimple_cond_set_code (cond_stmt, ovf_use == 1 ? NE_EXPR : > > > EQ_EXPR); > > > + tree rhs = gimple_cond_rhs (cond_stmt); > > > + if (TREE_CODE (rhs) == MIN_EXPR || TREE_CODE (rhs) == MAX_EXPR) > > > + gimple_cond_set_rhs (cond_stmt, > > > + build_minmax_replacement_statements ( > > > + stmt, ovf, new_lhs, type, use_stmt)); > > > + else > > > + { > > > + gimple_cond_set_lhs (cond_stmt, ovf); > > > + gimple_cond_set_rhs (cond_stmt, build_int_cst (type, 0)); > > > + gimple_cond_set_code (cond_stmt, > > > + ovf_use == 1 ? NE_EXPR : EQ_EXPR); > > > + } > > > } > > > else > > > { > > > gcc_checking_assert (is_gimple_assign (use_stmt)); > > > if (gimple_assign_rhs_class (use_stmt) == GIMPLE_BINARY_RHS) > > > { > > > + tree_code rhs_code = gimple_assign_rhs_code (use_stmt); > > > if (gimple_assign_rhs_code (use_stmt) == RSHIFT_EXPR) > > > { > > > g2 = gimple_build_assign (make_ssa_name > > > (boolean_type_node), > > > @@ -4843,6 +4959,14 @@ match_arith_overflow (gimple_stmt_iterator *gsi, > > > gimple *stmt, > > > gsi_remove (&gsiu, true); > > > continue; > > > } > > > + else if (rhs_code == MIN_EXPR || rhs_code == MAX_EXPR) > > > + { > > > + gimple_stmt_iterator gsi = gsi_for_stmt (use_stmt); > > > + gimple_assign_set_rhs_from_tree ( > > > + &gsi, > > > + build_minmax_replacement_statements (stmt, ovf, > > > new_lhs, > > > + type, use_stmt)); > > > + } > > > else > > > { > > > gimple_assign_set_rhs1 (use_stmt, ovf); > > > @@ -4854,11 +4978,16 @@ match_arith_overflow (gimple_stmt_iterator *gsi, > > > gimple *stmt, > > > } > > > else > > > { > > > - gcc_checking_assert (gimple_assign_rhs_code (use_stmt) > > > - == COND_EXPR); > > > - tree cond = build2 (ovf_use == 1 ? NE_EXPR : EQ_EXPR, > > > - boolean_type_node, ovf, > > > - build_int_cst (type, 0)); > > > + tree_code rhs_code = gimple_assign_rhs_code (use_stmt); > > > + gcc_checking_assert (rhs_code == COND_EXPR || rhs_code == > > > MAX_EXPR > > > + || rhs_code == MIN_EXPR); > > > + tree cond = NULL_TREE; > > > + if (rhs_code != COND_EXPR) > > > + cond = build_minmax_replacement_statements (stmt, ovf, > > > new_lhs, > > > + type, > > > use_stmt); > > > + else > > > + cond = build2 (ovf_use == 1 ? NE_EXPR : EQ_EXPR, > > > + boolean_type_node, ovf, build_int_cst > > > (type, 0)); > > > gimple_assign_set_rhs1 (use_stmt, cond); > > > } > > > } > > > -- > > > 2.44.0 > > >