https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101145
--- Comment #11 from CVS Commits <cvs-commit at gcc dot gnu.org> --- The master branch has been updated by Roger Sayle <sa...@gcc.gnu.org>: https://gcc.gnu.org/g:de3e5aae6c4b540e808c822c1e878b0a3304d09c commit r12-5699-gde3e5aae6c4b540e808c822c1e878b0a3304d09c Author: Roger Sayle <ro...@nextmovesoftware.com> Date: Wed Dec 1 19:58:40 2021 +0000 Final value replacement improvements for until-wrap loops. This middle-end patch is inspired by the Richard Beiner's until-wrap loop example in PR tree-optimization/101145. unsigned foo(unsigned val, unsigned start) { unsigned cnt = 0; for (unsigned i = start; i > val; ++i) cnt++; return cnt; } For this loop, the tree optimizers currently generate: unsigned int foo (unsigned int val, unsigned int start) { unsigned int cnt; unsigned int _1; unsigned int _5; <bb 2> [local count: 118111600]: if (start_3(D) > val_4(D)) goto <bb 3>; [89.00%] else goto <bb 4>; [11.00%] <bb 3> [local count: 105119324]: _1 = start_3(D) + 1; _5 = -start_3(D); cnt_2 = _1 > val_4(D) ? _5 : 1; <bb 4> [local count: 118111600]: # cnt_11 = PHI <cnt_2(3), 0(2)> return cnt_11; } or perhaps slightly easier to read: if (start > val) { cnt = (start+1) > val ? -start : 1; } else cnt = 0; In this snippet, if we know start > val, then (start+1) > val unless start+1 overflows, i.e. (start+1) == 0 and start == ~0. We can use this (loop header) context to simplify the ternary expression to "(start != -1) ? -start : 1", which with a little help from match.pd can be folded to -start. Hence the optimal final value replacement should be: cnt = (start > val) ? -start : 0; Or as now generated by this patch: unsigned int foo (unsigned int val, unsigned int start) { unsigned int cnt; <bb 2> [local count: 118111600]: if (start_3(D) > val_4(D)) goto <bb 3>; [89.00%] else goto <bb 4>; [11.00%] <bb 3> [local count: 105119324]: cnt_2 = -start_3(D); <bb 4> [local count: 118111600]: # cnt_11 = PHI <cnt_2(3), 0(2)> return cnt_11; } We can also improve until-wrap loops that don't have a (suitable) loop header, as determined by simplify_using_initial_conditions. unsigned bar(unsigned val, unsigned start) { unsigned cnt = 0; unsigned i = start; do { cnt++; i++; } while (i > val); return cnt; } which is currently optimized to: unsigned int foo (unsigned int val, unsigned int start) { unsigned int cnt; unsigned int _9; unsigned int _10; <bb 2> [local count: 118111600]: _9 = start_4(D) + 1; _10 = -start_4(D); cnt_3 = val_7(D) < _9 ? _10 : 1; return cnt_3; } Here we have "val < (start+1) ? -start : 1", which again with the help of match.pd can be slightly simplified to "val <= start ? -start : 1" when dealing with unsigned types, because at the complicating value where start == ~0, we fortunately have -start == 1, hence it doesn't matter whether the second or third operand of the ternary operator is returned. To summarize, this patch (in addition to tweaking may_be_zero in number_of_iterations_until_wrap) adds three new constant folding transforms to match.pd. X != C1 ? -X : C2 simplifies to -X when -C1 == C2. which is the generalized form of the simplification above. X != C1 ? ~X : C2 simplifies to ~X when ~C1 == C2. which is the BIT_NOT_EXPR analog of the NEGATE_EXPR case. and the "until-wrap final value replacement without context": (X + 1) > Y ? -X : 1 simplifies to X >= Y ? -X : 1 when X is unsigned, as when X + 1 overflows, X is -1, so -X == 1. 2021-12-01 Roger Sayle <ro...@nextmovesoftware.com> Richard Biener <rguent...@suse.de> gcc/ChangeLog * tree-ssa-loop-niter.c (number_of_iterations_until_wrap): Check if simplify_using_initial_conditions allows us to simplify the expression for may_be_zero. * match.pd (X != C ? -X : -C -> -X): New transform. (X != C ? ~X : ~C -> ~X): Likewise. ((X+1) > Y ? -X : 1 -> X >= Y ? -X : 1): Likewise. gcc/testsuite/ChangeLog * gcc.dg/fold-condneg-1.c: New test case. * gcc.dg/fold-condneg-2.c: New test case. * gcc.dg/fold-condnot-1.c: New test case. * gcc.dg/pr101145-1.c: New test case. * gcc.dg/pr101145-2.c: New test case.