On Thu, Mar 17, 2016 at 5:31 PM, Robin Dapp <rd...@linux.vnet.ibm.com> wrote: > The attached patch is a first and somewhat hideous attempt to fix the > missed optimization discussed here: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69526 > > where (paraphrasing) a spurious > n = n - 1 > n = (sizetype) n > n = n + 1 > is being created. > > The patch tries to avoid this on tree level as well as on RTL level. > Both parts have to be addressed since the logic is essentially doubled > and e.g. loop-iv.c performs the same actions as tree-ssa-loop-niter.c to > obtain the number of iterations of a loop. > > (On a side note, I find the name 'niter' rather dubious, as the variable > actually represents the number of backjumps/latch executions and not the > number of iterations in a normal sense. This put me off track for quite > a while.) > > The n - 1 above is the number of "iterations"/backjumps. For loops where > the induction variable is incremented before the comparison, the number > of iterations has to be incremented as well (n + 1) to obtain the > correct number of backjumps. > > Both operations (-1, +1) are separate from each other and don't know if > the other operation was performed or not. Therefore, the rationale of > this patch to detect the situation where the number of iterations is of > the form n - 1 and still must be incremented afterwards. In such a case, > the - 1 will be stripped and a flag is set to avoid performing the + 1 > later on. > > A major abomination in the patch is that in order to decide whether to > act or not in loop-doloop.c, loop-iv.c has to have prepared an > unmodified expression before which is why find_simple_exit() is actually > called a second time with a flag that prevents the - 1. > > I am aware that it is neither optimal nor minimal hence I'd be grateful > for ideas on how to solve this in a more elegant way. I found no > regressions on x86-64 and s390x but did not yet perform bootstrapping > and more testing due to the premature nature of the patch.
As explained in the PR VRP has enough information to fix the IL. Other than that remembering both the number of exit tests and the number of latch executions might be a good idea. Personally I wouldn't worry about RTL level at all but instead do that on the GIMPLE level only. The vectorizer already does sth similar to avoid running into some degenerate case where the number of exit test invocations is zero (because # of latch executions + 1 overflows). See PR59058. Richard. > Thanks > Robin > > > gcc/ChangeLog: > > 2016-03-17 Robin Dapp <rd...@linux.vnet.ibm.com> > > * cfgloop.h (struct GTY): Add second number of iterations > * loop-doloop.c (doloop_condition_get): Fix whitespace > (doloop_modify): Use second number of iterations > * loop-iv.c (iv_number_of_iterations): Add parameter > (check_simple_exit): Add parameter > (find_simple_exit): Add parameter > (get_simple_loop_desc): Call find_simple_exit() a second time > * loop-unroll.c (unroll_loop_constant_iterations): Copy niter_expr > (unroll_loop_runtime_iterations): Copy niter_expr > * tree-ssa-loop-ivopts.c (cand_value_at): Avoid creating n + 1 - 1 > (may_eliminate_iv): Call with may_be_zero > (remove_unused_ivs): Fix whitespace