http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57343
Richard Biener <rguenth at gcc dot gnu.org> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |rakdver at gcc dot gnu.org --- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> --- The value C is equal to final - base, where final and base are the final and initial value of the actual induction variable in the analysed loop. BNDS bounds the value of this difference when computed in signed type with unbounded range, while the computation of C is performed in an unsigned type with the range matching the range of the type of the induction variable. In particular, BNDS.up contains an upper bound on C in the following cases: -- if the iv must reach its final value without overflow, i.e., if NO_OVERFLOW && EXIT_MUST_BE_TAKEN is true, or -- if final >= base, which we know to hold when BNDS.below >= 0. */ static void number_of_iterations_ne_max (mpz_t bnd, bool no_overflow, tree c, tree s, bounds *bnds, bool exit_must_be_taken) { double_int max; mpz_t d; bool bnds_u_valid = ((no_overflow && exit_must_be_taken) || mpz_sgn (bnds->below) >= 0); for the case in question (IV of unsigned char type, base ((unsigned char) (c_lsm.4_12 + -1)) * 100 and step 100 (negated 156)) the bnds values (below 0, up 255 - full range of unsigned char) the fact that the IV overflows cannot be simply ignored - and I don't see how we can derive from below >= 0 that final (which is zero!) is >= base. We then fall to if (multiple_of_p (TREE_TYPE (c), c, s)) { /* If C is an exact multiple of S, then its value will be reached before the induction variable overflows (unless the loop is exited in some other way before). Note that the actual induction variable in the loop (which ranges from base to final instead of from 0 to C) may overflow, in which case BNDS.up will not be giving a correct upper bound on C; thus, BNDS_U_VALID had to be computed in advance. */ no_overflow = true; exit_must_be_taken = true; which ends up with no_overflow = true and things going downhill from here. Note that the original, non-negated bnds would _not_ have mpz_sgn (bnds->below) >= 0 (but -255): /* Rearrange the terms so that we get inequality S * i <> C, with S positive. Also cast everything to the unsigned type. If IV does not overflow, BNDS bounds the value of C. Also, this is the case if the computation |FINAL - IV->base| does not overflow, i.e., if BNDS->below in the result is nonnegative. */ if (tree_int_cst_sign_bit (iv->step)) { s = fold_convert (niter_type, fold_build1 (NEGATE_EXPR, type, iv->step)); c = fold_build2 (MINUS_EXPR, niter_type, fold_convert (niter_type, iv->base), fold_convert (niter_type, final)); bounds_negate (bnds); Here |FINAL - IV->base| does overflow, but bounds_negate shadows this. But even before that we have IVCANON remove the pointless exit Removed pointless exit: if (i_11 <= 4) Added canonical iv to loop 1, 3 iterations. then it adds back the canonical IV, but that's pointless in the next analysis run done by cunroll and removed there. The bogus upper bound is determined from local_pure_const via finite_loop_p () on the non-header-copied form and the estimate from the h == 0 test. Analyzing # of iterations of loop 1 exit condition [(unsigned char) (c_6 + -1) * 100, + , 156] != 0 bounds on difference of bases: -255 ... 0 result: # of iterations (((unsigned char) (c_6 + -1) * 100) /[ex] 4) * 41 & 63, bounded by 2 In the end I think that number_of_iterations_ne_max does not properly honor the case where for if (multiple_of_p (TREE_TYPE (c), c, s)) { /* If C is an exact multiple of S, then its value will be reached before the induction variable overflows (unless the loop is exited in some other way before). Note that the actual induction variable in the loop (which ranges from base to final instead of from 0 to C) may overflow, in which case BNDS.up will not be giving a correct upper bound on C; thus, BNDS_U_VALID had to be computed in advance. */ no_overflow = true; that multiply overflows. At least the IV does not range from 0 to C for (unsigned char) (c_6 + -1) * 100. I have a hard time following the logic that this all is correct for wrapping IVs ... that said, Index: gcc/tree-ssa-loop-niter.c =================================================================== --- gcc/tree-ssa-loop-niter.c (revision 199137) +++ gcc/tree-ssa-loop-niter.c (working copy) @@ -627,7 +627,7 @@ number_of_iterations_ne (tree type, affi not overflow, BNDS bounds the value of C. Also, this is the case if the computation |FINAL - IV->base| does not overflow, i.e., if BNDS->below in the result is nonnegative. */ - if (tree_int_cst_sign_bit (iv->step)) + if (tree_int_cst_sgn (iv->step) == -1) { s = fold_convert (niter_type, fold_build1 (NEGATE_EXPR, type, iv->step)); fixes the testcase (otherwise untested).