On Fri, Nov 27, 2015 at 8:51 PM, Richard Biener <richard.guent...@gmail.com> wrote: > On Fri, Nov 27, 2015 at 12:44 PM, Bin Cheng <bin.ch...@arm.com> wrote: >> Hi, >> This patch is to fix PR68529. In my previous scev/niter overflow patches, I >> only computed no-overflow information for control iv in simple loops with >> LT_EXPR as exit condition code. This bug is about loop with NE_EXPR as exit >> condition code. Given below example: >> >> #include <stdio.h> >> #include <stdlib.h> >> >> int main(){ >> char c[10000]={}; >> unsigned int nchar=9999; >> >> while(nchar--!=0){ >> c[nchar]='A'; >> } >> >> printf("%s\n",c); >> return 0; >> } >> nchar used as an index to array 'c' doesn't overflow during loop iterations. >> Thus &c[nchar] acts as a scev. GCC now fails to do that. With this patch, >> this issue is fixed. >> >> Furthermore, the computation of no-overflow information could be improved by >> using TREE_OVERFLOW_UNDEFINED semantic of signed type for C/C++. I didn't >> do that because: >> 1) I doubt how useful it could be because I have already changed scev to use >> the semantic whenever possible. It doesn't need loop niter analysis' help. >> 2) To do that, I need to expose chrec_convert_aggressive information out of >> scev in function simple_iv, because that function could corrupt >> TREE_OVERFLOW_UNDEFINED semantic assumption. This isn't appropriate for >> Stage3. >> >> Bootstrap and test on x86_64 and x86. I don't expect any issue on aarch64 >> either. Is it OK? > > + if (integer_onep (e) > + && (integer_onep (s) > + || (TREE_CODE (c) == INTEGER_CST > + && TREE_CODE (s) == INTEGER_CST > + && wi::mod_trunc (c, s, TYPE_SIGN (type)) == 0))) > > the only thing I'm looking at here is the modulo sign. Considering > we're looking at the sign bit of the step to normalize 'c' and 's' what > happens for > > for (unsigned int i = 0; i != 1000; --i) > > ? I suppose we get s == 1 and c == -1000U and you'll say the control > IV doesn't wrap. Similar for i -= 2 where even when we use a signed > modulo (singed)-1000U % 2 is still 0. > > So I think you need to remember whether we consider the step > to be negative and compare iv->base and final as well. I think the patch does the monotonic check wrto sign of step with below code:
+ if (tree_int_cst_sign_bit (iv->step)) + e = fold_build2 (GE_EXPR, boolean_type_node, iv->base, final); + else + e = fold_build2 (LE_EXPR, boolean_type_node, iv->base, final); + e = simplify_using_initial_conditions (loop, e); + if (integer_onep (e) It acts as expected with your example. > > Bonus points for a wrong-code testcase with the above. > > I'd also like to see a testcase exercising step != 1. I added two new tests each for "step != 1" and the previous case. I also tuned original pr68529-3.c a little. Actually for the case in the original patch as below: +void bar(char *s); +int foo(unsigned short l) +{ + char c[10000] = {}; + unsigned short nchar = 9999; + + if (nchar < l) + return -1; + + while(nchar-- != l) + { + c[nchar] = 'A'; + } + + bar (c); + return 0; +} The offset IS an affine. GCC can't detect that because condition "nchar (==9999) < l" is split into two conditions: "l_8 > 9999" and "l_8 != 9999". For now simplify_using_initial_conditions can't merge range information from two different conditions. Maybe jump threading can merge the two condition/jumps, or VRP improvement discussed before can handle that. Here is the updated patch. Is it OK? Thanks, bin
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr68529-1.c b/gcc/testsuite/gcc.dg/tree-ssa/pr68529-1.c new file mode 100644 index 0000000..eef7460 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr68529-1.c @@ -0,0 +1,20 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -ftree-loop-distribution -ftree-loop-distribute-patterns -fdump-tree-ldist-details" } */ + +void bar(char *s); +int foo() +{ + char c[10000] = {}; + unsigned short nchar = 9999; + + while(nchar-- != 0) + { + c[nchar] = 'A'; + } + + bar (c); + return 0; +} + +/* { dg-final { scan-tree-dump "distributed: split to 0 loops and 1 library calls" "ldist" } } */ +/* { dg-final { scan-tree-dump "generated memset" "ldist" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr68529-2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr68529-2.c new file mode 100644 index 0000000..a1d2742 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr68529-2.c @@ -0,0 +1,23 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -ftree-loop-distribution -ftree-loop-distribute-patterns -fdump-tree-ldist-details" } */ + +void bar(char *s); +int foo(unsigned short l) +{ + char c[10000] = {}; + unsigned short nchar = 9999; + + if (nchar <= l) + return -1; + + while(nchar-- != l) + { + c[nchar] = 'A'; + } + + bar (c); + return 0; +} + +/* { dg-final { scan-tree-dump "distributed: split to 0 loops and 1 library calls" "ldist" } } */ +/* { dg-final { scan-tree-dump "generated memset" "ldist" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr68529-3.c b/gcc/testsuite/gcc.dg/tree-ssa/pr68529-3.c new file mode 100644 index 0000000..ac68dd7 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr68529-3.c @@ -0,0 +1,47 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -ftree-loop-distribution -ftree-loop-distribute-patterns -fdump-tree-ldist-details" } */ + +void bar(char *s); +int foo1(unsigned short l) +{ + char c[10000] = {}; + unsigned short nchar = 9999; + + while(nchar-- != l) + { + c[nchar] = 'A'; + } + + bar (c); + return 0; +} + +int foo2() +{ + char c[100000] = {}; + unsigned short nchar; + + for (nchar = 0; nchar != 1000; --nchar) + { + c[nchar] = 'A'; + } + + bar (c); + return 0; +} + +int foo3() +{ + char c[100000] = {}; + unsigned short nchar; + + for (nchar = 0; nchar != 1000; nchar += 3) + { + c[nchar] = 'A'; + } + + bar (c); + return 0; +} + +/* { dg-final { scan-tree-dump-times "failed: evolution of offset is not affine" 3 "ldist" } } */ diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c index 6d480c0..67080a3 100644 --- a/gcc/tree-ssa-loop-niter.c +++ b/gcc/tree-ssa-loop-niter.c @@ -957,13 +957,14 @@ number_of_iterations_ne_max (mpz_t bnd, bool no_overflow, tree c, tree s, bounds on the difference FINAL - IV->base. */ static bool -number_of_iterations_ne (tree type, affine_iv *iv, tree final, - struct tree_niter_desc *niter, bool exit_must_be_taken, - bounds *bnds) +number_of_iterations_ne (struct loop *loop, tree type, affine_iv *iv, + tree final, struct tree_niter_desc *niter, + bool exit_must_be_taken, bounds *bnds) { tree niter_type = unsigned_type_for (type); tree s, c, d, bits, assumption, tmp, bound; mpz_t max; + tree e; niter->control = *iv; niter->bound = final; @@ -998,6 +999,23 @@ number_of_iterations_ne (tree type, affine_iv *iv, tree final, TYPE_SIGN (niter_type)); mpz_clear (max); + /* Compute no-overflow information for the control iv. Note we are + handling NE_EXPR, if iv base equals to final value, the loop exits + immediately, and the iv does not overflow. */ + if (tree_int_cst_sign_bit (iv->step)) + e = fold_build2 (GE_EXPR, boolean_type_node, iv->base, final); + else + e = fold_build2 (LE_EXPR, boolean_type_node, iv->base, final); + e = simplify_using_initial_conditions (loop, e); + if (integer_onep (e) + && (integer_onep (s) + || (TREE_CODE (c) == INTEGER_CST + && TREE_CODE (s) == INTEGER_CST + && wi::mod_trunc (c, s, TYPE_SIGN (type)) == 0))) + { + niter->control.no_overflow = true; + } + /* First the trivial cases -- when the step is 1. */ if (integer_onep (s)) { @@ -1361,8 +1379,8 @@ assert_loop_rolls_lt (tree type, affine_iv *iv0, affine_iv *iv1, that the exit must be taken eventually. */ static bool -number_of_iterations_lt (tree type, affine_iv *iv0, affine_iv *iv1, - struct tree_niter_desc *niter, +number_of_iterations_lt (struct loop *loop, tree type, affine_iv *iv0, + affine_iv *iv1, struct tree_niter_desc *niter, bool exit_must_be_taken, bounds *bnds) { tree niter_type = unsigned_type_for (type); @@ -1434,7 +1452,8 @@ number_of_iterations_lt (tree type, affine_iv *iv0, affine_iv *iv1, zps does not overflow. */ zps.no_overflow = true; - return number_of_iterations_ne (type, &zps, delta, niter, true, bnds); + return number_of_iterations_ne (loop, type, &zps, + delta, niter, true, bnds); } /* Make sure that the control iv does not overflow. */ @@ -1473,9 +1492,9 @@ number_of_iterations_lt (tree type, affine_iv *iv0, affine_iv *iv1, is the case). BNDS bounds the difference IV1->base - IV0->base. */ static bool -number_of_iterations_le (tree type, affine_iv *iv0, affine_iv *iv1, - struct tree_niter_desc *niter, bool exit_must_be_taken, - bounds *bnds) +number_of_iterations_le (struct loop *loop, tree type, affine_iv *iv0, + affine_iv *iv1, struct tree_niter_desc *niter, + bool exit_must_be_taken, bounds *bnds) { tree assumption; tree type1 = type; @@ -1523,7 +1542,7 @@ number_of_iterations_le (tree type, affine_iv *iv0, affine_iv *iv1, bounds_add (bnds, 1, type1); - return number_of_iterations_lt (type, iv0, iv1, niter, exit_must_be_taken, + return number_of_iterations_lt (loop, type, iv0, iv1, niter, exit_must_be_taken, bnds); } @@ -1698,18 +1717,18 @@ number_of_iterations_cond (struct loop *loop, { case NE_EXPR: gcc_assert (integer_zerop (iv1->step)); - ret = number_of_iterations_ne (type, iv0, iv1->base, niter, + ret = number_of_iterations_ne (loop, type, iv0, iv1->base, niter, exit_must_be_taken, &bnds); break; case LT_EXPR: - ret = number_of_iterations_lt (type, iv0, iv1, niter, exit_must_be_taken, - &bnds); + ret = number_of_iterations_lt (loop, type, iv0, iv1, niter, + exit_must_be_taken, &bnds); break; case LE_EXPR: - ret = number_of_iterations_le (type, iv0, iv1, niter, exit_must_be_taken, - &bnds); + ret = number_of_iterations_le (loop, type, iv0, iv1, niter, + exit_must_be_taken, &bnds); break; default: