Re: [PATCH PR68529]Fix not recognized scev by computing no-overflow info for loop with NE_EXPR exit condition

Bin.Cheng Fri, 27 Nov 2015 21:51:07 -0800

On Fri, Nov 27, 2015 at 8:51 PM, Richard Biener
<richard.guent...@gmail.com> wrote:
> On Fri, Nov 27, 2015 at 12:44 PM, Bin Cheng <bin.ch...@arm.com> wrote:
>> Hi,
>> This patch is to fix PR68529.  In my previous scev/niter overflow patches, I
>> only computed no-overflow information for control iv in simple loops with
>> LT_EXPR as exit condition code.  This bug is about loop with NE_EXPR as exit
>> condition code.  Given below example:
>>
>> #include <stdio.h>
>> #include <stdlib.h>
>>
>> int main(){
>>     char c[10000]={};
>>     unsigned int nchar=9999;
>>
>>     while(nchar--!=0){
>>        c[nchar]='A';
>>       }
>>
>>     printf("%s\n",c);
>>     return 0;
>> }
>> nchar used as an index to array 'c' doesn't overflow during loop iterations.
>> Thus &c[nchar] acts as a scev.  GCC now fails to do that.  With this patch,
>> this issue is fixed.
>>
>> Furthermore, the computation of no-overflow information could be improved by
>> using TREE_OVERFLOW_UNDEFINED semantic of signed type for C/C++.  I didn't
>> do that because:
>> 1) I doubt how useful it could be because I have already changed scev to use
>> the semantic whenever possible.  It doesn't need loop niter analysis' help.
>> 2) To do that, I need to expose chrec_convert_aggressive information out of
>> scev in function simple_iv, because that function could corrupt
>> TREE_OVERFLOW_UNDEFINED semantic assumption.  This isn't appropriate for
>> Stage3.
>>
>> Bootstrap and test on x86_64 and x86.  I don't expect any issue on aarch64
>> either.  Is it OK?
>
> +  if (integer_onep (e)
> +      && (integer_onep (s)
> +         || (TREE_CODE (c) == INTEGER_CST
> +             && TREE_CODE (s) == INTEGER_CST
> +             && wi::mod_trunc (c, s, TYPE_SIGN (type)) == 0)))
>
> the only thing I'm looking at here is the modulo sign.  Considering
> we're looking at the sign bit of the step to normalize 'c' and 's' what
> happens for
>
>   for (unsigned int i = 0; i != 1000; --i)
>
> ?  I suppose we get s == 1 and c == -1000U and you'll say the control
> IV doesn't wrap.  Similar for i -= 2 where even when we use a signed
> modulo (singed)-1000U % 2 is still 0.
>
> So I think you need to remember whether we consider the step
> to be negative and compare iv->base and final as well.
I think the patch does the monotonic check wrto sign of step with below code:


+  if (tree_int_cst_sign_bit (iv->step))
+    e = fold_build2 (GE_EXPR, boolean_type_node, iv->base, final);
+  else
+    e = fold_build2 (LE_EXPR, boolean_type_node, iv->base, final);
+  e = simplify_using_initial_conditions (loop, e);
+  if (integer_onep (e)

It acts as expected with your example.

>
> Bonus points for a wrong-code testcase with the above.
>
> I'd also like to see a testcase exercising step != 1.
I added two new tests each for "step != 1" and the previous case.  I
also tuned original pr68529-3.c a little.  Actually for the case in
the original patch as below:
+void bar(char *s);
+int foo(unsigned short l)
+{
+  char c[10000] = {};
+  unsigned short nchar = 9999;
+
+  if (nchar < l)
+    return -1;
+
+  while(nchar-- != l)
+    {
+      c[nchar] = 'A';
+    }
+
+  bar (c);
+  return 0;
+}

The offset IS an affine.  GCC can't detect that because condition
"nchar (==9999) < l" is split into two conditions: "l_8 > 9999" and
"l_8 != 9999".  For now simplify_using_initial_conditions can't merge
range information from two different conditions.  Maybe jump threading
can merge the two condition/jumps, or VRP improvement discussed before
can handle that.

Here is the updated patch.  Is it OK?

Thanks,
bin

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr68529-1.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr68529-1.c
new file mode 100644
index 0000000..eef7460
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr68529-1.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-loop-distribution -ftree-loop-distribute-patterns 
-fdump-tree-ldist-details" } */
+
+void bar(char *s);
+int foo()
+{
+  char c[10000] = {};
+  unsigned short nchar = 9999;
+
+  while(nchar-- != 0)
+    {
+      c[nchar] = 'A';
+    }
+
+  bar (c);
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump "distributed: split to 0 loops and 1 library 
calls" "ldist" } } */
+/* { dg-final { scan-tree-dump "generated memset" "ldist" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr68529-2.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr68529-2.c
new file mode 100644
index 0000000..a1d2742
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr68529-2.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-loop-distribution -ftree-loop-distribute-patterns 
-fdump-tree-ldist-details" } */
+
+void bar(char *s);
+int foo(unsigned short l)
+{
+  char c[10000] = {};
+  unsigned short nchar = 9999;
+
+  if (nchar <= l)
+    return -1;
+
+  while(nchar-- != l)
+    {
+      c[nchar] = 'A';
+    }
+
+  bar (c);
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump "distributed: split to 0 loops and 1 library 
calls" "ldist" } } */
+/* { dg-final { scan-tree-dump "generated memset" "ldist" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr68529-3.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr68529-3.c
new file mode 100644
index 0000000..ac68dd7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr68529-3.c
@@ -0,0 +1,47 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-loop-distribution -ftree-loop-distribute-patterns 
-fdump-tree-ldist-details" } */
+
+void bar(char *s);
+int foo1(unsigned short l)
+{
+  char c[10000] = {};
+  unsigned short nchar = 9999;
+
+  while(nchar-- != l)
+    {
+      c[nchar] = 'A';
+    }
+
+  bar (c);
+  return 0;
+}
+
+int foo2()
+{
+  char c[100000] = {};
+  unsigned short nchar;
+
+  for (nchar = 0; nchar != 1000; --nchar)
+    {
+      c[nchar] = 'A';
+    }
+
+  bar (c);
+  return 0;
+}
+
+int foo3()
+{
+  char c[100000] = {};
+  unsigned short nchar;
+
+  for (nchar = 0; nchar != 1000; nchar += 3)
+    {
+      c[nchar] = 'A';
+    }
+
+  bar (c);
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "failed: evolution of offset is not 
affine" 3 "ldist" } } */
diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c
index 6d480c0..67080a3 100644
--- a/gcc/tree-ssa-loop-niter.c
+++ b/gcc/tree-ssa-loop-niter.c
@@ -957,13 +957,14 @@ number_of_iterations_ne_max (mpz_t bnd, bool no_overflow, 
tree c, tree s,
    bounds on the difference FINAL - IV->base.  */
 
 static bool
-number_of_iterations_ne (tree type, affine_iv *iv, tree final,
-                        struct tree_niter_desc *niter, bool exit_must_be_taken,
-                        bounds *bnds)
+number_of_iterations_ne (struct loop *loop, tree type, affine_iv *iv,
+                        tree final, struct tree_niter_desc *niter,
+                        bool exit_must_be_taken, bounds *bnds)
 {
   tree niter_type = unsigned_type_for (type);
   tree s, c, d, bits, assumption, tmp, bound;
   mpz_t max;
+  tree e;
 
   niter->control = *iv;
   niter->bound = final;
@@ -998,6 +999,23 @@ number_of_iterations_ne (tree type, affine_iv *iv, tree 
final,
                                 TYPE_SIGN (niter_type));
   mpz_clear (max);
 
+  /* Compute no-overflow information for the control iv.  Note we are
+     handling NE_EXPR, if iv base equals to final value, the loop exits
+     immediately, and the iv does not overflow.  */
+  if (tree_int_cst_sign_bit (iv->step))
+    e = fold_build2 (GE_EXPR, boolean_type_node, iv->base, final);
+  else
+    e = fold_build2 (LE_EXPR, boolean_type_node, iv->base, final);
+  e = simplify_using_initial_conditions (loop, e);
+  if (integer_onep (e)
+      && (integer_onep (s)
+         || (TREE_CODE (c) == INTEGER_CST
+             && TREE_CODE (s) == INTEGER_CST
+             && wi::mod_trunc (c, s, TYPE_SIGN (type)) == 0)))
+    {
+      niter->control.no_overflow = true;
+    }
+
   /* First the trivial cases -- when the step is 1.  */
   if (integer_onep (s))
     {
@@ -1361,8 +1379,8 @@ assert_loop_rolls_lt (tree type, affine_iv *iv0, 
affine_iv *iv1,
    that the exit must be taken eventually.  */
 
 static bool
-number_of_iterations_lt (tree type, affine_iv *iv0, affine_iv *iv1,
-                        struct tree_niter_desc *niter,
+number_of_iterations_lt (struct loop *loop, tree type, affine_iv *iv0,
+                        affine_iv *iv1, struct tree_niter_desc *niter,
                         bool exit_must_be_taken, bounds *bnds)
 {
   tree niter_type = unsigned_type_for (type);
@@ -1434,7 +1452,8 @@ number_of_iterations_lt (tree type, affine_iv *iv0, 
affine_iv *iv1,
         zps does not overflow.  */
       zps.no_overflow = true;
 
-      return number_of_iterations_ne (type, &zps, delta, niter, true, bnds);
+      return number_of_iterations_ne (loop, type, &zps,
+                                     delta, niter, true, bnds);
     }
 
   /* Make sure that the control iv does not overflow.  */
@@ -1473,9 +1492,9 @@ number_of_iterations_lt (tree type, affine_iv *iv0, 
affine_iv *iv1,
    is the case).  BNDS bounds the difference IV1->base - IV0->base.  */
 
 static bool
-number_of_iterations_le (tree type, affine_iv *iv0, affine_iv *iv1,
-                        struct tree_niter_desc *niter, bool exit_must_be_taken,
-                        bounds *bnds)
+number_of_iterations_le (struct loop *loop, tree type, affine_iv *iv0,
+                        affine_iv *iv1, struct tree_niter_desc *niter,
+                        bool exit_must_be_taken, bounds *bnds)
 {
   tree assumption;
   tree type1 = type;
@@ -1523,7 +1542,7 @@ number_of_iterations_le (tree type, affine_iv *iv0, 
affine_iv *iv1,
 
   bounds_add (bnds, 1, type1);
 
-  return number_of_iterations_lt (type, iv0, iv1, niter, exit_must_be_taken,
+  return number_of_iterations_lt (loop, type, iv0, iv1, niter, 
exit_must_be_taken,
                                  bnds);
 }
 
@@ -1698,18 +1717,18 @@ number_of_iterations_cond (struct loop *loop,
     {
     case NE_EXPR:
       gcc_assert (integer_zerop (iv1->step));
-      ret = number_of_iterations_ne (type, iv0, iv1->base, niter,
+      ret = number_of_iterations_ne (loop, type, iv0, iv1->base, niter,
                                     exit_must_be_taken, &bnds);
       break;
 
     case LT_EXPR:
-      ret = number_of_iterations_lt (type, iv0, iv1, niter, exit_must_be_taken,
-                                    &bnds);
+      ret = number_of_iterations_lt (loop, type, iv0, iv1, niter,
+                                    exit_must_be_taken, &bnds);
       break;
 
     case LE_EXPR:
-      ret = number_of_iterations_le (type, iv0, iv1, niter, exit_must_be_taken,
-                                    &bnds);
+      ret = number_of_iterations_le (loop, type, iv0, iv1, niter,
+                                    exit_must_be_taken, &bnds);
       break;
 
     default:

Re: [PATCH PR68529]Fix not recognized scev by computing no-overflow info for loop with NE_EXPR exit condition

Reply via email to