Re: [PATCH V3] Split loop for NE condition.

guojiufu via Gcc-patches Mon, 21 Jun 2021 20:55:23 -0700

On 2021-06-21 16:51, Richard Biener wrote:

On Wed, 9 Jun 2021, guojiufu wrote:

On 2021-06-09 17:42, guojiufu via Gcc-patches wrote:
> On 2021-06-08 18:13, Richard Biener wrote:
>> On Fri, 4 Jun 2021, Jiufu Guo wrote:
>>
> cut...
>>> +      gcond *cond = as_a<gcond *> (last);
>>> +      enum tree_code code = gimple_cond_code (cond);
>>> +      if (!(code == NE_EXPR
>>> +      || (code == EQ_EXPR && (e->flags & EDGE_TRUE_VALUE))))
>>
>> The NE_EXPR check misses a corresponding && (e->flags & EDGE_FALSE_VALUE)
>> check.
>>
> Thanks, check (e->flags & EDGE_FALSE_VALUE) would be safer.
>
>>> +  continue;
>>> +
>>> +      /* Check if bound is invarant.  */
>>> +      tree idx = gimple_cond_lhs (cond);
>>> +      tree bnd = gimple_cond_rhs (cond);
>>> +      if (expr_invariant_in_loop_p (loop, idx))
>>> +  std::swap (idx, bnd);
>>> +      else if (!expr_invariant_in_loop_p (loop, bnd))
>>> +  continue;
>>> +
>>> +      /* Only unsigned type conversion could cause wrap.  */
>>> +      tree type = TREE_TYPE (idx);
>>> +      if (!INTEGRAL_TYPE_P (type) || TREE_CODE (idx) != SSA_NAME
>>> +    || !TYPE_UNSIGNED (type))
>>> +  continue;
>>> +
>>> +      /* Avoid to split if bound is MAX/MIN val.  */
>>> +      tree bound_type = TREE_TYPE (bnd);
>>> +      if (TREE_CODE (bnd) == INTEGER_CST && INTEGRAL_TYPE_P (bound_type)
>>> +    && (tree_int_cst_equal (bnd, TYPE_MAX_VALUE (bound_type))
>>> +        || tree_int_cst_equal (bnd, TYPE_MIN_VALUE (bound_type))))
>>> +  continue;
>>
>> Note you do not require 'bnd' to be constant and thus at runtime those
>> cases still need to be handled correctly.
> Yes, bnd is not required to be constant.  The above code is filtering the
> case
> where bnd is const max/min value of the type.  So, the code could be updated
> as:
>       if (tree_int_cst_equal (bnd, TYPE_MAX_VALUE (bound_type))
>           || tree_int_cst_equal (bnd, TYPE_MIN_VALUE (bound_type)))


Yes, and the comment adjusted to "if bound is known to be MAX/MIN val."

>>
>>> +      /* Check if there is possible wrap.  */
>>> +      class tree_niter_desc niter;
>>> +      if (!number_of_iterations_exit (loop, e, &niter, false, false))
> cut...
>>> +
>>> +  /* Change if (i != n) to LOOP1:if (i > n) and LOOP2:if (i < n) */
>>
>> It now occurs to me that we nowhere check the evolution of IDX
>> (split_at_bb_p uses simple_iv for this for example).  The transform
>> assumes that we will actually hit i == n and that i increments, but
>> while you check the control IV from number_of_iterations_exit
>> for NE_EXPR that does not guarantee a positive evolution.
>>
> If I do not correctly reply your question, please point out:
> number_of_iterations_exit is similar with simple_iv to invoke
> simple_iv_with_niters
> which check the evolution, and number_of_iterations_exit check
> number_of_iterations_cond
> which check no_overflow more accurate, this is one reason I use this
> function.
>
> This transform assumes that the last run hits i==n.
> Otherwise, the loop may run infinitely wrap after wrap.
> For safe, if the step is 1 or -1,  this assumption would be true.  I
> would add this check.

OK.

> Thanks so much for pointing out I missed the negative step!
>
>> Your testcases do not include any negative step examples, but I guess
>> the conditions need to be swapped in this case?
>
> I would add cases and code to support step 1/-1.
>
>>
>> I think you also have to consider the order we split, say with
>>
>>   for (i = start; i != end; ++i)
>>     {
>>       push (i);
>>       if (a[i] != b[i])
>>         break;
>>     }
>>
>> push (i) calls need to be in the same order for all cases of
>> start < end, start == end and start > end (and also cover
>> runtime testcases with end == 0 or end == UINT_MAX, likewise
>> for start).
> I add tests for the above cases. If missing sth, please point out, thanks!
>
>>
>>> +  bool inv = expr_invariant_in_loop_p (loop, gimple_cond_lhs (gc));
>>> +  enum tree_code up_code = inv ? LT_EXPR : GT_EXPR;
>>> +  enum tree_code down_code = inv ? GT_EXPR : LT_EXPR;
> cut....
>
> Thanks again for the very helpful review!
>
> BR,
> Jiufu Guo.

Here is the updated patch, thanks for your time!

diff --git a/gcc/testsuite/gcc.dg/loop-split1.c
b/gcc/testsuite/gcc.dg/loop-split1.c
new file mode 100644
index 00000000000..dd2d03a7b96
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/loop-split1.c
@@ -0,0 +1,101 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fsplit-loops -fdump-tree-lsplit-details" } */
+
+void
+foo (int *a, int *b, unsigned l, unsigned n)
+{
+  while (++l != n)
+    a[l] = b[l] + 1;
+}
+void
+foo_1 (int *a, int *b, unsigned n)
+{
+  unsigned l = 0;
+  while (++l != n)
+    a[l] = b[l] + 1;
+}
+
+void
+foo1 (int *a, int *b, unsigned l, unsigned n)
+{
+  while (l++ != n)
+    a[l] = b[l] + 1;
+}
+
+/* No wrap.  */
+void
+foo1_1 (int *a, int *b, unsigned n)
+{
+  unsigned l = 0;
+  while (l++ != n)
+    a[l] = b[l] + 1;
+}
+
+unsigned
+foo2 (char *a, char *b, unsigned l, unsigned n)
+{
+  while (++l != n)
+    if (a[l] != b[l])
+      break;
+
+  return l;
+}
+
+unsigned
+foo2_1 (char *a, char *b, unsigned l, unsigned n)
+{
+  l = 0;
+  while (++l != n)
+    if (a[l] != b[l])
+      break;
+
+  return l;
+}
+
+unsigned
+foo3 (char *a, char *b, unsigned l, unsigned n)
+{
+  while (l++ != n)
+    if (a[l] != b[l])
+      break;
+
+  return l;
+}
+
+/* No wrap.  */
+unsigned
+foo3_1 (char *a, char *b, unsigned l, unsigned n)
+{
+  l = 0;
+  while (l++ != n)
+    if (a[l] != b[l])
+      break;
+
+  return l;
+}
+
+void
+bar ();
+void
+foo4 (unsigned n, unsigned i)
+{
+  do
+    {
+      if (i == n)
+       return;
+      bar ();
+      ++i;
+    }
+  while (1);
+}
+
+unsigned
+find_skip_diff (char *p, char *q, unsigned n, unsigned i)
+{
+  while (p[i] == q[i] && ++i != n)
+    p++, q++;
+
+  return i;
+}
+
+/* { dg-final { scan-tree-dump-times "Loop split" 8 "lsplit" } } */
diff --git a/gcc/testsuite/gcc.dg/loop-split2.c
b/gcc/testsuite/gcc.dg/loop-split2.c
new file mode 100644
index 00000000000..56377e2f2f5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/loop-split2.c
@@ -0,0 +1,155 @@
+/* { dg-do run } */
+/* { dg-options "-O3" } */
+
+extern void
+abort (void);
+extern void
+exit (int);
+void
+push (int);
+
+#define NI __attribute__ ((noinline))
+
+void NI
+foo (int *a, int *b, unsigned char l, unsigned char n)
+{
+  while (++l != n)
+    a[l] = b[l] + 1;
+}
+
+unsigned NI
+bar (int *a, int *b, unsigned char l, unsigned char n)
+{
+  while (l++ != n)
+    {
+      push (l);
+      if (a[l] != b[l])
+       break;
+      push (l + 1);
+    }
+  return l;
+}
+
+void NI
+foo_1 (int *a, int *b, unsigned char l, unsigned char n)
+{
+  while (--l != n)
+    a[l] = b[l] + 1;
+}
+
+unsigned NI
+bar_1 (int *a, int *b, unsigned char l, unsigned char n)
+{
+  while (l-- != n)
+    {
+      push (l);
+      if (a[l] != b[l])
+       break;
+      push (l + 1);
+    }
+
+  return l;
+}
+
+int a[258];
+int b[258];
+int c[1024];
+static int top = 0;
+void
+push (int e)
+{
+  c[top++] = e;
+}
+
+void
+reset ()
+{
+  top = 0;
+  __builtin_memset (c, 0, sizeof (c));
+}
+
+#define check(a, b) (a == b)
+
+int
+check_c (int *c, int a0, int a1, int a2, int a3, int a4, int a5)
+{
+  return check (c[0], a0) && check (c[1], a1) && check (c[2], a2)
+        && check (c[3], a3) && check (c[4], a4) && check (c[5], a5);
+}
+
+int
+main ()
+{
+  __builtin_memcpy (b, a, sizeof (a));
+  reset ();
+  if (bar (a, b, 6, 8) != 9 || !check_c (c, 7, 8, 8, 9, 0, 0))
+    abort ();
+
+  reset ();
+  if (bar (a, b, 5, 3) != 4 || !check_c (c, 6, 7, 7, 8, 8, 9)
+      || !check_c (c + 496, 254, 255, 255, 256, 0, 1))
+    abort ();
+
+  reset ();
+  if (bar (a, b, 6, 6) != 7 || !check_c (c, 0, 0, 0, 0, 0, 0))
+    abort ();
+
+  reset ();

+ if (bar (a, b, 253, 255) != 0 || !check_c (c, 254, 255, 255, 256,0, 0))

+    abort ();
+
+  reset ();

+ if (bar (a, b, 253, 0) != 1 || !check_c (c, 254, 255, 255, 256, 0,1))

+    abort ();
+
+  reset ();
+  if (bar_1 (a, b, 6, 8) != 7 || !check_c (c, 5, 6, 4, 5, 3, 4))
+    abort ();
+
+  reset ();
+  if (bar_1 (a, b, 5, 3) != 2 || !check_c (c, 4, 5, 3, 4, 0, 0))
+    abort ();
+
+  reset ();
+  if (bar_1 (a, b, 6, 6) != 5)
+    abort ();
+
+  reset ();

+ if (bar_1 (a, b, 2, 255) != 254 || !check_c (c, 1, 2, 0, 1, 255,256))

+    abort ();
+
+  reset ();
+  if (bar_1 (a, b, 2, 0) != 255 || !check_c (c, 1, 2, 0, 1, 0, 0))
+    abort ();
+
+  b[100] += 1;
+  reset ();
+  if (bar (a, b, 90, 110) != 100)
+    abort ();
+
+  reset ();
+  if (bar (a, b, 110, 105) != 100)
+    abort ();
+
+  reset ();
+  if (bar_1 (a, b, 90, 110) != 109)
+    abort ();
+
+  reset ();
+  if (bar_1 (a, b, 2, 90) != 100)
+    abort ();
+
+  foo (a, b, 99, 99);
+  a[99] = b[99] + 1;
+  for (int i = 0; i < 256; i++)
+    if (a[i] != b[i] + 1)
+      abort ();
+
+  foo_1 (a, b, 99, 99);
+  a[99] = b[99] + 1;
+  for (int i = 0; i < 256; i++)
+    if (a[i] != b[i] + 1)
+      abort ();
+
+  exit (0);
+}
diff --git a/gcc/testsuite/gcc.dg/loop-split3.c
b/gcc/testsuite/gcc.dg/loop-split3.c
new file mode 100644
index 00000000000..ec93ee8bd12
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/loop-split3.c
@@ -0,0 +1,62 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fsplit-loops -fdump-tree-lsplit-details" } */
+
+void
+foo (int *a, int *b, unsigned l, unsigned n)
+{
+  while (--l != n)
+    a[l] = b[l] + 1;
+}
+
+void
+foo1 (int *a, int *b, unsigned l, unsigned n)
+{
+  while (l-- != n)
+    a[l] = b[l] + 1;
+}
+
+unsigned
+foo2 (char *a, char *b, unsigned l, unsigned n)
+{
+  while (--l != n)
+    if (a[l] != b[l])
+      break;
+
+  return l;
+}
+
+unsigned
+foo3 (char *a, char *b, unsigned l, unsigned n)
+{
+  while (l-- != n)
+    if (a[l] != b[l])
+      break;
+
+  return l;
+}
+
+void
+bar ();
+void
+foo4 (unsigned n, unsigned i)
+{
+  do
+    {
+      if (i == n)
+       return;
+      bar ();
+      --i;
+    }
+  while (1);
+}
+
+unsigned
+find_skip_diff (char *p, char *q, unsigned n, unsigned i)
+{
+  while (p[i] == q[i] && --i != n)
+    p--, q--;
+
+  return i;
+}
+
+/* { dg-final { scan-tree-dump-times "Loop split" 6 "lsplit" } } */
diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
index 3a09bbc39e5..e9f23b32186 100644
--- a/gcc/tree-ssa-loop-split.c
+++ b/gcc/tree-ssa-loop-split.c
@@ -41,6 +41,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "cfghooks.h"
 #include "gimple-fold.h"
 #include "gimplify-me.h"
+#include "tree-ssa-loop-ivopts.h"

 /* This file implements two kinds of loop splitting.

@@ -229,11 +230,14 @@ easy_exit_values (class loop *loop)
    conditional).  I.e. the second loop can now be entered either
    via the original entry or via NEW_E, so the entry values of LOOP2
    phi nodes are either the original ones or those at the exit
-   of LOOP1.  Insert new phi nodes in LOOP2 pre-header reflecting
-   this.  The loops need to fulfill easy_exit_values().  */
+   of LOOP1.  Selecting the previous value instead next value as the
+   exit value of LOOP1 if USE_PREV is true.  Insert new phi nodes in
+   LOOP2 pre-header reflecting this.  The loops need to fulfill
+   easy_exit_values().  */

 static void
-connect_loop_phis (class loop *loop1, class loop *loop2, edge new_e)
+connect_loop_phis (class loop *loop1, class loop *loop2, edge new_e,
+                  bool use_prev = false)
 {
   basic_block rest = loop_preheader_edge (loop2)->src;
   gcc_assert (new_e->dest == rest);

@@ -279,7 +283,8 @@ connect_loop_phis (class loop *loop1, class loop*loop2,

edge new_e)

       gphi * newphi = create_phi_node (new_init, rest);
       add_phi_arg (newphi, init, skip_first, UNKNOWN_LOCATION);
-      add_phi_arg (newphi, next, new_e, UNKNOWN_LOCATION);

+ add_phi_arg (newphi, use_prev ? PHI_RESULT (phi_first) : next,new_e,

+                  UNKNOWN_LOCATION);
       SET_USE (op, new_init);
     }
 }
@@ -1593,6 +1598,252 @@ split_loop_on_cond (struct loop *loop)
   return do_split;
 }

+/* Check if the LOOP exit branch is like "if (idx != bound)",

+ Return the branch edge which exit loop, if wrap may happen on"idx". */

+
+static edge
+get_ne_cond_branch (struct loop *loop, tree *step)
+{
+  int i;
+  edge e;
+
+  auto_vec<edge> edges = get_loop_exit_edges (loop);
+  FOR_EACH_VEC_ELT (edges, i, e)
+    {
+      basic_block bb = e->src;
+
+      /* Check if exit at gcond.  */
+      gimple *last = last_stmt (bb);
+      if (!last || gimple_code (last) != GIMPLE_COND)
+       continue;
+      gcond *cond = as_a<gcond *> (last);


   gcond *cont = safe_dyn_cast <gcond *> (last_stmt (bb));
   if (!last)
     continue;

is shorter.

Thanks.

+      enum tree_code code = gimple_cond_code (cond);
+      if (!((code == NE_EXPR && (e->flags & EDGE_FALSE_VALUE))
+           || (code == EQ_EXPR && (e->flags & EDGE_TRUE_VALUE))))
+       continue;
+
+      /* Check if bound is invarant.  */
+      tree idx = gimple_cond_lhs (cond);
+      tree bnd = gimple_cond_rhs (cond);
+      if (expr_invariant_in_loop_p (loop, idx))
+       std::swap (idx, bnd);
+      else if (!expr_invariant_in_loop_p (loop, bnd))
+       continue;
+
+      /* Only unsigned type conversion could cause wrap.  */
+      tree type = TREE_TYPE (idx);
+      if (!INTEGRAL_TYPE_P (type) || TREE_CODE (idx) != SSA_NAME
+         || !TYPE_UNSIGNED (type))
+       continue;
+
+      /* Avoid to split if bound is MAX/MIN val.  */
+      tree bound_type = TREE_TYPE (bnd);
+      if (tree_int_cst_equal (bnd, TYPE_MAX_VALUE (bound_type))
+         || tree_int_cst_equal (bnd, TYPE_MIN_VALUE (bound_type)))
+       continue;
+
+      /* Check if there is possible wrap.  */
+      class tree_niter_desc niter;
+      if (!number_of_iterations_exit (loop, e, &niter, false, false))
+       continue;
+      if (niter.control.no_overflow)
+       return NULL;
+      if (niter.cmp != NE_EXPR)
+       continue;
+      if (!integer_onep (niter.control.step)
+         && !integer_minus_onep (niter.control.step))
+       continue;
+      *step = niter.control.step;
+

+ /* If exit edge is just before the empty latch, it is easy tolink

+        the split loops: just jump from the exit edge of one loop to the
+        header of new loop.  */
+      if (single_pred_p (loop->latch)
+         && single_pred_edge (loop->latch)->src == bb
+         && empty_block_p (loop->latch))
+       return e;
+

+ /* If exit edge is at end of header, and header contains i++ or++i,

+        only, it is simple to link the split loops: jump from the end of

+ one loop header to the new loop header, and use unchanged PHIresult

+        of the first loop as the entry PHI value of the second loop.  */
+      if (bb == loop->header)
+       {
+         /* Only one phi.  */
+         gphi_iterator psi = gsi_start_phis (bb);
+         if (gsi_end_p (psi))
+           continue;
+         gphi *phi = psi.phi ();
+         gsi_next (&psi);
+         if (!gsi_end_p (psi))
+           continue;
+
+         /* Check it is ++i or ++i */
+         tree next = PHI_ARG_DEF_FROM_EDGE (phi, loop_latch_edge (loop));
+         tree prev = PHI_RESULT (phi);
+         if (idx != prev && idx != next)
+           continue;
+

+ gimple_stmt_iterator gsi = gsi_start_nondebug_after_labels_bb(bb);

+         if (gsi_end_p (gsi))
+           continue;
+         gimple *s1 = gsi_stmt (gsi);
+         if (!is_gimple_assign (s1) || gimple_assign_lhs (s1) != next
+             || gimple_assign_rhs1 (s1) != prev)
+           continue;
+
+         gsi_next_nondebug (&gsi);
+         if (!gsi_end_p (gsi) && gsi_stmt (gsi) == cond)
+           return e;
+       }
+    }
+
+  return NULL;
+}
+

+/* Split the LOOP with NE_EXPR into two loops with GT_EXPR andLT_EXPR. */

+
+static bool
+split_ne_loop (struct loop *loop, edge cond_e, tree step)
+{
+  initialize_original_copy_tables ();
+
+  struct loop *loop2 = loop_version (loop, boolean_true_node, NULL,
+                                    profile_probability::always (),
+                                    profile_probability::never (),
+                                    profile_probability::always (),
+                                    profile_probability::always (), true);
+
+  gcc_assert (loop2);
+  update_ssa (TODO_update_ssa);
+
+  basic_block loop2_cond_exit_bb = get_bb_copy (cond_e->src);
+  free_original_copy_tables ();
+
+  gcond *gc = as_a<gcond *> (last_stmt (cond_e->src));
+  gcond *dup_gc = as_a<gcond *> (last_stmt (loop2_cond_exit_bb));
+
+  /* Invert edges for gcond.  */
+  if (gimple_cond_code (gc) == EQ_EXPR)
+    {
+      auto invert_edge = [](basic_block bb) {
+       edge out = EDGE_SUCC (bb, 0);
+       edge in = EDGE_SUCC (bb, 1);
+       if (in->flags & EDGE_TRUE_VALUE)
+         std::swap (in, out);
+       in->flags |= EDGE_TRUE_VALUE;
+       in->flags &= ~EDGE_FALSE_VALUE;
+       out->flags |= EDGE_FALSE_VALUE;
+       out->flags &= ~EDGE_TRUE_VALUE;
+      };
+
+      invert_edge (gimple_bb (gc));
+      invert_edge (gimple_bb (dup_gc));
+    }
+
+  /* Change if (i != n) to LOOP1:if (i > n) and LOOP2:if (i < n) */
+  bool inv = expr_invariant_in_loop_p (loop, gimple_cond_lhs (gc));
+  if (tree_int_cst_sign_bit (step))
+    inv = !inv;
+  enum tree_code first_loop_code = inv ? LT_EXPR : GT_EXPR;
+  enum tree_code second_loop_code = inv ? GT_EXPR : LT_EXPR;


You could handle gimple_cond_code (gc) == EQ_EXPR via

   if (gimple_cond_code (gc) == EQ_EXPR)
    {
     first_loop_code = invert_tree_comparison (first_loop_code, false);

second_loop_code = invert_tree_comparison (second_loop_code,false);

   }

that looks simpler than the lambda dance with inverting the edge flags.

I did not use invert_tree_comparison in the code, because it wouldgenerate

GE_EXPR/LE_EXPR which look like containing EQ_EXPR :)

I would update patch to use invert_tree_comparison which is simpler andshorter.

+  gimple_cond_set_code (gc, first_loop_code);
+  gimple_cond_set_code (dup_gc, second_loop_code);
+
+  /* Link the exit cond edge to new loop.  */
+  gcond *break_cond = as_a<gcond *> (gimple_copy (gc));
+  edge pred_e = single_pred_edge (loop->latch);
+  bool simple_loop

+ = pred_e && pred_e->src == cond_e->src && empty_block_p(loop->latch);

+  if (simple_loop)
+    gimple_cond_set_code (break_cond, second_loop_code);
+  else
+    gimple_cond_make_true (break_cond);
+
+  basic_block break_bb = split_edge (cond_e);
+  gimple_stmt_iterator gsi = gsi_last_bb (break_bb);
+  gsi_insert_after (&gsi, break_cond, GSI_NEW_STMT);
+
+  edge to_exit = single_succ_edge (break_bb);

+ edge to_new_loop = make_edge (break_bb, loop_preheader_edge(loop2)->src,

0);
+  to_new_loop->flags |= EDGE_TRUE_VALUE;
+  to_exit->flags |= EDGE_FALSE_VALUE;
+  to_exit->flags &= ~EDGE_FALLTHRU;
+  to_exit->probability = cond_e->probability;
+  to_new_loop->probability = to_exit->probability.invert ();
+
+  update_ssa (TODO_update_ssa);


I've re-organized the pass to perform a single TODO_update_ssa at the
very end, please do not update SSA form here, nor

+  connect_loop_phis (loop, loop2, to_new_loop, !simple_loop);
+
+  rewrite_into_loop_closed_ssa_1 (NULL, 0, SSA_OP_USE, loop);


re-write into loop-closed SSA.

Thanks!

+  if (dump_file && (dump_flags & TDF_DETAILS))
+    fprintf (dump_file, ";; Loop split on wrap index.\n");
+
+  return true;
+}
+

+/* Checks if LOOP contains a suitable NE_EXPR conditional block tosplit.

+L_H:
+ if (i!=N)
+   S;
+ else
+   goto EXIT;
+ i++;
+ goto L_H;
+
+The "i!=N" is like "i>N || i<N", then it could be transformed to:
+
+L_H:
+ if (i>N)
+   S;
+ else
+   goto EXIT;
+ i++;
+ goto L_H;
+L1_H:
+ if (i<N)
+   S;
+ else
+   goto EXIT;
+ i++;
+ goto L1_H;
+
+The loop with "i<N" is in favor of both GIMPLE and RTL passes.  */
+
+static bool
+split_loop_on_ne_cond (class loop *loop)
+{
+  tree step;
+  edge branch_edge = get_ne_cond_branch (loop, &step);
+  if (!branch_edge)
+    return false;
+
+  int num = 0;
+  basic_block *bbs = get_loop_body (loop);
+  for (unsigned i = 0; i < loop->num_nodes; i++)

+ num += estimate_num_insns_seq (bb_seq (bbs[i]),&eni_size_weights);

Since the motivation is to make data-refs analyzable after thetransform

if there are no datarefs the transform only increases code-size.  In
particular I would look for calls which will be not analyzable.  Since
we're looking at each stmt above that could be embedded here.  As said
earlier once num exceeds param_max_peeled_insns you can stop the above
loop.  So heuristically I'd do sth like

Right, if no following optimization enabled, it only increases code-size
and increase branches overhead, and may not gain performance.


    for (gimple_stmt_iterator gsi = gsi_start_bb (bbs[i]); !gsi_end_p
(gsi); gsi_next (&gsi))
      {
        gimple  *stmt = gsi_stmt (gsi);
        if (is_gimple_debug (stmt))
          continue;
        if (gimple_has_side_effects (stmt))
          {
            free (bbs);
            return false;
          }
        num += estimate_num_insns (stmt, &eni_size_weights);
        if (num > param_max_peeled_insns)
          {
            free (bbs);
            return false;
          }
        if (gimple_vuse (stmt))
          any_dr = true;
      }
    if (!any_dr)
      {
        free (bbs);
        return false;
      }

There's also still the issue that the transformed loop will fail
number of iteration analysis for the loop that iterates until
the IV wraps.  That's a blocker for the acceptance of this transform.

Understand, I saw you opened PR101145.

Thanks for your comments!

BR,
Jiufu Guo.


Richard.

Re: [PATCH V3] Split loop for NE condition.

Reply via email to