> This makes it quadratic in the number of vectorized early exit loops
> in a function.  The vectorizer CFG manipulation operates in a local
> enough bubble that programmatic updating of dominators should be
> possible (after all we manage to produce correct SSA form!), the
> proposed change gets us too far off to a point where re-computating
> dominance info is likely cheaper (but no, we shouldn't do this either).
> 
> Can you instead give manual updating a try again?  I think
> versioning should produce up-to-date dominator info, it's only
> when you redirect branches during peeling that you'd need
> adjustments - but IIRC we're never introducing new merges?
> 
> IIRC we can't wipe dominators during transform since we query them
> during code generation.  We possibly could code generate all
> CFG manipulations of all vectorized loops, recompute all dominators
> and then do code generation of all vectorized loops.
> 
> But then we're doing a loop transform and the exits will ultimatively
> end up in the same place, so the CFG and dominator update is bound to
> where the original exits went to.

Yeah that's a fair point, the issue is specifically with at_exit.  So how about:

When we peel at_exit we are moving the new loop at the exit of the previous
loop.  This means that the blocks outside the loop dat the previous loop used to
dominate are no longer being dominated by it.

The new dominators however are hard to predict since if the loop has multiple
exits and all the exits are an "early" one then we always execute the scalar
loop.  In this case the scalar loop can completely dominate the new loop.

If we later have skip_vector then there's an additional skip edge added that
might change the dominators.

The previous patch would force an update of all blocks reachable from the new
exits.  This one updates *only* blocks that we know the scalar exits dominated.

For the examples this reduces the blocks to update from 18 to 3.

Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
and no issues normally and with --enable-checking=release --enable-lto
--with-build-config=bootstrap-O3 --enable-checking=yes,rtl,extra.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

        PR tree-optimization/113144
        PR tree-optimization/113145
        * tree-vect-loop-manip.cc (slpeel_tree_duplicate_loop_to_edge_cfg):
        Update all BB that the original exits dominated.

gcc/testsuite/ChangeLog:

        PR tree-optimization/113144
        PR tree-optimization/113145
        * gcc.dg/vect/vect-early-break_94-pr113144.c: New test.

--- inline copy of patch ---

diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_94-pr113144.c 
b/gcc/testsuite/gcc.dg/vect/vect-early-break_94-pr113144.c
new file mode 100644
index 
0000000000000000000000000000000000000000..903fe7be6621e81db6f29441e4309fa213d027c5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_94-pr113144.c
@@ -0,0 +1,41 @@
+/* { dg-do compile } */
+/* { dg-add-options vect_early_break } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+long tar_atol256_max, tar_atol256_size, tar_atosl_min;
+char tar_atol256_s;
+void __errno_location();
+
+
+inline static long tar_atol256(long min) {
+  char c;
+  int sign;
+  c = tar_atol256_s;
+  sign = c;
+  while (tar_atol256_size) {
+    if (c != sign)
+      return sign ? min : tar_atol256_max;
+    c = tar_atol256_size--;
+  }
+  if ((c & 128) != (sign & 128))
+    return sign ? min : tar_atol256_max;
+  return 0;
+}
+
+inline static long tar_atol(long min) {
+  return tar_atol256(min);
+}
+
+long tar_atosl() {
+  long n = tar_atol(-1);
+  if (tar_atosl_min) {
+    __errno_location();
+    return 0;
+  }
+  if (n > 0)
+    return 0;
+  return n;
+}
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index 
76d4979c0b3b374dcaacf6825a95a8714114a63b..9bacaa182a3919cae1cb99dfc5ae4923e1f93376
 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -1719,8 +1719,6 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, 
edge loop_exit,
          /* Now link the alternative exits.  */
          if (multiple_exits_p)
            {
-             set_immediate_dominator (CDI_DOMINATORS, new_preheader,
-                                      main_loop_exit_block);
              for (auto gsi_from = gsi_start_phis (loop->header),
                   gsi_to = gsi_start_phis (new_preheader);
                   !gsi_end_p (gsi_from) && !gsi_end_p (gsi_to);
@@ -1776,7 +1774,14 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop 
*loop, edge loop_exit,
        {
          update_loop = new_loop;
          for (edge e : get_loop_exit_edges (loop))
-           doms.safe_push (e->dest);
+           {
+             /* Basic blocks that the old loop dominated are now dominated by
+                the new loop and so we have to update those.  */
+             for (auto bb : get_all_dominated_blocks (CDI_DOMINATORS, e->src))
+               if (!flow_bb_inside_loop_p (loop, bb))
+                 doms.safe_push (bb);
+             doms.safe_push (e->dest);
+           }
          doms.safe_push (exit_dest);
 
          /* Likely a fall-through edge, so update if needed.  */

Attachment: rb18113 (1).patch
Description: rb18113 (1).patch

Reply via email to