After shrink-wrapping has found the "tightest fit" for where to place
the prologue, it tries move it earlier (so that frame saves are run
earlier) -- but without copying any more basic blocks.

Unfortunately a candidate block we select can be inside a loop, and we
will still allow it (because the loop always exits via our previously
chosen block).  We can do that just fine if we make a duplicate of the
block, but we do not want to here.

So we need to detect this situation.  We can place the prologue at a
previous block PRE only if PRE dominates every block reachable from
it, because then we will never need to duplicate that block (it will
always be executed with prologue).

v4: Fixed all the stupid mistakes you noticed.  Also, the previous
version stopped looking when the previous try didn't work out.  This
version doesn't: it is simpler, more in line with the rest of the
algorithm, potentially useful, and doesn't really cost more.

Tested on the two testcases from the PRs.  Also regression checked
on powerpc64-linux.

Is this okay for trunk?


Segher


2015-12-09  Segher Boessenkool  <seg...@kernel.crashing.org>

        PR rtl-optimization/67778
        PR rtl-optimization/68634
        * shrink-wrap.c (try_shrink_wrapping): Add a comment about why we want
        to put the prologue earlier.  When determining if an earlier block is
        suitable, make sure it dominates every block reachable from it.

---
 gcc/shrink-wrap.c | 42 +++++++++++++++++++++++++++++++++++++-----
 1 file changed, 37 insertions(+), 5 deletions(-)


---
 gcc/shrink-wrap.c | 40 +++++++++++++++++++++++++++++++++++-----
 1 file changed, 35 insertions(+), 5 deletions(-)

diff --git a/gcc/shrink-wrap.c b/gcc/shrink-wrap.c
index 3a1df84..f65b0c3 100644
--- a/gcc/shrink-wrap.c
+++ b/gcc/shrink-wrap.c
@@ -744,36 +744,66 @@ try_shrink_wrapping (edge *entry_edge, bitmap_head 
*bb_with,
          vec.quick_push (e->dest);
     }
 
-  vec.release ();
-
   if (dump_file)
     fprintf (dump_file, "Avoiding non-duplicatable blocks, PRO is now %d\n",
             pro->index);
 
   /* If we can move PRO back without having to duplicate more blocks, do so.
+     We do this because putting the prologue earlier is better for scheduling.
      We can move back to a block PRE if every path from PRE will eventually
-     need a prologue, that is, PRO is a post-dominator of PRE.  */
+     need a prologue, that is, PRO is a post-dominator of PRE.  PRE needs
+     to dominate every block reachable from itself.  */
 
   if (pro != entry)
     {
       calculate_dominance_info (CDI_POST_DOMINATORS);
 
+      bitmap bb_tmp = BITMAP_ALLOC (NULL);
+      bitmap_copy (bb_tmp, bb_with);
       basic_block last_ok = pro;
+      vec.truncate (0);
+
       while (pro != entry)
        {
          basic_block pre = get_immediate_dominator (CDI_DOMINATORS, pro);
          if (!dominated_by_p (CDI_POST_DOMINATORS, pre, pro))
            break;
 
+         if (bitmap_set_bit (bb_tmp, pre->index))
+           vec.quick_push (pre);
+
+         bool ok = true;
+         while (!vec.is_empty ())
+           {
+             basic_block bb = vec.pop ();
+             bitmap_set_bit (bb_tmp, pre->index);
+
+             if (!dominated_by_p (CDI_DOMINATORS, bb, pre))
+               {
+                 ok = false;
+                 break;
+               }
+
+             FOR_EACH_EDGE (e, ei, bb->succs)
+               if (!bitmap_bit_p (bb_with, e->dest->index)
+                   && bitmap_set_bit (bb_tmp, e->dest->index))
+                 vec.quick_push (e->dest);
+           }
+
+         if (ok && can_get_prologue (pre, prologue_clobbered))
+           last_ok = pre;
+
          pro = pre;
-         if (can_get_prologue (pro, prologue_clobbered))
-           last_ok = pro;
        }
+
       pro = last_ok;
 
+      BITMAP_FREE (bb_tmp);
       free_dominance_info (CDI_POST_DOMINATORS);
     }
 
+  vec.release ();
+
   if (dump_file)
     fprintf (dump_file, "Bumping back to anticipatable blocks, PRO is now 
%d\n",
             pro->index);
-- 
1.9.3

Reply via email to