I noticed that when there are registers to save (which can vary with the
ABI), shrink-wrapping arranges for a more expeditious early return than
when there are no registers to save but there are still some dull argument
copies to make for the main body of the function, even though those copies
are not needed on the early-return path.  Most of the logic needed to do
shrink-wrapping in the absence of register saves is already there, and the
generated code does indeed look better when it is used this way.  However,
I couldn't find a difference in the execution time of the benchmarks I was
looking at, presumably because the functions didn't actually return early
(they do things with an array of N elements where N might be zero, but it
isn't for the actual data).
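
To make the pattern concrete, here is a minimal sketch of the kind of
function shape described above.  The function name and its parameters are
made up for illustration and are not taken from the benchmarks; whether a
given target actually emits argument copies ahead of the N == 0 test
depends on the ABI and on register allocation.

```c
#include <stddef.h>

/* Hypothetical example: the early-return path needs none of A, SCALE
   or BIAS, so any copies of those arguments made in the first basic
   block are wasted work when N is zero.  */
long
sum_scaled (const long *a, size_t n, long scale, long bias)
{
  if (n == 0)
    return 0;	/* Early return: no argument other than N is used.  */

  long sum = 0;
  for (size_t i = 0; i < n; i++)
    sum += a[i] * scale + bias;
  return sum;
}
```

A workload would only be expected to show a timing difference if calls
with N == 0 are frequent, which was not the case for the data above.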

Does someone have a benchmark / computing load where the early return
is beneficial?  Or conversely, harmful?

2022-03-14  Joern Rennecke  <joern.renne...@embecosm.com>

        * common.opt (fearly-return): New option.
        * shrink-wrap.cc (try_early_return): New function.
        (try_shrink_wrapping): Call try_early_return.

diff --git a/gcc/common.opt b/gcc/common.opt
index 8b6513de47c..901287fcad6 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -3607,4 +3607,8 @@ fipa-ra
 Common Var(flag_ipa_ra) Optimization
 Use caller save register across calls if possible.
 
+fearly-return
+Common Var(flag_early_return) Optimization Init(1)
+Extend shrink-wrapping to prologue-free functions.
+
 ; This comment is to ensure we retain the blank line above.
diff --git a/gcc/shrink-wrap.cc b/gcc/shrink-wrap.cc
index 30166bd20eb..31ab0ecff10 100644
--- a/gcc/shrink-wrap.cc
+++ b/gcc/shrink-wrap.cc
@@ -586,6 +586,42 @@ handle_simple_exit (edge e)
             INSN_UID (ret), e->src->index);
 }
 
+/* Even if there is no prologue, we might have a number of argument
+   copy and initialization statements in the first basic block that
+   might be unnecessary if we return early.  */
+/* ??? This might be overly aggressive for super-scalar processors without
+   speculative execution in that we might want to keep enough instructions
+   in front of the branch to fill all issue slots.
+
+   If the branch depends on a register copied from another register
+   immediately before, later passes already take care of propagating the
+   copy into the branch.  */
+void
+try_early_return (edge *entry_edge)
+{
+  basic_block entry = (*entry_edge)->dest;
+  if (EDGE_COUNT (entry->succs) != 2 || !single_pred_p (entry))
+    return;
+  edge e;
+  edge_iterator ei;
+  const int max_depth = 20;
+
+  FOR_EACH_EDGE (e, ei, entry->succs)
+    {
+      basic_block dst = e->dest;
+      for (int i = max_depth; --i; dst = single_succ (dst))
+       {
+         if (dst == EXIT_BLOCK_PTR_FOR_FN (cfun))
+           {
+             prepare_shrink_wrap (entry);
+             return;
+           }
+         if (!single_succ_p (dst))
+           break;
+       }
+    }
+}
+
 /* Try to perform a kind of shrink-wrapping, making sure the
    prologue/epilogue is emitted only around those parts of the
    function that require it.
@@ -666,7 +702,11 @@ try_shrink_wrapping (edge *entry_edge, rtx_insn *prologue_seq)
        break;
       }
   if (empty_prologue)
-    return;
+    {
+      if (flag_early_return)
+       try_early_return (entry_edge);
+      return;
+    }
 
   /* Move some code down to expose more shrink-wrapping opportunities.  */
 
