> -----Original Message-----
> From: Jeff Law [mailto:[email protected]]
> Sent: 08 October 2015 20:44
> To: Simon Dardis; Bernd Schmidt
> Cc: [email protected]
> Subject: Re: FW: [PATCH] Target hook for disabling the delay slot filler.
>
> On 09/18/2015 05:10 AM, Simon Dardis wrote:
> >> Are you trying to say that you have the option as to what kind of
> >> branch to use? ie, "ordinary", presumably without a delay slot or
> >> one with a delay slot?
> >
> >> Is the "ordinary" actually just a nullified delay slot or some form
> >> of likely/not likely static hint?
> >
> > Specifically for MIPSR6: the ISA possesses traditional delay slot
> > branches and a normal branch (no delay slots, not annulling, no hints,
> > subtle static hazard), aka "compact branch" in MIPS terminology. They
> > could be described as nullify on taken delay slot branch but we saw little
> > to
> no value in that.
> >
> > Matthew Fortune provided a writeup with their handling in GCC:
> >
> > https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01892.html
> Thanks. I never looked at that message, almost certainly because it was MIPS
> specific. I'm trying hard to stay out of backends that have good active
> maintainers, and MIPS certainly qualifies on that point.
>
>
> >
> >> But what is the compact form at the micro-architectural level? My
> >> mips-fu has diminished greatly, but my recollection is the bubble is
> >> always there. Is that not the case?
> >
> > The pipeline bubble will exist but the performance impact varies
> > across
> > R6 cores. High-end OoO cores won't be impacted as much, but lower end
> > cores will. microMIPSR6 removes delay slot branches altogether which
> > pushes the simplest micro-architectures to optimize away the cost of a
> > pipeline bubble.
> [ ... snip more micro-archticture stuff ... ] Thanks. That helps a lot. I
> didn't
> realize the bubble was being squashed to varying degrees. And FWIW, I
> wouldn't be surprised if you reach a point on the OoO cores where you'll just
> want to move away from delay slots totally and rely on your compact
> branches as much as possible. It may give your hardware guys a degree of
> freedom that helps them in the common case (compact branches) at the
> expense of slowing down code with old fashioned delay slots.
>
> > Compact branches do a strange restriction in that they cannot be
> > followed by a CTI. This is to simplify branch predictors apparently
> > but this may be lifted in future ISA releases.
> Come on! :-) There's some really neat things you can do when you allow
> branches in delay slots. The PA was particularly fun in that regard.
> My recollection is HP had some hand written assembly code in their libraries
> which exploited the out-of-line execution you could get in this case. We
> never tried to exploit in GCC simply because the opportunities didn't see all
> that common or profitable.
>
>
>
> >
> >> If it is able to find insns from the commonly executed path that
> >> don't have a long latency, then the fill is usually profitable (since
> >> the pipeline bubble always exists). However, pulling a long latency
> >> instruction (say anything that might cache miss or an fdiv/fsqrt) off
> >> the slow path and conditionally nullifying it can be *awful*.
> >> Everything else is in-between.
> >
> > I agree. The variability in profit/loss in a concern and I see two
> > ways to deal with it:
> >
> > A) modify the delay slot filler so that it choses speculative
> > instructions of less than some $cost and avoid instruction duplication
> > when the eager filler picks an instruction from a block with multiple
> > predecessors. Making such changes would be invasive and require more
> target specific hooks.
> The cost side here should be handled by existing mechanisms. You just
> never allow anything other than simple arith, logicals & copies.
>
> You'd need a hook to avoid this when copying was needed.
>
> You'd probably also need some kind of target hook to indicate the level of
> prediction where this is profitable since the cost varies across your micro-
> architectures.
>
> And you'd also have to worry about the special code which triggers when
> there's a well predicted branch, but a resource conflict. In that case reorg
> can
> fill the slot from the predicted path and insert compensation code on the
> non-predicted path.
>
>
>
> >
> > B) Use compact branches instead of speculative delay slot execution
> > and forsake variable performance for a consistent pipeline bubble by
> > not using the speculative delay filler altogether.
> >
> > Between these two choices, B seems to better option as due to sheer
> simplicity.
> > Choosing neither gives speculative instruction execution when there
> > could be a small consistent penalty instead.
> B is certainly easier.
>
> The main objection I had was given my outdated knowledge of the MIPS
> processors it seemed like you were taking a step backwards. You've cleared
> that up and if you're comfortable with the tradeoff, then I won't object to
> the target hook to disable eager filling.
>
> Can you repost that patch? Given I was the last one to do major work on
> reorg (~20 years ago mind you) it probably makes the most sense for me to
> own the review.
>
> jeff
>
Thanks for getting back to me.
Patch below. Target hook renamed to TARGET_NO_SPECULATION_IN_DELAY_SLOTS_P.
Tested on mips-img-elf, no new regressions.
Thanks,
Simon
Index: gcc/config/mips/mips.c
===================================================================
--- gcc/config/mips/mips.c (revision 229123)
+++ gcc/config/mips/mips.c (working copy)
@@ -4312,6 +4312,14 @@
{
return mips_address_insns (addr, mode, false);
}
+
+/* Implement TARGET_NO_SPECULATION_IN_DELAY_SLOTS_P. */
+
+static bool
+mips_no_speculation_in_delay_slots_p ()
+{
+ return TARGET_CB_MAYBE;
+}
/* Information about a single instruction in a multi-instruction
asm sequence. */
@@ -19823,6 +19831,9 @@
#undef TARGET_ADDRESS_COST
#define TARGET_ADDRESS_COST mips_address_cost
+#undef TARGET_NO_SPECULATION_IN_DELAY_SLOTS_P
+#define TARGET_NO_SPECULATION_IN_DELAY_SLOTS_P
mips_no_speculation_in_delay_slots_p
+
#undef TARGET_IN_SMALL_DATA_P
#define TARGET_IN_SMALL_DATA_P mips_in_small_data_p
Index: gcc/doc/tm.texi
===================================================================
--- gcc/doc/tm.texi (revision 229123)
+++ gcc/doc/tm.texi (working copy)
@@ -6444,6 +6444,16 @@
registers on machines with lots of registers.
@end deftypefn
+@deftypefn {Target Hook} bool TARGET_NO_SPECULATION_IN_DELAY_SLOTS_P (void)
+This predicate controls the use of the eager delay slot filler to disallow
+speculatively executed instructions being placed in delay slots. Targets
+such as certain MIPS architectures possess both branches with and without
+delay slots. As the eager delay slot filler can decrease performance,
+disabling it is beneficial when ordinary branches are available. Use of
+delay slot branches filled using the basic filler is often still desirable
+as the delay slot can hide a pipeline bubble.
+@end deftypefn
+
@node Scheduling
@section Adjusting the Instruction Scheduler
Index: gcc/doc/tm.texi.in
===================================================================
--- gcc/doc/tm.texi.in (revision 229123)
+++ gcc/doc/tm.texi.in (working copy)
@@ -4724,6 +4724,8 @@
@hook TARGET_ADDRESS_COST
+@hook TARGET_NO_SPECULATION_IN_DELAY_SLOTS_P
+
@node Scheduling
@section Adjusting the Instruction Scheduler
Index: gcc/reorg.c
===================================================================
--- gcc/reorg.c (revision 229123)
+++ gcc/reorg.c (working copy)
@@ -3747,7 +3747,8 @@
{
fill_simple_delay_slots (1);
fill_simple_delay_slots (0);
- fill_eager_delay_slots ();
+ if (!targetm.no_speculation_in_delay_slots_p ())
+ fill_eager_delay_slots ();
relax_delay_slots (first);
}
Index: gcc/target.def
===================================================================
--- gcc/target.def (revision 229123)
+++ gcc/target.def (working copy)
@@ -3498,6 +3498,19 @@
int, (rtx address, machine_mode mode, addr_space_t as, bool speed),
default_address_cost)
+/* Permit speculative instructions in delay slots during delayed-branch
+ scheduling. */
+DEFHOOK
+(no_speculation_in_delay_slots_p,
+ "This predicate controls the use of the eager delay slot filler to disallow\n\
+speculatively executed instructions being placed in delay slots. Targets\n\
+such as certain MIPS architectures possess both branches with and without\n\
+delay slots. As the eager delay slot filler can decrease performance,\n\
+disabling it is beneficial when ordinary branches are available. Use of\n\
+delay slot branches filled using the basic filler is often still desirable\n\
+as the delay slot can hide a pipeline bubble.", bool, (void),
+ hook_bool_void_false)
+
/* Return where to allocate pseudo for a given hard register initial value. */
DEFHOOK
(allocate_initial_value,
Index: gcc/testsuite/gcc.target/mips/ds-schedule-1.c
===================================================================
--- gcc/testsuite/gcc.target/mips/ds-schedule-1.c (revision 0)
+++ gcc/testsuite/gcc.target/mips/ds-schedule-1.c (working copy)
@@ -0,0 +1,29 @@
+/* { dg-options "isa_rev>=6 -mcompact-branches=optimal -mno-abicalls -G4" } */
+/* { dg-final { scan-assembler-not "bne\t" } } */
+/* { dg-final { scan-assembler-not "beq\t" } } */
+/* { dg-final { scan-assembler-times "\\(foo\\)" 1 } } */
+
+/* Test that when compact branches are used, that a compact branch is
+ produced in the case where code expansion would have occurred if a
+ delay slot branch would have be used. 'foo' should only be
+ referenced once in the program text. */
+
+struct list
+{
+ struct list *next;
+ int element;
+};
+
+struct list *gr;
+
+int foo;
+
+extern void t (int, int, int*);
+
+void
+f (struct list **ptr)
+{
+ if (gr)
+ *ptr = gr->next;
+ t (1, foo, &gr->element);
+}
Index: gcc/testsuite/gcc.target/mips/ds-schedule-2.c
===================================================================
--- gcc/testsuite/gcc.target/mips/ds-schedule-2.c (revision 0)
+++ gcc/testsuite/gcc.target/mips/ds-schedule-2.c (working copy)
@@ -0,0 +1,28 @@
+/* { dg-options "-mcompact-branches=never -mno-abicalls -G4" } */
+/* { dg-skip-if "code quality test" { *-*-* } { "-O0" "-O1" "-Os" } { "" } } */
+/* { dg-final { scan-assembler "beq.*\n\tlw" } } */
+/* { dg-final { scan-assembler-times "\\(foo\\)" 2 } } */
+
+/* Test that when compact branches are explicitly disabled, that a non-compact
+ branch is produced. 'foo' should be referenced twice in the program text as
the
+ eager delay slot filler will duplicate the load of foo. */
+
+struct list
+{
+ struct list *next;
+ int element;
+};
+
+struct list *gr;
+
+int foo;
+
+extern void t (int, int, int*);
+
+void
+f (struct list **ptr)
+{
+ if (gr)
+ *ptr = gr->next;
+ t (1, foo, &gr->element);
+}