> Are you trying to say that you have the option as to what kind of 
> branch to use?  ie, "ordinary", presumably without a delay slot or one 
> with a delay slot?

> Is the "ordinary" actually just a nullified delay slot or some form of 
> likely/not likely static hint?

Specifically for MIPSR6: the ISA has both traditional delay slot branches and
a normal branch (no delay slot, not annulling, no hints, subtle static hazard),
aka "compact branch" in MIPS terminology. Compact branches could be described
as nullify-on-taken delay slot branches, but we saw little to no value in that.

Matthew Fortune provided a writeup with their handling in GCC: 

https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01892.html

> But what is the compact form at the micro-architectural level?  My
> mips-fu has diminished greatly, but my recollection is the bubble is
> always there.   Is that not the case?

The pipeline bubble will exist but the performance impact varies across
R6 cores. High-end OoO cores won't be impacted as much, but lower-end
cores will be. microMIPSR6 removes delay slot branches altogether, which
pushes even the simplest micro-architectures to optimize away the cost of a
pipeline bubble.

For non-microMIPSR6 this is why we have different branch policies implemented
in the MIPS backend, so that branch usage can be tuned. By default, if a delay
slot can be filled then we use a delay slot branch; otherwise we use a compact
branch, as the only thing in the delay slot would be a NOP anyway.
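
To make the default concrete, here is a minimal sketch of that selection
logic. All names (branch_policy, choose_branch, and so on) are hypothetical
and only illustrate the idea; they are not the identifiers used in the MIPS
backend.

```c
#include <stdbool.h>

/* Hypothetical names for illustration; not the backend's identifiers. */
enum branch_kind { BRANCH_DELAY_SLOT, BRANCH_COMPACT };

enum branch_policy {
  POLICY_NEVER_COMPACT,   /* always emit delay slot branches */
  POLICY_OPTIMAL,         /* default: compact only if the slot would be a NOP */
  POLICY_ALWAYS_COMPACT   /* always emit compact branches */
};

/* Pick the branch form given the policy and whether the delay slot
   filler found a useful instruction for the slot.  */
enum branch_kind
choose_branch (enum branch_policy policy, bool slot_fillable)
{
  switch (policy)
    {
    case POLICY_NEVER_COMPACT:
      return BRANCH_DELAY_SLOT;
    case POLICY_ALWAYS_COMPACT:
      return BRANCH_COMPACT;
    case POLICY_OPTIMAL:
    default:
      /* A filled slot does useful work; an unfilled one only holds a NOP,
         so the compact form is no worse and saves the NOP.  */
      return slot_fillable ? BRANCH_DELAY_SLOT : BRANCH_COMPACT;
    }
}
```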

Compact branches do have a strange restriction in that they cannot be followed
by a CTI (control transfer instruction). This is apparently to simplify branch
predictors, but it may be lifted in future ISA releases.
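
One way to respect that restriction is to separate the two instructions with
a NOP. The sketch below only counts how many such NOPs a straight-line
sequence would need; the instruction classes and the function name are
hypothetical, and it assumes padding with a NOP is the chosen workaround.

```c
#include <stddef.h>

/* Hypothetical instruction classification for illustration only. */
enum insn_class { INSN_OTHER, INSN_COMPACT_BRANCH, INSN_OTHER_CTI };

/* Any branch or jump counts as a CTI, compact branches included. */
static int
is_cti (enum insn_class c)
{
  return c != INSN_OTHER;
}

/* Count the NOPs needed so that no compact branch is immediately
   followed by a CTI in the given sequence.  */
size_t
count_padding_nops (const enum insn_class *insns, size_t n)
{
  size_t nops = 0;
  for (size_t i = 0; i + 1 < n; i++)
    if (insns[i] == INSN_COMPACT_BRANCH && is_cti (insns[i + 1]))
      nops++;
  return nops;
}
```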

> If it is able to find insns from the commonly executed path that don't 
> have a long latency, then the fill is usually profitable (since the 
> pipeline bubble always exists).  However, pulling a long latency 
> instruction (say anything that might cache miss or an fdiv/fsqrt) off 
> the slow path and conditionally nullifying it can be *awful*.
> Everything else is in-between.

I agree. The variability in profit/loss is a concern and I see two ways to deal
with it:

A) Modify the delay slot filler so that it chooses speculative instructions of
less than some $cost and avoids instruction duplication when the eager filler
picks an instruction from a block with multiple predecessors. Making such
changes would be invasive and require more target-specific hooks.

B) Use compact branches instead of speculative delay slot execution and forsake
variable performance for a consistent pipeline bubble by not using the
speculative delay filler altogether.

Between these two choices, B seems to be the better option due to its sheer
simplicity. Choosing neither gives speculative instruction execution when there
could be a small consistent penalty instead.
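
To illustrate the difference between the two options, a rough sketch of the
decision each one makes for a speculative fill candidate follows. The struct,
the enum and the cost threshold are all hypothetical; nothing here
corresponds to existing code in the delay slot filler.

```c
#include <stdbool.h>

/* Hypothetical description of an instruction the eager filler found. */
struct fill_candidate {
  int cost;              /* estimated latency of the instruction */
  int num_predecessors;  /* predecessors of the block it was taken from */
};

enum fill_strategy {
  STRATEGY_FILTERED_EAGER,  /* option A: keep eager filling, but filter it */
  STRATEGY_NO_EAGER         /* option B: never fill speculatively */
};

/* Return true if the candidate may be placed in the delay slot.  When
   this returns false the branch would be emitted in its compact form.  */
bool
allow_speculative_fill (enum fill_strategy strategy,
                        const struct fill_candidate *cand, int max_cost)
{
  if (strategy == STRATEGY_NO_EAGER)
    return false;

  /* Option A: reject expensive instructions and those that would have
     to be duplicated because their block has several predecessors.  */
  return cand->cost < max_cost && cand->num_predecessors <= 1;
}
```

Option B reduces to a single early return, which is the simplicity argument
in a nutshell; option A needs a cost estimate and predecessor information
that the filler would have to obtain from the target.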

Thanks,
Simon
________________________________________
From: Jeff Law [l...@redhat.com]
Sent: 17 September 2015 17:55
To: Simon Dardis; Bernd Schmidt
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] Target hook for disabling the delay slot filler.

On 09/17/2015 03:52 AM, Simon Dardis wrote:
> The profitability of using an ordinary branch over a delay slot branch 
> depends on how the delay slot is filled. If a delay slot can be filled 
> from an instruction preceding the branch or instructions proceeding 
> that must be executed on both sides then it is profitable to use a delay slot 
> branch.
Agreed.  It's an over-simplification, but for the purposes of this discussion 
it's close enough.


>
> For cases when instructions are chosen from one side of the branch, 
> the proposed optimization strategy is to not speculatively execute 
> instructions when ordinary branches could be used. Performance-wise 
> this avoids executing instructions which the eager delay filler picked 
> wrongly.
Are you trying to say that you have the option as to what kind of branch to 
use?  ie, "ordinary", presumably without a delay slot or one with a delay slot?

Is the "ordinary" actually just a nullified delay slot or some form of 
likely/not likely static hint?



>
> Since most branches have a compact form disabling the eager delay 
> filler should be no worse than altering it not to fill delay slots in this 
> case.
But what is the compact form at the micro-architectural level?  My mips-fu has 
diminished greatly, but my recollection is the bubble is
always there.   Is that not the case?

fill_eager_delay_slots is most definitely speculative and its profitability is 
largely dependent on the cost of what insns it finds to fill those delay slots 
and whether they're from the common or uncommon path.

If it is able to find insns from the commonly executed path that don't have a 
long latency, then the fill is usually profitable (since the pipeline bubble 
always exists).  However, pulling a long latency instruction (say anything that 
might cache miss or an fdiv/fsqrt) off the slow path and conditionally 
nullifying it can be *awful*.
Everything else is in-between.



Jeff
