> Are you trying to say that you have the option as to what kind of
> branch to use? ie, "ordinary", presumably without a delay slot or one
> with a delay slot?
> Is the "ordinary" actually just a nullified delay slot or some form of
> likely/not likely static hint?

Specifically for MIPSR6: the ISA possesses traditional delay slot branches
and a normal branch (no delay slots, not annulling, no hints, subtle static
hazard), aka "compact branch" in MIPS terminology. They could be described
as nullify-on-taken delay slot branches, but we saw little to no value in
that.

Matthew Fortune provided a writeup with their handling in GCC:
https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01892.html

> But what is the compact form at the micro-architectural level? My
> mips-fu has diminished greatly, but my recollection is the bubble is
> always there. Is that not the case?

The pipeline bubble will exist but the performance impact varies across
R6 cores. High-end OoO cores won't be impacted as much, but lower-end cores
will. microMIPSR6 removes delay slot branches altogether, which pushes the
simplest micro-architectures to optimize away the cost of a pipeline bubble.

For non-microMIPSR6 this is why we have different branch policies
implemented in the MIPS backend to allow branch usage to be tuned. By
default, if a delay slot can be filled then we use a delay slot branch;
otherwise we use a compact branch, as the only thing in the delay slot
would be a NOP anyway.

Compact branches do have a strange restriction in that they cannot be
followed by a CTI (control transfer instruction). This is apparently to
simplify branch predictors, but it may be lifted in future ISA releases.

> If it is able to find insns from the commonly executed path that don't
> have a long latency, then the fill is usually profitable (since the
> pipeline bubble always exists). However, pulling a long latency
> instruction (say anything that might cache miss or an fdiv/fsqrt) off
> the slow path and conditionally nullifying it can be *awful*.
> Everything else is in-between.

I agree.
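To make the default policy above concrete, here is a minimal sketch in C. It is not the MIPS backend code; the enum and the function name are hypothetical and exist only for illustration. It assumes the filler has already reported whether the slot holds a useful instruction.

```c
#include <stdbool.h>

/* Hypothetical names for illustration only; these are not GCC identifiers. */
enum branch_form { BRANCH_DELAY_SLOT, BRANCH_COMPACT };

/* Default policy as described in the text: keep the delay slot branch only
   when the slot holds a useful instruction; otherwise prefer the compact
   branch, since the slot would contain nothing but a NOP.  */
static enum branch_form
choose_branch_form (bool slot_filled, bool is_micromips_r6)
{
  /* microMIPSR6 has no delay slot branches, so there is no choice to make.  */
  if (is_micromips_r6)
    return BRANCH_COMPACT;

  return slot_filled ? BRANCH_DELAY_SLOT : BRANCH_COMPACT;
}
```

For example, choose_branch_form (false, false) yields BRANCH_COMPACT, which corresponds to the "NOP in the delay slot" case.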
The variability in profit/loss is a concern and I see two ways to deal with
it:

A) Modify the delay slot filler so that it chooses speculative instructions
of less than some $cost and avoids instruction duplication when the eager
filler picks an instruction from a block with multiple predecessors. Making
such changes would be invasive and require more target-specific hooks.

B) Use compact branches instead of speculative delay slot execution and
forsake variable performance for a consistent pipeline bubble by not using
the speculative delay filler altogether.

Between these two choices, B seems to be the better option due to its sheer
simplicity. Choosing neither gives speculative instruction execution when
there could be a small consistent penalty instead.

Thanks,
Simon

________________________________________
From: Jeff Law [l...@redhat.com]
Sent: 17 September 2015 17:55
To: Simon Dardis; Bernd Schmidt
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] Target hook for disabling the delay slot filler.

On 09/17/2015 03:52 AM, Simon Dardis wrote:
> The profitability of using an ordinary branch over a delay slot branch
> depends on how the delay slot is filled. If a delay slot can be filled
> from an instruction preceding the branch or instructions proceeding
> that must be executed on both sides then it is profitable to use a delay slot
> branch.

Agreed. It's an over-simplification, but for the purposes of this
discussion it's close enough.

> For cases when instructions are chosen from one side of the branch,
> the proposed optimization strategy is to not speculatively execute
> instructions when ordinary branches could be used. Performance-wise
> this avoids executing instructions which the eager delay filler picked
> wrongly.

Are you trying to say that you have the option as to what kind of branch
to use? ie, "ordinary", presumably without a delay slot or one with a
delay slot?
Is the "ordinary" actually just a nullified delay slot or some form of
likely/not likely static hint?

> Since most branches have a compact form disabling the eager delay
> filler should be no worse than altering it not to fill delay slots in this
> case.

But what is the compact form at the micro-architectural level? My mips-fu
has diminished greatly, but my recollection is the bubble is always there.
Is that not the case?

fill_eager_delay_slots is most definitely speculative and its profitability
is largely dependent on the cost of what insns it finds to fill those delay
slots and whether they're from the common or uncommon path.

If it is able to find insns from the commonly executed path that don't have
a long latency, then the fill is usually profitable (since the pipeline
bubble always exists). However, pulling a long latency instruction (say
anything that might cache miss or an fdiv/fsqrt) off the slow path and
conditionally nullifying it can be *awful*. Everything else is in-between.

Jeff
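As a rough illustration of the profitability concern in this thread, the check below sketches one conservative way a filler could screen speculative candidates. It is not code from reorg.c or from any proposed patch: the struct, its fields, the function name, and the max_cost threshold are all assumptions made up for this sketch.

```c
#include <stdbool.h>

/* Hypothetical description of a candidate instruction.  The fields are
   assumptions for this sketch, not GCC data structures.  */
struct fill_candidate
{
  int latency_cycles;     /* Estimated latency of the instruction.  */
  bool may_cache_miss;    /* True for e.g. loads.  */
  bool from_likely_path;  /* Taken from the commonly executed side.  */
};

/* Accept a speculative fill only in the "usually profitable" case from the
   discussion: a short-latency instruction from the commonly executed path.
   Anything that might miss the cache, or that comes from the slow path,
   is rejected outright.  */
static bool
speculative_fill_ok (const struct fill_candidate *c, int max_cost)
{
  if (c->may_cache_miss || !c->from_likely_path)
    return false;

  return c->latency_cycles <= max_cost;
}
```

With max_cost set to 2, a one-cycle ALU instruction from the likely path passes, while an fdiv with a latency of some tens of cycles, or any load, does not. The in-between cases are exactly where a real cost model would need target-specific input.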