On 04/17/2013 11:52 PM, Steven Bosscher wrote:
> According to the comments in pa.h about MASK_JUMP_IN_DELAY, having
> jumps in delay slots of other jumps is one such thing: They don't
> bring benefit to the PA-8000 and they don't work with DWARF2 CFI. As
> far as I know, SPARC and MIPS don't allow jumps in delay slots, SH
> looks like it doesn't allow it either, and CRIS can do it for short
> branches but doesn't do because the trade-off between benefit and
> machine description complexity comes out negative. On the scheduler
> implementation side: Branches as delayed insns in delay slots of other
> branches is impossible to express in the CFG (at least in GCC, but I
> think in general it can't be done cleanly). Therefore I want to drop
> support for branches in delay slots. What do you think about this?

I thought I'd mention C6X allows branches in delay slots (of course reorg.c isn't involved in this). This is useful if one can prove that the first branch isn't taken if the predicate of the second branch is true. Otherwise, it has the same semantics as the PA (as described by Jeff): you get to execute a few instructions at the first branch target and then another jump happens away from there. This can be useful, for example to implement short loops (without using the hardware loop mechanisms) by scheduling a decrement/branch every cycle for 6 cycles before the actual loop, but gcc does not use this functionality.

> What about multiple delay slots? It looks like reorg.c has code to
> handle insns with multiple delay slots, but there currently are no GCC
> targets in the FSF tree that have insns with multiple delay slots and
> that use define_delay. The C6X has many more delay slots than just 1
> (it can have up to 5 delay slots IIRC)

5 cycles with up to 8 insns each :) Didn't want to try that with reorg.c.

> but it is much more flexible
> than traditional RISCs when it comes to putting insns in delay slots
> (it uses predication so it can annul delayed insns on various
> conditions) and it uses a very clever (and effective??) delay slot
> filling mechanism via the normal scheduler, using back-tracking and
> "jump shadows" (see UNSPEC_JUMP_SHADOW in the cx6 back end). But C6X
> doesn't use reorg.c delay slot scheduling. I'm not aware of any
> non-VLIW, non-DSP targets with more than one delay slot per insn, and
> new VLIW/DSP ports with delay slots probably should look at c6x rather
> than using define_delay. Supporting only a single delay slot per
> delay_insn would make my scheduler a bit less complex. Would that be
> enough for everyone, or is it necessary to continue to support
> multiple delay slots per insn?

The mechanism used for C6X has the advantage of using the pipeline description for accurate schedules and allowing more than one delay slot.
It can also add predication to fallthrough insns to make them suitable for use in a delay slot. The downside is that it doesn't know quite as many tricks as reorg.c. It's based on sched-ebb so it can only take instructions from the fallthrough branch (something I've wanted to fix but never had the time).

In general I think if a new target wants more than one delay slot, it should try to use the C6X method instead of reorg.c. It would be nice for someone to try it on a target like mips or PA as well; ISTR Richard S was going to try at some point but I don't know if anything came of that. I expect it to generate worse code than reorg.c at this stage but improvements should be possible.

Bernd
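
For readers unfamiliar with the semantics described above (a branch sitting in the delay slots of another branch), here is a rough toy simulation in Python. It is only a sketch under stated assumptions, not C6X-accurate and not GCC code: it issues one instruction per cycle (ignoring execute packets of up to 8 insns), ignores predication, and assumes 5 delay slots per branch as quoted in the thread. The names `run`, `DELAY_SLOTS` and the `(op, arg)` instruction encoding are made up for illustration.

```python
# Toy model of delayed branches. Hypothetical encoding, one insn per cycle.
DELAY_SLOTS = 5  # assumption: the figure quoted in the thread for C6X


def run(program, delay_slots=DELAY_SLOTS, max_cycles=50):
    """Execute `program`, a dict mapping address -> (op, arg).

    ops: ("nop", None), ("br", target), ("halt", None).
    A branch issued in cycle t redirects fetch at cycle t + delay_slots + 1,
    even when it was itself issued in a delay slot of an earlier branch.
    Returns the list of addresses executed, in order.
    """
    pc = 0
    pending = {}  # cycle at which a redirect takes effect -> target address
    trace = []
    for cycle in range(max_cycles):
        if cycle in pending:
            pc = pending.pop(cycle)  # an earlier branch lands now
        op, arg = program[pc]
        trace.append(pc)
        if op == "halt":
            break
        if op == "br":
            pending[cycle + delay_slots + 1] = arg
        pc += 1
    return trace


# Branch to 100 at address 0; a second branch to 200 sits in its delay slots.
program = {addr: ("nop", None) for addr in list(range(6)) + [100, 101]}
program[0] = ("br", 100)
program[2] = ("br", 200)
program[200] = ("halt", None)

trace = run(program)
print(trace)  # [0, 1, 2, 3, 4, 5, 100, 101, 200]
```

The second branch was issued two cycles after the first, so two instructions at the first target (100 and 101) execute before control moves on to 200. That is the "execute a few instructions at the first branch target and then another jump happens away from there" behaviour.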