https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117111
--- Comment #2 from Kazumoto Kojima <kkojima at gcc dot gnu.org> ---
dbr_schedule is trying to fill the delay slot of

(jump_insn 17 16 42 (set (pc)
        (if_then_else (eq (reg:SI 147 t)
                (const_int 0 [0]))
            (label_ref:SI 94)
            (pc))) "fpone.c":7:11 232 {*cbranch_t}
     (int_list:REG_BR_PROB 536870916 (nil))
 -> 94)

using the fill_slots_from_thread function. It takes the insns for the slot from the thread

(insn 46 39 79 (set (reg:SI 8 r8 [orig:167 _1 ] [167])
        (reg:SI 147 t)) "fpone.c":7:11 303 {movt}
     (expr_list:REG_DEAD (reg:SI 147 t)
        (nil)))
(insn 79 46 80 (set (reg/i:SI 0 r0)
        (reg:SI 8 r8 [orig:167 _1 ] [167])) "fpone.c":8:1 191 {movsi_ie}
     (expr_list:REG_DEAD (reg:SI 8 r8 [orig:167 _1 ] [167])
        (nil)))

and makes a candidate insn

(insn 79 46 80 (set (reg/i:SI 0 r0)
        (reg:SI 147 t)) "fpone.c":8:1 303 {movt}
     (expr_list:REG_DEAD (reg:SI 8 r8 [orig:167 _1 ] [167])
        (nil)))

for trial. fill_slots_from_thread first calls try_split on this candidate. Here is a gdb backtrace where trial is the insn 79 above.

#0  try_split (pat=pat@entry=0x7ffff6f5ec48, trial=trial@entry=0x7ffff6f5d400,
    last=last@entry=0) at /git/gcc/gcc/emit-rtl.cc:3932
#1  0x0000000000f5a748 in fill_slots_from_thread (insn=0x7ffff6e0cd38,
    condition=0x7ffff6f5f0d8, thread_or_return=<optimized out>,
    opposite_thread=<optimized out>, likely=false,
    thread_if_true=<optimized out>, own_thread=true,
    slots_to_fill=<optimized out>, pslots_filled=0x7fffffffd7ac,
    delay_list=0x7fffffffd7d0) at /git/gcc/gcc/reorg.cc:2430
#2  0x0000000000f5d381 in fill_eager_delay_slots ()
    at /git/gcc/gcc/reorg.cc:2843
#3  dbr_schedule (first=<optimized out>) at /git/gcc/gcc/reorg.cc:3705

try_split then applies the splitter

sh.md: 11147
;; This is not a peephole, but it's here because it's actually supposed
;; to be one.  It tries to convert a sequence such as
;;	movt	r2	->	movt	r2
;;	movt	r13		mov	r2,r13
;; This gives the schduler a bit more freedom to hoist a following
;; comparison insn.  Moreover, it the reg-reg mov insn is MT group which has
;; better chances for parallel execution.
;; We can do this with a peephole2 pattern, but then the cprop_hardreg
;; pass will revert the change.  See also PR 64331.
;; Thus do it manually in one of the split passes after register allocation.
;; Sometimes the cprop_hardreg pass might also eliminate the reg-reg copy.
(define_split
  [(set (match_operand:SI 0 "arith_reg_dest")
	(match_operand:SI 1 "t_reg_operand"))]
...

and returns

(insn 113 46 80 (set (reg/i:SI 0 r0)
        (reg:SI 8 r8 [orig:167 _1 ] [167])) "fpone.c":8:1 -1
     (nil))

Thus fill_slots_from_thread fills the slot with it. This situation appears to be unexpected for both fill_slots_from_thread and the above splitter.

BTW, the old RA generates slightly worse code for the thread:

.L6:
        movt    r1
        mov.l   r1,@r15
        mov.l   @r15,r0
        add     #4,r15
        lds.l   @r15+,pr

and the splitter couldn't be applied.
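
The interaction described above can be illustrated with a small toy model. This is only a sketch of one reading of the comment, not GCC code: the register-dictionary machine, `run`, and `toy_split` are hypothetical names invented here, and the starting value of r8 is an arbitrary assumption. It assumes the delay-slot insn executes before insn 46 (the movt into r8) has run, so a slot insn that reads r8 instead of T sees a stale value.

```python
# Toy model only; none of this is GCC code.
# An "insn" is a (dest, src) register copy; registers live in a dict.

def run(insns, regs):
    """Execute (dest, src) copies in order and return the final registers."""
    regs = dict(regs)
    for dest, src in insns:
        regs[dest] = regs[src]
    return regs

def toy_split(trial, prev):
    """Hypothetical stand-in for the sh.md splitter: if the preceding insn
    already copied T into a register, rewrite 'dest = T' as 'dest = that reg'."""
    dest, src = trial
    if src == "t" and prev is not None and prev[1] == "t":
        return (dest, prev[0])
    return trial

# Assumed state at the branch (insn 17): T holds the comparison result,
# r8 holds some unrelated stale value (99 is arbitrary).
start = {"t": 1, "r8": 99, "r0": 0}

insn46 = ("r8", "t")      # movt r8, still in the thread
candidate = ("r0", "t")   # insn 79 after T has been substituted for r8

# The delay slot runs before the thread, so the slot insn comes first.
ok = run([candidate, insn46], start)    # slot filled with r0 = T
split = toy_split(candidate, insn46)    # splitter rewrites it to r0 = r8
bad = run([split, insn46], start)       # slot filled with r0 = r8

print(ok["r0"], split, bad["r0"])
```

In this model the unsplit candidate leaves the T value in r0, while the split form copies the stale r8. Whether insn 46 remains in the thread after the slot is filled is not modeled here.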