http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40977

Jeffrey A. Law <law at redhat dot com> changed:

               What|Removed                       |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                   |NEW
   Last reconfirmed|                              |2014-02-07
                 CC|                              |law at redhat dot com
      Known to work|                              |
           Assignee|unassigned at gcc dot gnu.org |law at redhat dot com
     Ever confirmed|0                             |1
      Known to fail|                              |

--- Comment #8 from Jeffrey A. Law <law at redhat dot com> ---
The current trunk looks better than gcc-4.4, but it's still not as good as
gcc-3.4.

After reload the key insns look like this:

(insn 25 24 28 6 (set (reg:DI 0 %d0 [orig:47 D.1386 ] [47])
        (ashift:DI (zero_extend:DI (reg/v:SI 8 %a0 [orig:31 resh ] [31]))
            (const_int 32 [0x20]))) l.c:54 302 {ashldi_extsi}
     (nil))
(note 28 25 43 6 NOTE_INSN_DELETED)
(insn 43 28 44 6 (set (reg:SI 0 %d0)
        (reg:SI 0 %d0 [ D.1386 ])) l.c:57 39 {*movsi_m68k2}
     (nil))
(insn 44 43 36 6 (set (reg:SI 1 %d1 [orig:0+4 ] [0])
        (reg:SI 6 %d6 [orig:44 resl ] [44])) l.c:57 39 {*movsi_m68k2}
     (nil))

You can safely ignore insn 43; it'll get zapped because it's a NOP.

The key here is to realize that insn 25 generates two instructions: one
which sets d0 and another which sets d1. The instruction setting d1 is dead,
as that value will be overwritten by the instruction generated for insn 44.
But GCC is particularly bad at discovering and exploiting these kinds of
situations.

This can be fixed by changing ashldi_extsi from a define_insn into a
suitable define_insn_and_split which will decompose the insn into its
component parts.
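
To make the dead store concrete, here is a minimal C sketch of what the two
halves of insn 25 do and why the low half is wasted work. The names
(di_pair, shift_left_32, combine) are hypothetical and only model the
register pair; they are not taken from l.c or from the m68k backend. It
assumes the usual m68k layout, where %d0 holds the high word and %d1 the low
word of a DImode value.

```c
#include <stdint.h>

/* Hypothetical model of a DImode value living in the %d0/%d1 pair.
   On m68k (big-endian) %d0 is the high word and %d1 the low word. */
struct di_pair {
    uint32_t d0_high;
    uint32_t d1_low;
};

/* Models the unsplit ashldi_extsi insn: (DI) resh << 32.
   Both halves are written by one opaque insn, so later passes
   cannot remove just one of the two stores. */
static struct di_pair shift_left_32(uint32_t resh)
{
    struct di_pair r;
    r.d0_high = resh; /* first generated instruction: sets %d0 */
    r.d1_low = 0;     /* second generated instruction: clears %d1 */
    return r;
}

/* Models insn 25 followed by insn 44. */
static uint64_t combine(uint32_t resh, uint32_t resl)
{
    struct di_pair r = shift_left_32(resh);
    r.d1_low = resl;  /* insn 44 overwrites %d1, so the clear above is dead */
    return ((uint64_t)r.d0_high << 32) | r.d1_low;
}
```

Once the two stores are separate insns, as in the split form, removing the
dead one is an ordinary DCE job.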

With ashldi_extsi split that way, we get something like this:

(insn 49 24 50 6 (set (reg:SI 0 %d0 [ D.1386 ])
        (reg/v:SI 8 %a0 [orig:31 resh ] [31])) l.c:54 38 {*movsi_m68k}
     (nil))
(insn 50 49 28 6 (set (reg:SI 1 %d1 [orig:47 D.1386+4 ] [47])
        (const_int 0 [0])) l.c:54 36 {*movsi_const0_68040_60}
     (nil))
(note 28 50 44 6 NOTE_INSN_DELETED)
(insn 44 28 36 6 (set (reg:SI 1 %d1 [orig:0+4 ] [0])
        (reg:SI 6 %d6 [orig:44 resl ] [44])) l.c:57 39 {*movsi_m68k2}
     (nil))

Now the double-word set originally associated with insn 25 is represented by
insns 49 and 50. And we're in a form that the DCE code can easily digest and
determine that insn 50 is dead. This results in:

(insn 49 24 28 6 (set (reg:SI 0 %d0 [ D.1386 ])
        (reg/v:SI 8 %a0 [orig:31 resh ] [31])) l.c:54 38 {*movsi_m68k}
     (expr_list:REG_DEAD (reg/v:SI 8 %a0 [orig:31 resh ] [31])
        (nil)))
(note 28 49 44 6 NOTE_INSN_DELETED)
(insn 44 28 36 6 (set (reg:SI 1 %d1 [orig:0+4 ] [0])
        (reg:SI 6 %d6 [orig:44 resl ] [44])) l.c:57 39 {*movsi_m68k2}
     (expr_list:REG_DEAD (reg:SI 6 %d6 [orig:44 resl ] [44])

Which is much better. The final assembly code looks like:

MUL64:
        movem.l #15872,-(%sp)
        move.l 24(%sp),%a1
        move.l 28(%sp),%d5
#APP
| 47 "l.c" 1
        | Inlined umul_ppmm
        move.l %a1,%d0
        move.l %d5,%d1
        move.l %d0,%d2
        swap %d0
        move.l %d1,%d3
        swap %d1
        move.w %d2,%d4
        mulu %d3,%d4
        mulu %d1,%d2
        mulu %d0,%d3
        mulu %d0,%d1
        move.l %d4,%d0
        eor.w %d0,%d0
        swap %d0
        add.l %d0,%d2
        add.l %d3,%d2
        jcc 1f
        add.l #65536,%d1
1:      swap %d2
        moveq #0,%d0
        move.w %d2,%d0
        move.w %d4,%d2
        move.l %d2,%d6
        add.l %d1,%d0
        move.l %d0,%a0
#NO_APP
        tst.l %a1
        jlt .L6
        tst.l %d5
        jlt .L7
.L3:
        move.l %a0,%d0
        move.l %d6,%d1
        movem.l (%sp)+,#124
        rts
.L7:
        sub.l %a1,%a0
        move.l %a0,%d0
        move.l %d6,%d1
        movem.l (%sp)+,#124
        rts
.L6:
        sub.l %d5,%a0
        tst.l %d5
        jge .L3
        jra .L7

This should be as good as or better than the gcc-3.4 code, with the possible
exception of code size. The compiler has tried to optimize the most likely
path through the function (neither argument is negative), and as a result we
have a bit of tail duplication: the return sequence after .L3 is repeated
after .L7.
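
For readers who would rather not trace the assembly by hand, the following C
sketch is one reading of what the listing computes: an unsigned 32x32->64
multiply built from four 16-bit partial products (the inlined umul_ppmm),
followed by the sign corrections at .L6 and .L7. It is reconstructed from the
assembly above, not copied from l.c, and the names umul32_wide and
mul64_sketch are hypothetical.

```c
#include <stdint.h>

/* Unsigned 32x32 -> 64 multiply from 16-bit halves, mirroring the four
   mulu instructions in the inlined sequence. */
static void umul32_wide(uint32_t u, uint32_t v, uint32_t *hi, uint32_t *lo)
{
    uint32_t ul = u & 0xffff, uh = u >> 16;
    uint32_t vl = v & 0xffff, vh = v >> 16;

    uint32_t ll = ul * vl;           /* mulu %d3,%d4 */
    uint32_t lh = ul * vh;           /* mulu %d1,%d2 */
    uint32_t hl = uh * vl;           /* mulu %d0,%d3 */
    uint32_t hh = uh * vh;           /* mulu %d0,%d1 */

    uint32_t mid = lh + (ll >> 16);  /* this sum cannot overflow 32 bits */
    mid += hl;                       /* this one can wrap around */
    if (mid < hl)                    /* carry set: the jcc is not taken */
        hh += 0x10000;               /* add.l #65536,%d1 */

    *hi = hh + (mid >> 16);
    *lo = (mid << 16) | (ll & 0xffff);
}

/* Signed product obtained by correcting the unsigned high word:
   subtract the other operand once for each negative input. */
static int64_t mul64_sketch(int32_t a, int32_t b)
{
    uint32_t hi, lo;
    umul32_wide((uint32_t)a, (uint32_t)b, &hi, &lo);
    if (a < 0)
        hi -= (uint32_t)b;           /* .L6: sub.l %d5,%a0 */
    if (b < 0)
        hi -= (uint32_t)a;           /* .L7: sub.l %a1,%a0 */
    /* Assumes the usual two's-complement conversion to int64_t. */
    return (int64_t)(((uint64_t)hi << 32) | lo);
}
```

In this reading the fall-through path to .L3 is the case where neither `if`
fires, which matches the path the compiler chose to favor.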