https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79593
--- Comment #21 from Jeffrey A. Law <law at redhat dot com> ---
We have this after IRA:
(insn 27 26 28 4 (set (reg:DI 101 [ pretmp_22 ])
(zero_extend:DI (subreg:SI (reg:SF 91 [ pretmp_22 ]) 0))) "j.C":20:35
114 {*zero_extendsidi2}
(expr_list:REG_DEAD (reg:SF 91 [ pretmp_22 ])
(nil)))
(insn 28 27 29 4 (set (reg:XF 100)
(float:XF (reg:DI 101 [ pretmp_22 ]))) "j.C":20:35 169 {floatdixf2}
(expr_list:REG_DEAD (reg:DI 101 [ pretmp_22 ])
(nil)))
Here r91 and r101 will get assigned to memory locations because of the 'm'
constraint for floatdixf2. r100 gets a hard register. We're going to need a
reload for insn 27. So after LRA we have:
(insn 100 26 27 4 (set (reg:SI 0 ax [110])
(mem/c:SI (reg/f:SI 7 sp) [6 %sfp+-8 S4 A64])) "j.C":20:35 67
{*movsi_internal}
(nil))
(insn 27 100 28 4 (set (mem/c:DI (reg/f:SI 7 sp) [6 %sfp+-8 S8 A64])
(zero_extend:DI (reg:SI 0 ax [110]))) "j.C":20:35 114
{*zero_extendsidi2}
(nil))
[ insn 28 doesn't really play a role here other than requiring the 'm'
operand]
The x86 backend has a splitter to optimize insn 27, so post-LRA splitting
generates:
(insn 100 26 107 4 (set (reg:SI 0 ax [110])
(mem/c:SI (reg/f:SI 7 sp) [6 %sfp+-8 S4 A64])) "j.C":20:35 67
{*movsi_internal}
(nil))
(insn 107 100 108 4 (set (mem/c:SI (reg/f:SI 7 sp) [6 %sfp+-8 S4 A64])
(reg:SI 0 ax [110])) "j.C":20:35 67 {*movsi_internal}
(nil))
(insn 108 107 28 4 (set (mem/c:SI (plus:SI (reg/f:SI 7 sp)
(const_int 4 [0x4])) [6 %sfp+-4 S4 A32])
(const_int 0 [0])) "j.C":20:35 67 {*movsi_internal}
(nil))
Now we've finally exposed the redundancy: insn 107 stores ax back into the
same SImode stack slot that insn 100 just loaded it from. This can be
addressed in DSE2, which runs after SPLIT2, but it's not all that generally
effective. I figure we're getting ~8 hits per stage during a bootstrap -- all
in the runtime system.
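
To make the redundancy concrete, here is a minimal sketch that models the
8-byte stack slot as a bytearray and replays insns 100, 107 and 108 on it. The
function names and the slot model are hypothetical illustrations, not anything
from GCC; the sketch only shows that dropping the store-back leaves the slot
contents unchanged.

```python
import struct


def split_sequence(slot: bytearray) -> None:
    """Replay insns 100, 107 and 108 on a model of the 8-byte slot."""
    ax = bytes(slot[0:4])   # insn 100: load the SImode low half into ax
    slot[0:4] = ax          # insn 107: store ax back to the very same bytes
    slot[4:8] = bytes(4)    # insn 108: zero the high half (the zero extension)


def without_store_back(slot: bytearray) -> None:
    """Same effect on memory with insn 107 removed (hypothetical result)."""
    slot[4:8] = bytes(4)    # only the zeroing of the high half remains


def demo() -> bool:
    # Low half holds the bits of an SFmode value, high half holds garbage.
    original = bytearray(struct.pack("<f", 1.5) + b"\xaa" * 4)
    a, b = bytearray(original), bytearray(original)
    split_sequence(a)
    without_store_back(b)
    return a == b           # True when the store-back changed nothing
```

In this model the load in insn 100 also becomes unnecessary once the
store-back is gone, assuming ax has no later use.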
I looked at detecting the partial dead store in postreload-gcse. There are a
lot of good memory tracking bits in there, but it's still not a good fit.
It doesn't really feel like an IRA/LRA problem to me. Their decisions are sane
AFAICT.
We could try to catch it with a new peephole pattern, but that seems even less
desirable than detecting this in a generic way during DSE.
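
As a rough illustration of what "detecting this in a generic way" could mean,
the toy scan below flags a store that writes a register back to the exact
memory range it was last loaded from, with no intervening overlapping store.
This is not how GCC's DSE pass is implemented; the Insn tuple, the opcode
names and the single-basic-block assumption are all made up for the sketch,
and it ignores aliasing, calls and volatile accesses.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Insn(NamedTuple):
    uid: int
    op: str              # "load" (reg <- mem), "store" (mem <- reg), "store_imm"
    reg: Optional[str]   # register involved, None for "store_imm"
    addr: int            # byte offset of the access relative to the frame
    size: int            # access size in bytes


def overlaps(a_addr: int, a_size: int, b_addr: int, b_size: int) -> bool:
    return a_addr < b_addr + b_size and b_addr < a_addr + a_size


def redundant_store_backs(insns: List[Insn]) -> List[int]:
    """Return uids of stores that rewrite a value memory already holds."""
    # known[reg] = (addr, size): reg currently equals memory at that range.
    known: Dict[str, Tuple[int, int]] = {}
    redundant: List[int] = []
    for insn in insns:
        loc = (insn.addr, insn.size)
        if insn.op == "load":
            known[insn.reg] = loc
        elif insn.op == "store" and known.get(insn.reg) == loc:
            redundant.append(insn.uid)   # memory already equals the register
        else:
            # Memory changed: drop every fact about an overlapping range.
            known = {r: k for r, k in known.items()
                     if not overlaps(k[0], k[1], insn.addr, insn.size)}
            if insn.op == "store":
                known[insn.reg] = loc    # the register now matches this range


    return redundant


# The post-split sequence from this comment, with %sfp-8 modelled as offset -8.
POST_SPLIT = [
    Insn(100, "load", "ax", -8, 4),
    Insn(107, "store", "ax", -8, 4),
    Insn(108, "store_imm", None, -4, 4),
]
```

Calling redundant_store_backs(POST_SPLIT) flags insn 107 and leaves insn 108
alone, since the zeroing store touches a different range.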