On Tue, Oct 4, 2011 at 8:37 PM, H.J. Lu <hjl.to...@gmail.com> wrote:

>>>> OTOH, x86_64 and i686 targets can also benefit from this change. If
>>>> combine can't create more complex address (covered by lea), then it
>>>> will simply propagate memory operand back into the add insn. It looks
>>>> to me that we can't loose here, so:
>>>>
>>>>   /* Improve address combine.  */
>>>>   if (code == PLUS && MEM_P (src2))
>>>>     src2 = force_reg (mode, src2);
>>>>
>>>> Any opinions?
>>>>
>>>
>>> It doesn't work with 64bit libstdc++:
>>
>> Yeah, yeah. ix86_output_mi_thunk has some ... issues.
>>
>> Please try attached patch that introduces ix86_emit_binop and uses it
>> in a bunch of places.
> I tried it on GCC. There are no regressions. The bugs are fixed for x32.
> Here are size comparison with GCC runtime libraries on ia32, x32 and
> x86-64:
>
>  884093   18600   27064  929757   e2fdd old libstdc++.so
>  884189   18600   27064  929853   e303d new libs/libstdc++.so
>
> The new code is
>
>   mov    0xc(%edi),%eax
>   mov    %eax,0x8(%esi)
>   mov    -0xc(%eax),%eax
>   mov    0x10(%edi),%edx
>   lea    0x8(%esi,%eax,1),%eax
>
> The old one is
>
>   mov    0xc(%edi),%edx
>   lea    0x8(%esi),%eax
>   mov    %edx,0x8(%esi)
>   add    -0xc(%edx),%eax
>   mov    0x10(%edi),%edx

The new code merged lea+add into one lea, so it looks quite OK to me.
Do you have some performance numbers?

Thanks,
Uros.
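
For reference, a minimal C sketch (not part of the patch) of why the two quoted sequences leave the same value in %eax. It models only the address arithmetic. The names addr_old, addr_new and the parameter off are hypothetical: off stands for the word loaded from -0xc(%edx) in the old code, or from -0xc(%eax) in the new code. Registers are modelled as 32-bit unsigned values, so wraparound matches ia32.

```c
#include <stdint.h>

/* Old sequence: lea 0x8(%esi),%eax ; add -0xc(%edx),%eax
   'off' is the value the add reads from memory.  */
static uint32_t addr_old(uint32_t esi, uint32_t off)
{
    uint32_t eax = esi + 0x8;   /* lea 0x8(%esi),%eax   */
    eax += off;                 /* add -0xc(%edx),%eax  */
    return eax;
}

/* New sequence: mov -0xc(%eax),%eax ; lea 0x8(%esi,%eax,1),%eax
   The memory operand is first loaded into a register, then a single
   lea computes base + index*1 + displacement.  */
static uint32_t addr_new(uint32_t esi, uint32_t off)
{
    uint32_t eax = off;         /* mov -0xc(%eax),%eax        */
    eax = esi + eax * 1 + 0x8;  /* lea 0x8(%esi,%eax,1),%eax  */
    return eax;
}
```

Both functions compute esi + off + 8 modulo 2^32. The sketch says nothing about which sequence is faster, which is what the request for performance numbers is about.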