On Tue, May 19, 2015 at 1:15 PM, Rich Felker <dal...@libc.org> wrote:
> On Tue, May 19, 2015 at 12:17:18PM -0700, H.J. Lu wrote:
>> On Tue, May 19, 2015 at 12:11 PM, Richard Henderson <r...@redhat.com> wrote:
>> > On 05/19/2015 12:06 PM, H.J. Lu wrote:
>> >> On Tue, May 19, 2015 at 11:59 AM, Richard Henderson <r...@redhat.com> wrote:
>> >>> On 05/19/2015 11:06 AM, Rich Felker wrote:
>> >>>> I'm still mildly worried that concerns for supporting
>> >>>> relaxation might lead to decisions not to optimize code in ways that
>> >>>> would be difficult to relax (e.g. certain types of address load
>> >>>> reordering or hoisting) but I don't understand GCC internals
>> >>>> sufficiently to know if this concern is warranted or not.
>> >>>
>> >>> It is. The relaxation that HJ is working on requires that the reads
>> >>> from the got not be hoisted. I'm not especially convinced that what
>> >>> he's working on is a win.
>> >>>
>> >>> With LTO, the compiler can do the same job that he's attempting in the
>> >>> linker, without an extra nop. Without LTO, leaving it to the linker
>> >>> means that you can't hoist the load and hide the memory latency.
>> >>>
>> >>
>> >> My relax approach won't take away any optimization done by compiler.
>> >> It simply turns indirect branch into direct branch with a nop prefix at
>> >> link-time. I am having a hard time to understand why we shouldn't do it.
>> >
>> > I well understand what you're doing.
>> >
>> > But my point is that the only time the compiler should present you with
>> > the form of indirect branch you're looking for is when there's no place
>> > to hoist the load.
>> >
>> > At which point, is it really worth adding a new relocation to the ABI?
>> > Is it really worth adding new code to the linker that won't be
>> > exercised often?
>>
>> I believe there are plenty of indirect branches via GOT when compiling
>> PIE/PIC with -fno-plt:
>>
>> [hjl@gnu-6 gcc]$ cat /tmp/x.c
>> extern void foo (void);
>>
>> void
>> bar (void)
>> {
>>   foo ();
>> }
>> [hjl@gnu-6 gcc]$ ./xgcc -B./ -fPIC -O3 -S /tmp/x.c -fno-plt
>> [hjl@gnu-6 gcc]$ cat x.s
>>         .file   "x.c"
>>         .section        .text.unlikely,"ax",@progbits
>> .LCOLDB0:
>>         .text
>> .LHOTB0:
>>         .p2align 4,,15
>>         .globl  bar
>>         .type   bar, @function
>> bar:
>> .LFB0:
>>         .cfi_startproc
>>         jmp     *foo@GOTPCREL(%rip)
>>         .cfi_endproc
>> .LFE0:
>>         .size   bar, .-bar
>
> I agree these exist. What I question is whether the savings from the
> linker being able to relax this to a direct call in the case where the
> programmer failed to let the compiler make it a direct call to begin
> with (by using hidden or protected visibility) are worth the cost of
> not being able to hoist the load out of loops or schedule it earlier
> in cases where relaxation is not possible because the call target is
> not defined in the same DSO.
Just for fun. I compiled binutils as PIE with -fno-plt -flto:

[hjl@gnu-mic-2 gas]$ file as-new
as-new: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.32, not stripped
[hjl@gnu-mic-2 gas]$

There are 43 indirect jumps via the GOT like this one:

  ff 25 21 93 2d 00     jmpq   *0x2d9321(%rip)        # 3d5f58 <_DYNAMIC+0x1e8>

and 1983 indirect calls like this one:

  ff 15 eb f4 38 00     callq  *0x38f4eb(%rip)        # 3d60e0 <_DYNAMIC+0x370>

-- 
H.J.