On Tue, May 19, 2015 at 1:15 PM, Rich Felker <dal...@libc.org> wrote:
> On Tue, May 19, 2015 at 12:17:18PM -0700, H.J. Lu wrote:
>> On Tue, May 19, 2015 at 12:11 PM, Richard Henderson <r...@redhat.com> wrote:
>> > On 05/19/2015 12:06 PM, H.J. Lu wrote:
>> >> On Tue, May 19, 2015 at 11:59 AM, Richard Henderson <r...@redhat.com> wrote:
>> >>> On 05/19/2015 11:06 AM, Rich Felker wrote:
>> >>>> I'm still mildly worried that concerns for supporting
>> >>>> relaxation might lead to decisions not to optimize code in ways that
>> >>>> would be difficult to relax (e.g. certain types of address load
>> >>>> reordering or hoisting) but I don't understand GCC internals
>> >>>> sufficiently to know if this concern is warranted or not.
>> >>>
>> >>> It is.  The relaxation that HJ is working on requires that the reads from
>> >>> the GOT not be hoisted.  I'm not especially convinced that what he's
>> >>> working on is a win.
>> >>>
>> >>> With LTO, the compiler can do the same job that he's attempting in the
>> >>> linker, without an extra nop.  Without LTO, leaving it to the linker means
>> >>> that you can't hoist the load and hide the memory latency.
>> >>>
>> >>
>> >> My relax approach won't take away any optimization done by the compiler.
>> >> It simply turns an indirect branch into a direct branch with a nop prefix
>> >> at link time.  I am having a hard time understanding why we shouldn't do it.
>> >
>> > I well understand what you're doing.
>> >
>> > But my point is that the only time the compiler should present you with the
>> > form of indirect branch you're looking for is when there's no place to hoist
>> > the load.
>> >
>> > At which point, is it really worth adding a new relocation to the ABI?  Is it
>> > really worth adding new code to the linker that won't be exercised often?
>>
>> I believe there are plenty of indirect branches via GOT when compiling
>> PIE/PIC with -fno-plt:
>>
>> [hjl@gnu-6 gcc]$ cat /tmp/x.c
>> extern void foo (void);
>>
>> void
>> bar (void)
>> {
>>   foo ();
>> }
>> [hjl@gnu-6 gcc]$ ./xgcc -B./ -fPIC -O3 -S /tmp/x.c -fno-plt
>> [hjl@gnu-6 gcc]$ cat x.s
>> .file "x.c"
>> .section .text.unlikely,"ax",@progbits
>> .LCOLDB0:
>> .text
>> .LHOTB0:
>> .p2align 4,,15
>> .globl bar
>> .type bar, @function
>> bar:
>> .LFB0:
>> .cfi_startproc
>> jmp *foo@GOTPCREL(%rip)
>> .cfi_endproc
>> .LFE0:
>> .size bar, .-bar
>
> I agree these exist. What I question is whether the savings from the
> linker being able to relax this to a direct call in the case where the
> programmer failed to let the compiler make it a direct call to begin
> with (by using hidden or protected visibility) are worth the cost of
> not being able to hoist the load out of loops or schedule it earlier
> in cases where relaxation is not possible because the call target is
> not defined in the same DSO.

Just for fun.  I compiled binutils as PIE with -fno-plt -flto:

[hjl@gnu-mic-2 gas]$ file as-new
as-new: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.32, not stripped
[hjl@gnu-mic-2 gas]$

There are 43 indirect jumps through the GOT like this:

ff 25 21 93 2d 00     jmpq   *0x2d9321(%rip)        # 3d5f58 <_DYNAMIC+0x1e8>

and 1983 indirect calls through the GOT like this:

ff 15 eb f4 38 00     callq  *0x38f4eb(%rip)        # 3d60e0 <_DYNAMIC+0x370>

-- 
H.J.
