On Tue, May 19, 2015 at 01:27:06PM -0700, H.J. Lu wrote:
> On Tue, May 19, 2015 at 1:15 PM, Rich Felker <dal...@libc.org> wrote:
> > On Tue, May 19, 2015 at 12:17:18PM -0700, H.J. Lu wrote:
> >> On Tue, May 19, 2015 at 12:11 PM, Richard Henderson <r...@redhat.com> wrote:
> >> > On 05/19/2015 12:06 PM, H.J. Lu wrote:
> >> >> On Tue, May 19, 2015 at 11:59 AM, Richard Henderson <r...@redhat.com> wrote:
> >> >>> On 05/19/2015 11:06 AM, Rich Felker wrote:
> >> >>>> I'm still mildly worried that concerns for supporting
> >> >>>> relaxation might lead to decisions not to optimize code in ways that
> >> >>>> would be difficult to relax (e.g. certain types of address load
> >> >>>> reordering or hoisting) but I don't understand GCC internals
> >> >>>> sufficiently to know if this concern is warranted or not.
> >> >>>
> >> >>> It is.  The relaxation that HJ is working on requires that the
> >> >>> reads from the got not be hoisted.  I'm not especially convinced
> >> >>> that what he's working on is a win.
> >> >>>
> >> >>> With LTO, the compiler can do the same job that he's attempting in
> >> >>> the linker, without an extra nop.  Without LTO, leaving it to the
> >> >>> linker means that you can't hoist the load and hide the memory
> >> >>> latency.
> >> >>>
> >> >>
> >> >> My relax approach won't take away any optimization done by the
> >> >> compiler.  It simply turns an indirect branch into a direct branch
> >> >> with a nop prefix at link-time.  I am having a hard time
> >> >> understanding why we shouldn't do it.
> >> >
> >> > I well understand what you're doing.
> >> >
> >> > But my point is that the only time the compiler should present you
> >> > with the form of indirect branch you're looking for is when there's
> >> > no place to hoist the load.
> >> >
> >> > At which point, is it really worth adding a new relocation to the ABI?
> >> > Is it really worth adding new code to the linker that won't be
> >> > exercised often?
> >>
> >> I believe there are plenty of indirect branches via GOT when compiling
> >> PIE/PIC with -fno-plt:
> >>
> >> [hjl@gnu-6 gcc]$ cat /tmp/x.c
> >> extern void foo (void);
> >>
> >> void
> >> bar (void)
> >> {
> >>   foo ();
> >> }
> >> [hjl@gnu-6 gcc]$ ./xgcc -B./ -fPIC -O3 -S /tmp/x.c -fno-plt
> >> [hjl@gnu-6 gcc]$ cat x.s
> >>         .file "x.c"
> >>         .section .text.unlikely,"ax",@progbits
> >> .LCOLDB0:
> >>         .text
> >> .LHOTB0:
> >>         .p2align 4,,15
> >>         .globl bar
> >>         .type bar, @function
> >> bar:
> >> .LFB0:
> >>         .cfi_startproc
> >>         jmp *foo@GOTPCREL(%rip)
> >>         .cfi_endproc
> >> .LFE0:
> >>         .size bar, .-bar
> >
> > I agree these exist. What I question is whether the savings from the
> > linker being able to relax this to a direct call in the case where the
> > programmer failed to let the compiler make it a direct call to begin
> > with (by using hidden or protected visibility) are worth the cost of
> > not being able to hoist the load out of loops or schedule it earlier
> > in cases where relaxation is not possible because the call target is
> > not defined in the same DSO.
> 
> Just for fun.  I compiled binutils as PIE with -fno-plt -flto:
> 
> [hjl@gnu-mic-2 gas]$ file as-new
> as-new: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV),
> dynamically linked (uses shared libs), for GNU/Linux 2.6.32, not
> stripped
> [hjl@gnu-mic-2 gas]$
> 
> There are 43:
> 
> ff 25 21 93 2d 00     jmpq   *0x2d9321(%rip)        # 3d5f58 <_DYNAMIC+0x1e8>
> 
> and 1983
> 
> ff 15 eb f4 38 00     callq  *0x38f4eb(%rip)        # 3d60e0 <_DYNAMIC+0x370>

How many of those would be relaxed? I suspect it depends a lot on
whether libbfd is static or shared.
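
To make the visibility point from earlier in the thread concrete, here is
a minimal sketch. The file and function names are made up for illustration,
and I have not checked the output against any particular GCC version, so
treat the comments about code generation as expectations rather than facts.

```c
/* visibility_demo.c: hypothetical file and function names, for
 * illustration only.  Inspect the generated code with something like:
 *   gcc -fPIC -O2 -fno-plt -S visibility_demo.c
 */

static int calls; /* records which helpers ran */

/* Default visibility: in a shared object this symbol may be preempted by
 * another definition at load time, so with -fPIC -fno-plt the compiler is
 * expected to call it through its GOT slot. */
__attribute__((noinline)) void exported_helper(void)
{
    calls += 1;
}

/* Hidden visibility: the symbol always binds within this DSO, so the
 * compiler can emit a plain direct call with no GOT load at all. */
__attribute__((noinline, visibility("hidden"))) void hidden_helper(void)
{
    calls += 10;
}

int run_both(void)
{
    calls = 0;
    exported_helper();
    hidden_helper();
    return calls;
}
```

In the assembly for run_both, the call to exported_helper is the kind of
site that linker relaxation would target, while the call to hidden_helper
should already be direct and leaves nothing for the linker to do.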

Rich
