Re: Relocations to use when eliding plts

H.J. Lu Wed, 27 May 2015 17:45:20 -0700

On Wed, May 27, 2015 at 1:03 PM, Richard Henderson <r...@redhat.com> wrote:
> There's one problem with the couple of patches that I've seen go by wrt eliding
> PLTs with -z now, and relaxing inlined PLTs (aka -fno-plt):
>
> They're currently using the same relocations used by data, and thus the linker
> and dynamic linker must ensure that pointer equality is maintained.  Which
> results in branch-to-branch-(to-branch) situations.
>
> E.g. the attached test case, in which main has a plt entry for function A in
> a.so, and the function B in b.so calls A.
>
> $ LD_BIND_NOW=1 gdb main
> ...
> (gdb) b b
> Breakpoint 1 at 0x400540
> (gdb) run
> Starting program: /home/rth/x/main
> Breakpoint 1, b () at b.c:2
> 2       void b(void) { a(); }
> (gdb) si
> 2       void b(void) { a(); }
> => 0x7ffff7bf75f4 <b+4>:        callq  0x7ffff7bf74e0
> (gdb)
> 0x00007ffff7bf74e0 in ?? () from ./b.so
> => 0x7ffff7bf74e0:      jmpq   *0x20034a(%rip)        # 0x7ffff7df7830
> (gdb)
> 0x0000000000400560 in a@plt ()
> => 0x400560 <a@plt>:    jmpq   *0x20057a(%rip)        # 0x600ae0
> (gdb)
> a () at a.c:2
> 2       void a() { printf("Hello, World!\n"); }
> => 0x7ffff7df95f0 <a>:  sub    $0x8,%rsp
>
>
> If we use -fno-plt, we eliminate the first callq, but do still have two
> consecutive jmpq's.
>
> It seems to me that we ought to have different relocations when we're only
> going to use a pointer for branching, and when we need a pointer to be
> canonicalized for pointer comparisons.
>
> In the linked image, we already have these: R_X86_64_GLOB_DAT vs
> R_X86_64_JUMP_SLOT.  Namely, GLOB_DAT implies "data" (and therefore pointer
> equality), while JUMP_SLOT implies "code" (and therefore we can resolve past
> plt stubs in the main executable).
>
> Which means that HJ's patch of May 16 (git hash 25070364), is less than ideal.
>  I do like the smaller PLT entries, but I don't like the fact that it now emits
> GLOB_DAT for the relocations instead of JUMP_SLOT.


ld.so just does whatever is arranged by ld.  I am not sure changing ld.so
is a good idea.  I don't know what kind of optimization we can do when a
function is both called and has its address taken.
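
To make the "called and its address taken" case concrete, here is a minimal
sketch in C.  The helper names (take_address, call_only, addresses_match,
call_count) are made up for illustration and are not part of the test case
attached to the original mail.  A single translation unit cannot reproduce
the cross-DSO jump-to-jump; it only shows the two kinds of use that the
relocations would need to tell apart.

```c
/* Hypothetical sketch: one function, two kinds of use. */
typedef void (*fn_t)(void);

static int calls;

/* Stand-in for function A from the quoted test case. */
void a(void) { calls++; }

/* Use 1: the address escapes as data.  Every module must see the
   same value, so the address has to be canonical. */
fn_t take_address(void) { return a; }

/* Use 2: the address is only branched to.  No comparison can
   observe it, so any address that ends up in a() would do. */
void call_only(void) { a(); }

/* Pointer equality as required by C: both expressions must agree. */
int addresses_match(void) { return take_address() == &a; }

int call_count(void) { return calls; }
```

Calling call_only() once and then checking addresses_match() exercises both
uses; only the first one depends on the address being canonical.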

>
> In the relocatable image, when we're talking about -fno-plt, we should think
> about what relocation we'd like to emit.  Yes, the existing R_X86_64_GOTPCREL
> works with existing toolchains, and there's something to be said for that.
> However, if we're talking about adding a new relocation for relaxing an
> indirect call via GOTPCREL, then:
>
> If we want -fno-plt to be able to hoist function addresses, then we're going to
> want the address that we load for the call to also not be subject to possible
> jump-to-jump.
>
> Unless we want the linker to do an unreasonable amount of x86 code examination
> in order to determine mov vs call for relaxation, we need two different
> relocations (preferably using the same assembler mnemonic, and thus the correct
> relocation is enforced by the assembler).
>
> On the users/hjl/relax branch (and posted on list somewhere), the new
> relocation is called R_X86_64_RELAX_GOTPCREL.  I'm not keen on that "relax"
> name, despite that being exactly what it's for.
>
> I suggest R_X86_64_GOTPLTPCREL_{CALL,LOAD} for the two relocation names.  That
> is, the address is in the .got.plt section, it's a pc-relative relocation, and
> it's being used by a call or load (mov) insn.

Since it is used for indirect call, how about R_X86_64_INBR_GOTPCREL?

I updated the users/hjl/relax branch to convert the relocation in
*foo@GOTPCREL(%rip) from R_X86_64_GOTPCREL to R_X86_64_RELAX_GOTPCREL so that
existing assembly code works automatically with a new binutils.

> With those two, we can fairly easily relax call/jmp to direct branches, and mov
> to lea.  Yes, LTO can perform the same optimization, but I'll also agree that
> there are many projects for which LTO is both overkill and unworkable.
>
> This does leave open other optimization questions, mostly around weak
> functions.  Consider constructs like
>
>         if (foo) foo();
>
> Do we, within the compiler, try to CSE GOTPCREL and GOTPLTPCREL, accepting the
> possibility (not certainty) of jump-to-jump but definitely avoiding a separate
> load insn and the latency implied by that?
>
>
> Comments?
>
>
> r~
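
The weak-function construct quoted above can be sketched as follows.  This
assumes GCC or Clang on an ELF target (for __attribute__((weak))); the names
optional_hook and call_if_present are hypothetical.  The point is that the
same symbol is used once as data (the null test) and once as a branch
target, which is where the question of sharing a single load comes from.

```c
/* Hypothetical sketch of "if (foo) foo();" with an undefined weak symbol.
   Assumes GCC/Clang on ELF; optional_hook is not defined anywhere here. */
extern void optional_hook(void) __attribute__((weak));

/* Returns 1 if the hook was present and called, 0 otherwise. */
int call_if_present(void)
{
    /* Data use: the address is compared, so it must be canonical. */
    if (optional_hook) {
        /* Branch use: only the destination matters. */
        optional_hook();
        return 1;
    }
    return 0;
}
```

When no object or library defines optional_hook, its address resolves to
null and the call is skipped.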



-- 
H.J.
