On Wed, May 27, 2015 at 1:03 PM, Richard Henderson <r...@redhat.com> wrote:
> There's one problem with the couple of patches that I've seen go by wrt
> eliding PLTs with -z now, and relaxing inlined PLTs (aka -fno-plt):
>
> They're currently using the same relocations used by data, and thus the linker
> and dynamic linker must ensure that pointer equality is maintained.  Which
> results in branch-to-branch-(to-branch) situations.
>
> E.g. the attached test case, in which main has a plt entry for function A in
> a.so, and the function B in b.so calls A.
>
> $ LD_BIND_NOW=1 gdb main
> ...
> (gdb) b b
> Breakpoint 1 at 0x400540
> (gdb) run
> Starting program: /home/rth/x/main
> Breakpoint 1, b () at b.c:2
> 2       void b(void) { a(); }
> (gdb) si
> 2       void b(void) { a(); }
> => 0x7ffff7bf75f4 <b+4>:        callq  0x7ffff7bf74e0
> (gdb)
> 0x00007ffff7bf74e0 in ?? () from ./b.so
> => 0x7ffff7bf74e0:      jmpq   *0x20034a(%rip)        # 0x7ffff7df7830
> (gdb)
> 0x0000000000400560 in a@plt ()
> => 0x400560 <a@plt>:    jmpq   *0x20057a(%rip)        # 0x600ae0
> (gdb)
> a () at a.c:2
> 2       void a() { printf("Hello, World!\n"); }
> => 0x7ffff7df95f0 <a>:  sub    $0x8,%rsp
>
>
> If we use -fno-plt, we eliminate the first callq, but do still have two
> consecutive jmpq's.
>
> It seems to me that we ought to have different relocations when we're only
> going to use a pointer for branching, and when we need a pointer to be
> canonicalized for pointer comparisons.
>
> In the linked image, we already have these: R_X86_64_GLOB_DAT vs
> R_X86_64_JUMP_SLOT.  Namely, GLOB_DAT implies "data" (and therefore pointer
> equality), while JUMP_SLOT implies "code" (and therefore we can resolve past
> plt stubs in the main executable).
>
> Which means that HJ's patch of May 16 (git hash 25070364), is less than ideal.
> I do like the smaller PLT entries, but I don't like the fact that it now
> emits GLOB_DAT for the relocations instead of JUMP_SLOT.

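To make the "branching only" vs. "pointer comparison" distinction above concrete, here is a minimal sketch in C. It is not the attached test case (which is not reproduced in this message): everything lives in one file, so the cross-module relocations are not actually exercised, and the names a_calls, address_of_a and is_a are made up for illustration.

```c
typedef void (*fn_ptr)(void);

/* Hypothetical counter; stands in for the printf in the quoted test case. */
int a_calls = 0;

/* Plays the role of A from a.so. */
void a(void) { a_calls++; }

/* Call-only use, like B in b.so: the address of a never escapes, so any
   stub that eventually reaches a would do (JUMP_SLOT-style resolution). */
void b(void) { a(); }

/* Address-taken use: the pointer is data and may be compared later, so
   every module must agree on one canonical value (GLOB_DAT-style). */
fn_ptr address_of_a(void) { return a; }

/* A pointer comparison that relies on that canonical address. */
int is_a(fn_ptr p) { return p == a; }
```

Calling b() only needs to reach a somehow, while is_a(address_of_a()) has to be true no matter which module produced the pointer; that second requirement is what forces the extra hop through the executable's PLT entry in the gdb session above.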
ld.so just does whatever is arranged by ld.  I am not sure changing ld.so
is a good idea.  I don't know what kind of optimization we can do when a
function is called and its address is also taken.

> In the relocatable image, when we're talking about -fno-plt, we should think
> about what relocation we'd like to emit.  Yes, the existing R_X86_64_GOTPCREL
> works with existing toolchains, and there's something to be said for that.
> However, if we're talking about adding a new relocation for relaxing an
> indirect call via GOTPCREL, then:
>
> If we want -fno-plt to be able to hoist function addresses, then we're going
> to want the address that we load for the call to also not be subject to
> possible jump-to-jump.
>
> Unless we want the linker to do an unreasonable amount of x86 code examination
> in order to determine mov vs call for relaxation, we need two different
> relocations (preferably using the same assembler mnemonic, and thus the
> correct relocation is enforced by the assembler).
>
> On the users/hjl/relax branch (and posted on list somewhere), the new
> relocation is called R_X86_64_RELAX_GOTPCREL.  I'm not keen on that "relax"
> name, despite that being exactly what it's for.
>
> I suggest R_X86_64_GOTPLTPCREL_{CALL,LOAD} for the two relocation names.  That
> is, the address is in the .got.plt section, it's a pc-relative relocation, and
> it's being used by a call or load (mov) insn.

Since it is used for indirect calls, how about R_X86_64_INBR_GOTPCREL?
I updated the users/hjl/relax branch to convert the relocation in
*foo@GOTPCREL(%rip) from R_X86_64_GOTPCREL to R_X86_64_RELAX_GOTPCREL,
so that existing assembly code works automatically with a new binutils.

> With those two, we can fairly easily relax call/jmp to direct branches, and
> mov to lea.  Yes, LTO can perform the same optimization, but I'll also agree
> that there are many projects for which LTO is both overkill and unworkable.
>
> This does leave open other optimization questions, mostly around weak
> functions.
> Consider constructs like
>
>     if (foo) foo();
>
> Do we, within the compiler, try to CSE GOTPCREL and GOTPLTPCREL, accepting the
> possibility (not certainty) of jump-to-jump but definitely avoiding a separate
> load insn and the latency implied by that?
>
>
> Comments?
>
>
> r~

--
H.J.
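For reference, the weak-function construct quoted above can be written out as the following sketch. It assumes GCC or Clang on an ELF target, where an undefined weak symbol resolves to a null address; the wrapper name call_foo_if_present is hypothetical, and foo is deliberately left undefined here.

```c
/* Hypothetical optional hook: declared weak and left undefined in this
   file, so its address is expected to be null at run time. */
extern void foo(void) __attribute__((weak));

/* Returns 1 if foo was present and called, 0 if it was skipped. */
int call_foo_if_present(void)
{
    if (foo) {   /* the address is used as data: compared against null */
        foo();   /* the same symbol is then used as a branch target */
        return 1;
    }
    return 0;
}
```

The same symbol is needed once as a comparable pointer and once as a call target, which is why the question of sharing a single load between the two uses comes up at all.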