On Thu, Mar 15, 2018 at 1:41 PM, Jan Hubicka <hubi...@ucw.cz> wrote:
>> On Sun, Mar 11, 2018 at 7:40 AM, H.J. Lu <hjl.to...@gmail.com> wrote:
>> > On Mon, Mar 5, 2018 at 4:20 AM, H.J. Lu <hjl.to...@gmail.com> wrote:
>> >> On Tue, Feb 27, 2018 at 11:39 AM, H.J. Lu <hongjiu...@intel.com> wrote:
>> >>> For x86 targets, when -fno-plt is used, external functions are called
>> >>> via GOT slot, in 64-bit mode:
>> >>>
>> >>>   [bnd] call/jmp *foo@GOTPCREL(%rip)
>> >>>
>> >>> and in 32-bit mode:
>> >>>
>> >>>   [bnd] call/jmp *foo@GOT[(%reg)]
>> >>>
>> >>> With -mindirect-branch=, they are converted to, in 64-bit mode:
>> >>>
>> >>>   pushq foo@GOTPCREL(%rip)
>> >>>   [bnd] jmp __x86_indirect_thunk[_bnd]
>> >>>
>> >>> and in 32-bit mode:
>> >>>
>> >>>   pushl foo@GOT[(%reg)]
>> >>>   [bnd] jmp __x86_indirect_thunk[_bnd]
>> >>>
>> >>> which are incompatible with CFI.  In 64-bit mode, since R11 is a scratch
>> >>> register, we generate:
>> >>>
>> >>>   movq foo@GOTPCREL(%rip), %r11
>> >>>   [bnd] call/jmp __x86_indirect_thunk_[bnd_]r11
>> >>>
>> >>> instead.  We do it in ix86_output_indirect_branch so that we can use
>> >>> the newly proposed R_X86_64_THUNK_GOTPCRELX relocation:
>> >>>
>> >>> https://groups.google.com/forum/#!topic/x86-64-abi/eED5lzn3_Mg
>> >>>
>> >>>   movq foo@GOTPCREL_THUNK(%rip), %r11
>> >>>   [bnd] call/jmp __x86_indirect_thunk_[bnd_]r11
>> >>>
>> >>> to load GOT slot into R11.  If foo is defined locally, linker can
>> >>> convert
>> >>>
>> >>>   movq foo@GOTPCREL_THUNK(%rip), %reg
>> >>>   call/jmp __x86_indirect_thunk_reg
>> >>>
>> >>> to
>> >>>
>> >>>   call/jmp foo
>> >>>   nop 0L(%rax)
>> >>>
>> >>> In 32-bit mode, since all caller-saved registers, EAX, EDX and ECX, may
>> >>> be used to pass function parameters, there is no scratch register available.
>> >>> For -fno-plt -fno-pic -mindirect-branch=, we expand external function call
>> >>> to:
>> >>>
>> >>>   movl foo@GOT, %reg
>> >>>   [bnd] call/jmp *%reg
>> >>>
>> >>> so that it can be converted to
>> >>>
>> >>>   movl foo@GOT, %reg
>> >>>   [bnd] call/jmp __x86_indirect_thunk_[bnd_]reg
>> >>>
>> >>> in ix86_output_indirect_branch.  Since this is performed during RTL
>> >>> expansion, other instructions may be inserted between movl and call/jmp.
>> >>> Linker optimization isn't always possible.
>
> I suppose we can just combine those into patterns if we want to prevent gcc
> from

I will look into it.

> interleaving this with other instructions.  However since this affects ABI and
> not only return thunk, did you discuss the changes with LLVM folks as well?

This doesn't change the calling convention.  The new R_X86_64_THUNK_GOTPCRELX
relocation is an optimization.  It can be safely treated as R_X86_64_GOTPCRELX.

> It would be nice to not have diverging solutions.
>

That is why I posted the new relocation to the x86-64 psABI group.

-- 
H.J.
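
For anyone who wants to look at the call and jmp sequences discussed in the quoted description, a small C file along these lines can be compiled to assembly.  This is only a sketch: the two function names are made up for illustration, strtol merely stands in for "an external function", and the exact instructions emitted depend on the GCC version and on whether the patch is applied.

```c
/* Assumed command line for an x86-64 GCC that supports -mindirect-branch:
     gcc -O2 -fno-plt -mindirect-branch=thunk -S example.c
   Then inspect example.s for the GOT load and the thunk call/jmp.  */
#include <stdlib.h>

/* strtol is defined outside this file, so with -fno-plt it is reached
   through its GOT slot instead of a PLT entry.  */

/* Hypothetical name.  The result is used after the call returns, so
   this is expected to be a call rather than a jmp.  */
long parse_plus_one(const char *s)
{
    return strtol(s, NULL, 10) + 1;
}

/* Hypothetical name.  Nothing happens after the call, so at -O2 the
   compiler will typically turn this into a tail call (jmp).  */
long parse(const char *s)
{
    return strtol(s, NULL, 10);
}
```

Both functions behave the same with or without the options above; only the generated call sequence differs.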