There is a typo in the pushq offset computation. It should be

pushq_offset += ((unsigned char *) pushq_offset)[-6] == 0xf2 ? 1 : 0

instead of

pushq_offset += ((unsigned char *) pushq_offset)[6] == 0xf2 ? 1 : 0

H.J.
----
On Mon, Nov 18, 2013 at 11:03 AM, H.J. Lu <hjl.to...@gmail.com> wrote:
> Here is a proposal to use a 32-byte PLT to preserve bound registers.
> Any comments?
>
> BTW, we are working on another proposal to use a second PLT
> section with 8-byte or 16-byte memory overhead, instead of
> 24-byte overhead.
>
> --
> H.J.
> ---
> Intel MPX:
>
> http://software.intel.com/sites/default/files/319433-015.pdf
>
> introduces 4 bound registers, which will be used for parameter passing
> in x86-64.  Bound registers are cleared by branch instructions.  Branch
> instructions with a BND prefix keep bound register contents.  This leads
> to 2 requirements for the 64-bit MPX run-time:
>
> 1. The dynamic linker (ld.so) should save and restore bound registers
> during symbol lookup.
> 2. Change the current 16-byte PLT0:
>
> ff 35 08 00 00 00       pushq  GOT+8(%rip)
> ff 25 10 00 00 00       jmpq   *GOT+16(%rip)
> 0f 1f 40 00             nopl   0x0(%rax)
>
> and 16-byte PLT1:
>
> ff 25 00 00 00 00       jmpq   *name@GOTPCREL(%rip)
> 68 00 00 00 00          pushq  $index
> e9 00 00 00 00          jmpq   PLT0
>
> which clear bound registers, so that they preserve bound registers.
>
> We use 2 new relocations:
>
> #define R_X86_64_PC32_BND  39 /* PC relative 32 bit signed with BND prefix */
> #define R_X86_64_PLT32_BND 40 /* 32 bit PLT address with BND prefix */
>
> to mark branch instructions with a BND prefix.
>
> When the linker sees any R_X86_64_PC32_BND or R_X86_64_PLT32_BND
> relocations, it switches to a different PLT0:
>
> ff 35 08 00 00 00       pushq  GOT+8(%rip)
> f2 ff 25 10 00 00 00    bnd jmpq *GOT+16(%rip)
> 0f 1f 00                nopl   (%rax)
>
> to preserve bound registers for symbol lookup.
> For a symbol with R_X86_64_PC32_BND or R_X86_64_PLT32_BND relocations,
> the linker will use a 32-byte PLT1:
>
> f2 ff 25 00 00 00 00    bnd jmpq *name@GOTPCREL(%rip)
> 68 00 00 00 00          pushq  $index
> f2 e9 00 00 00 00       bnd jmpq PLT0
> 0f 1f 80 00 00 00 00    nopl   0(%rax)
> 0f 1f 80 00 00 00 00    nopl   0(%rax)
>
> Prelink stores the offset of pushq of PLT1 (plt_base + 0x16) in GOT[1] and
> GOT[1] is stored in GOT[3].  We can undo prelink in GOT by computing
> the corresponding pushq offset with
>
> GOT[1] + (GOT offset - &GOT[3]) * 2
>
> This depends on each pushq being 16 bytes apart and each GOT entry being
> 8 bytes.  To support prelink, each 16-byte block in PLT must have an
> 8-byte entry in GOT.  The linker allocates 2 8-byte entries in GOT for
> each 32-byte PLT1.  Then we can undo prelink by computing the
> corresponding pushq offset with
>
> pushq_offset = GOT[1] + (GOT offset - &GOT[3]) * 2
> pushq_offset += ((unsigned char *) pushq_offset)[6] == 0xf2 ? 1 : 0
>
> For each symbol with R_X86_64_PC32_BND or R_X86_64_PLT32_BND
> relocations, this approach increases PLT size by 16 bytes and
> GOT size by 8 bytes.  That is 24 bytes in total.
>
> Pros: No additional sections are needed.
> Cons: 24-byte memory overhead for each symbol with BND relocation.

--
H.J.