I've read through the MPX spec once, but most of it is still not very clear to me. So please correct any misconceptions. (HJ, if you answer any or all of these questions in your usual style with just, "It's not a problem," I will find you and I will kill you. Explain!)
Will an MPX-using binary require an MPX-supporting dynamic linker to run correctly?

* An old dynamic linker won't clobber %bndN directly, so that's not a problem.

* Does having the bounds registers set have any effect on regular/legacy code, or only when bndc[lun] instructions are used? If it doesn't affect normal instructions, then I don't entirely understand why it would matter to clear %bnd* when entering or leaving legacy code. Is it solely for the case of legacy code returning a pointer value, so that the new code would expect the new ABI wherein %bnd0 has been set to correspond to the pointer returned in %rax?

* What's the effect of entering the dynamic linker via "bnd jmp" (i.e. new MPX-using binary with new PLT, old dynamic linker)? The old dynamic linker will leave %bndN et al exactly as they are, until its first unadorned branching instruction implicitly clears them. So the only problem would be if the work _dl_runtime_{resolve,profile} does before its first branch/call were affected by the %bndN state. If there are indeed any problems with this scenario, then you need a plan to make new binaries require a new dynamic linker (and fail gracefully in the absence of one, and have packaging systems grok the dependency, etc.).

In a related vein, what's the effect of entering some legacy code via "bnd jmp" (i.e. new binary using PLT call into legacy DSO)?

* If the state of %bndN et al does not affect legacy code directly, then it's not a problem. The legacy code will eventually use an unadorned branch instruction, and that will implicitly clear %bnd*. (Even if it's a leaf function that's entirely branch-free, its return will count as such an unadorned branch instruction.)

* If that's not the case, then a PLT entry that jumps to legacy code will need to clear the %bndN state. I see one straightforward approach, at the cost of a double-bounce (i.e. turning the normal double-bounce into a triple-bounce) when going from MPX code to legacy code. Each PLT entry can be:

        bnd jmp *foo@GOTPCREL(%rip)
        pushq $N
        bnd jmp .Lplt0
        .balign 16
        jmp *foo@GOTPCREL+8(%rip)
        .balign 32

  and now each of those gets two (adjacent) GOT slots rather than just one. When the dynamic linker resolves "foo" and sees that it's in a legacy DSO, it sets the foo GOT slot to point to .plt+(N*32 + 16) and the foo+1 GOT slot to point to the real target (resolution of "foo"). After fixup, entering that PLT entry will do "bnd jmp" to the second half of the entry, which does (unadorned) "jmp" to the real target, implicitly clearing %bndN state. (A rough sketch of the corresponding fixup logic follows below.)
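To make that concrete, here is a minimal sketch, in C, of what the lazy-binding fixup might do for such a double-slot PLT. All of the names here (fixup_double_slot, target_is_legacy, and so on) are made up for illustration; nothing like this exists in any dynamic linker today, and how the "legacy DSO" determination gets made is left entirely open:

        #include <stdint.h>

        /* Purely illustrative; all names are hypothetical. GOT points at the
           pair of adjacent GOT slots belonging to PLT entry N in the proposed
           32-byte layout, PLT is the address of .plt, and TARGET is the real
           resolution of "foo".  */
        static void
        fixup_double_slot (uintptr_t *got, unsigned int n, uintptr_t plt,
                           uintptr_t target, int target_is_legacy)
        {
          if (target_is_legacy)
            {
              /* First slot: the second half of PLT entry N, whose unadorned
                 jmp implicitly clears the %bndN state before reaching the
                 legacy code.  */
              got[0] = plt + n * 32 + 16;
              /* Second slot: the real target, used by that unadorned jmp.  */
              got[1] = target;
            }
          else
            /* MPX-aware target: bind directly; the "bnd jmp" through the
               first slot preserves the bounds registers.  */
            got[0] = target;
        }

The only interesting property is that the legacy case is routed through the second, unadorned jmp; everything else is the normal lazy-binding dance.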
Those are the background questions to help me understand better. Now, to your specific questions.

I can't tell if you are proposing that a single object might contain both 16-byte and 32-byte PLT slots next to each other in the same .plt section. That seems like a bad idea. I can think of two things offhand that expect PLT entries to be of uniform size, and there may well be more.

* The foo@plt pseudo-symbols that e.g. objdump will display are based on the BFD backend knowing the size of PLT entries. Arguably this ought to look at sh_entsize of .plt instead of using baked-in knowledge, but it doesn't.

* The linker-generated CFI for .plt is a single FDE for the whole section, using a DWARF expression covering all normal PLT entries together based on them having uniform size and contents. (You could of course make the linker generate per-entry CFI, or partition the PLT into short and long entries and have the CFI treat the two partitions appropriately differently. But that seems like a complication best avoided.)

Now, assuming we are talking about a uniform PLT in each object, there is the question of whether to use a new PLT layout everywhere, or only when linking an object with some input files that use MPX.

* My initial reaction was to say that we should just change it unconditionally to keep things simple: use new linker, get new format, end of story. Simplicity is good.

* But doubling the size of PLT entries means more i-cache pressure. If cache lines are 64 bytes, then today you fit four entries into a cache line. Assuming PLT entries are more used than unused, this is a good thing. Reducing that to two entries per cache line means twice as many i-cache misses if you hit a given PLT frequently (with an even distribution of which entries you actually use--at any rate, it's "more" even if it's not "twice as many"). Perhaps this is enough cost in real-world situations to be worried about. I really don't know.

* As I mentioned before, there are things floating around that think they know the size of PLT entries. Realistically, there will be plenty of people using new tools to build binaries but not using MPX at all, and these people will give those binaries to people who have old tools. In the case of someone running an old objdump on a new binary, they would see bogus foo@plt pseudo-symbols and be misled and confused. Not to mention the unknown unknowns, i.e. other things that "know" the size of PLT entries that we don't know about or haven't thought of here. It's just basic conservatism not to perturb things for these people who don't care about or need anything related to MPX at all.

How a relocatable object is marked so that the linker knows at link time whether its code is MPX-compatible, and how a DSO/executable is marked so that the dynamic linker knows at runtime, are two separate subjects.

For relocatable objects, I don't think there is really any precedent for using ELF notes to tell the linker things. It seems much nicer if the linker continues to treat notes completely normally, i.e. appending input files' same-named note sections together like any other named section, rather than magically recognizing and swallowing certain notes. OTOH, the SHT_GNU_ATTRIBUTES mechanism exists for exactly this sort of purpose and is used on other machines for very similar sorts of issues. There is both precedent and existing code in binutils for the linker to merge attribute sections from many input files in a fashion aware of the semantics of those sections, and for those attributes to affect the linker's behavior in machine-specific ways. I think you have to make a very strong case to use anything other than SHT_GNU_ATTRIBUTES for this sort of purpose in relocatable objects.

For linked objects, there are a couple of obvious choices. They all require that the linker have special knowledge to create the markings. One option is a note. We use .note.ABI-tag for a similar purpose in libc, but I don't know of any precedent for the linker synthesizing notes. The most obvious choice is e_flags bits. That's what other machines use to mark ABI variants. There are no bits assigned for x86 yet. There are obvious limitations to using e_flags: it's part of the universal ELF psABI rather than something with vendor extensibility built in, as notes have, and there are only 32 bits available to assign rather than a wholly open-ended format like notes. But using e_flags is certainly simpler to synthesize in the linker and simpler to recognize in the dynamic linker than a note format. I think you have to make at least a reasonable (objective) case to use a note rather than e_flags, though I'm certainly not firmly against a note.
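Just to illustrate how little the dynamic linker would need for the e_flags approach, here is a minimal sketch; EF_X86_64_MPX is a made-up placeholder value, since as noted above no e_flags bits are assigned for x86 today:

        #include <elf.h>

        /* Hypothetical flag; no e_flags bits are actually assigned for x86.  */
        #define EF_X86_64_MPX 0x1

        /* Return nonzero if the object whose ELF header is at EHDR was marked
           by the (hypothetical) linker as containing MPX-aware code.  */
        static int
        object_uses_mpx (const Elf64_Ehdr *ehdr)
        {
          return (ehdr->e_flags & EF_X86_64_MPX) != 0;
        }

Recognizing a note instead would mean walking the PT_NOTE segments and matching each note's name and type--not hard either, but clearly more code in a place where we like to keep things minimal.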
Finally, you've only mentioned x86-64. The hardware details apply about the same to x86-32, AFAICT. If this is something we'll eventually want to do for x86-32 as well, then I think we should at least hash out the plan for x86-32 fairly thoroughly before committing to a plan for x86-64 (even if the actual implementation for x86-32 lags). Probably it's all much the same, and working it through for x86-32 won't give us any pause in our x86-64 plans, but we won't know until we actually do it.

Thanks,
Roland