On Tuesday, 18.12.2018, at 17:42 +0100, Jakub Jelinek wrote:
> On Tue, Dec 18, 2018 at 04:33:48PM +0000, Uecker, Martin wrote:
> > > Yes, something like this. If the trampolines are pre-allocated, this could
> > > even avoid the need to clear the cache on archs where this is needed.
> >
> > And if we can make the trampolines be all the same (and it somehow derived
> > from the IP where it has to look for the static chain), we could map the
> > same page of pre-allocated trampolines and not use memory on platforms
> > with virtual memory.
>
> Yeah, if it is e.g. a pair of executable page and data page right after it,
> say for x86_64 page of:
>   pushq $0
>   jmp .L1
>   pushq $1
>   jmp .L1
>   ...
>   push $NNN
>   jmp .L1
>   # Almost at the end of page
> .L1:
>   decode the above pushed number
>   read + decrypt the data (both where to jump to and static chain)
>   set static chain reg to the static chain data
>   jmp *function pointer
> it could just mmap both pages at once PROT_NONE, and then mmap one from the
> file and fill in data in the other page. Or perhaps one executable and two
> data pages, depending on the exact sizes of needed data vs. code.

What do you think about making the trampoline a single call
instruction and having a large memory region which is the
same page mapped many times?

call trampoline_handler
call trampoline_handler
call trampoline_handler
...
... many identical read-only pages ...
...

The trampoline handler would pop the instruction pointer and use
this as an index into the real stack to read the static chain and
function pointer.

Creation of a trampoline would consist of storing the static chain
and function pointer on the stack (with the right alignment) and
simply returning the corresponding address in the shadow stack.

Best,
Martin
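
To make the address arithmetic concrete, here is a minimal sketch in portable C that only simulates the scheme; it maps no executable pages and executes no real call instruction. All names (tramp_desc, SHADOW_OFFSET, CALL_SLOT, make_trampoline, call_through_trampoline) are hypothetical, and the slot size and the fixed offset between the real stack and the shadow region are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed size of one trampoline slot (a call instruction plus padding). */
#define CALL_SLOT 8
/* Assumed fixed distance between the real stack and the shadow region.
   It must be a multiple of CALL_SLOT so that slots stay aligned. */
#define SHADOW_OFFSET ((uintptr_t)1 << 30)

typedef long (*nested_fn)(void *static_chain, long arg);

/* What trampoline creation stores on the real stack. */
struct tramp_desc {
    _Alignas(CALL_SLOT) void *static_chain;
    nested_fn fn;
};

/* Creating a trampoline: the descriptor is already on the stack, so we
   only compute the matching address in the shadow region. */
uintptr_t make_trampoline(const struct tramp_desc *desc)
{
    uintptr_t stack_addr = (uintptr_t)desc;
    assert(stack_addr % CALL_SLOT == 0);   /* "with the right alignment" */
    return stack_addr + SHADOW_OFFSET;
}

/* The shared handler: given the return address that the call in the
   shadow region would have pushed, recover the stack address and load
   the static chain and function pointer from there. */
long trampoline_handler(uintptr_t return_addr, long arg)
{
    uintptr_t stack_addr = return_addr - CALL_SLOT - SHADOW_OFFSET;
    const struct tramp_desc *desc = (const struct tramp_desc *)stack_addr;
    return desc->fn(desc->static_chain, arg);
}

/* Stand-in for jumping to the trampoline: a real call instruction would
   push the address of the following slot as the return address. */
long call_through_trampoline(uintptr_t tramp, long arg)
{
    return trampoline_handler(tramp + CALL_SLOT, arg);
}

/* Example "nested function": reads `base` through the static chain. */
long add_base(void *static_chain, long x)
{
    const long *base = static_chain;
    return *base + x;
}

long demo(long base, long x)
{
    struct tramp_desc desc = { &base, add_base };
    uintptr_t tramp = make_trampoline(&desc);
    return call_through_trampoline(tramp, x);
}
```

With these definitions, demo(40, 2) evaluates to 42. The point of the sketch is that creating a trampoline writes no code at all: it is a store of two words to the stack plus one addition, and the handler reverses that addition.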