On Friday, 21 Dec 2018 at 16:13 -0500, Hans-Peter Nilsson wrote:
> On Tue, 18 Dec 2018, Uecker, Martin wrote:
> > On Tuesday, 18 Dec 2018 at 17:29 +0100, Martin Uecker wrote:
> > > On Tuesday, 18 Dec 2018 at 17:24 +0100, Jakub Jelinek wrote:
> > > > On Tue, Dec 18, 2018 at 09:03:41AM -0700, Jeff Law wrote:
> > > > > Right. This is the classic example and highlights the ABI
> > > > > concerns. If we use the low bit to distinguish between a
> > > > > normal function pointer and a pointer to a descriptor and
> > > > > qsort doesn't know about it, then we lose.
> > > > >
> > > > > One way around this is to make *all* function pointers be
> > > > > some kind of descriptor and route all indirect calls through
> > > > > a resolver. THen you
> > > >
> > > > Either way, you are creating a new ABI for calling functions
> > > > through function pointers. Because of how rarely GNU C nested
> > > > functions are used these days, if we want to do anything I'd
> > > > think it might be better to use trampolines, just don't place
> > > > them on the stack, say have a mmaped page of trampolines perhaps
> > > > with some pointer encryption to where they jump to, so it isn't
> > > > a way to circumvent non-executable stack, and have some register
> > > > and unregister function you'd call to get or release the
> > > > trampoline. If more trampolines are needed than currently
> > > > available, the library could just mmap another such page. A
> > > > problem is how it should interact with longjmp or similar APIs,
> > > > because then we could leak some trampolines (no "destructor" for
> > > > the trampoline would be called. The leaking could be handled
> > > > e.g. through remembering the thread and frame pointer for which
> > > > it has been allocated and if you ask for a new trampoline with a
> > > > frame pointer above the already allocated one, release those
> > > > entries or reuse them, instead of allocating a new one. And
> > > > somehow deal with thread exit.
> > >
> > > Yes, something like this. If the trampolines are pre-allocated,
> > > this could even avoid the need to clear the cache on archs where
> > > this is needed.
> >
> > And if we can make the trampolines be all the same (and it somehow
> > derived from the IP where it has to look for the static chain), we
> > could map the same page of pre-allocated trampolines and not use
> > memory on platforms with virtual memory.
>
> All fine with new ideas, but consider the case where the nested
> functions are nested. All mentioned ideas seem to fail for the
> case where a caller (generating a trampoline to be called later)
> is re-entered, i.e. need to generate another trampoline. The
> same location can't be re-used. You need a sort of stack.

Yes, you need to be able to create an arbitrary number of trampolines.
But this would work: one can use a second stack with pre-allocated
read-only trampolines. Every time you would now create a trampoline on
the real stack, you simply refer to an existing trampoline at the same
location on the parallel stack. And if these trampolines are all
identical, you only need a single real page which is mapped many times.

Setting up a stack would be more complicated, because you also need to
set up this parallel stack. Maybe simulating this second stack with a
global hash table which uses thread id and stack pointer of the real
stack as index is better...

Best,
Martin
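
To make the hash-table variant a bit more concrete, here is a rough sketch of what such a table might look like. It is only an illustration under several assumptions: the names (tramp_register, tramp_lookup, tramp_unregister), the fixed table size, and the use of a plain integer as thread id are all hypothetical, and nothing like this exists in GCC or libgcc. The key is the pair (thread id, stack address where the trampoline would have been placed), so a re-entered caller gets a distinct entry simply because its frame sits at a different stack address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 64 /* hypothetical fixed capacity, must be a power of two */

enum slot_state { SLOT_EMPTY = 0, SLOT_USED, SLOT_DELETED };

struct slot {
    enum slot_state state;
    unsigned long tid; /* thread id (assumed to be a plain integer here) */
    uintptr_t addr;    /* stack address where the trampoline would have lived */
    void *chain;       /* static chain the trampoline would have loaded */
};

/* Global table. Not thread-safe as written; a real version would need a lock. */
static struct slot table[TABLE_SIZE];

static size_t hash_key(unsigned long tid, uintptr_t addr)
{
    uint64_t h = (uint64_t)addr ^ ((uint64_t)tid * 0x9E3779B97F4A7C15ull);
    h ^= h >> 29;
    return (size_t)(h & (TABLE_SIZE - 1));
}

/* Insert or update the entry for (tid, addr). Returns 0, or -1 if full. */
static int tramp_register(unsigned long tid, uintptr_t addr, void *chain)
{
    size_t start = hash_key(tid, addr);
    size_t free_slot = TABLE_SIZE; /* sentinel: no reusable slot seen yet */

    assert(chain != NULL); /* NULL is reserved for "not found" in lookup */
    for (size_t n = 0; n < TABLE_SIZE; n++) {
        size_t i = (start + n) & (TABLE_SIZE - 1);
        struct slot *s = &table[i];
        if (s->state == SLOT_USED) {
            if (s->tid == tid && s->addr == addr) {
                s->chain = chain; /* same key again: just update it */
                return 0;
            }
        } else {
            if (free_slot == TABLE_SIZE)
                free_slot = i; /* remember the first empty or deleted slot */
            if (s->state == SLOT_EMPTY)
                break; /* end of probe chain, key is not present */
        }
    }
    if (free_slot == TABLE_SIZE)
        return -1;
    table[free_slot] = (struct slot){ SLOT_USED, tid, addr, chain };
    return 0;
}

/* Return the static chain stored for (tid, addr), or NULL if none. */
static void *tramp_lookup(unsigned long tid, uintptr_t addr)
{
    size_t start = hash_key(tid, addr);
    for (size_t n = 0; n < TABLE_SIZE; n++) {
        struct slot *s = &table[(start + n) & (TABLE_SIZE - 1)];
        if (s->state == SLOT_EMPTY)
            return NULL;
        if (s->state == SLOT_USED && s->tid == tid && s->addr == addr)
            return s->chain;
    }
    return NULL;
}

/* Remove the entry for (tid, addr). Returns 0, or -1 if it was not present. */
static int tramp_unregister(unsigned long tid, uintptr_t addr)
{
    size_t start = hash_key(tid, addr);
    for (size_t n = 0; n < TABLE_SIZE; n++) {
        struct slot *s = &table[(start + n) & (TABLE_SIZE - 1)];
        if (s->state == SLOT_EMPTY)
            return -1;
        if (s->state == SLOT_USED && s->tid == tid && s->addr == addr) {
            s->state = SLOT_DELETED; /* tombstone keeps probe chains intact */
            return 0;
        }
    }
    return -1;
}
```

The sketch leaves out the parts that are actually hard: how the shared trampoline code finds its own key at call time, and how stale entries are released after longjmp or thread exit.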