Ondřej Bílka wrote:
> This could be done faster without hash table by making alloca result
> aligned to 2 * S and malloc ones not aligned to 2 * S by adding some padding.

Nice trick. This can be done without violating the rules of how alloca()
is used.

A similar idea, which also distinguishes the two cases by the address,
applies if we know the stack bounds.
  - For the current thread, glibc/nptl/allocatestack.c stores the stack
    bounds in thread->stackblock and thread->stackblock_size. Unfortunately
    we have no public accessor for it. It would be nice to have a
    pthread_getstackbound(pthread_t, stack_t*) function...
  - For the main thread, the stack can grow unbounded, and it's the kernel
    which decides when to stop its growth. GNU libsigsegv contains code to
    determine the current bounds, but it involves many system calls.
=> This approach is probably not viable.

Another idea is to add some header (like the current implementation does),
but instead of storing a marker only in the malloc case, store a different
marker also in the alloca case. This should be done through a GCC statement
expression.
=> Should work with __builtin_alloca.

> It would make check on free simpler. For allocation its fastest with
> __builtin_alloca_with_align(x, 2 * sa_alignment_max)

Unfortunately we cannot use __builtin_alloca_with_align here, because the
GCC documentation [1] says regarding __builtin_alloca_with_align:
  "The allocated storage is released no later than just before the calling
   function returns to its caller, but may be released at the end of the
   block in which the function was called."
whereas what we need is the lifetime specified for __builtin_alloca:
  "The lifetime of the allocated object ends just before the calling
   function returns to its caller. This is so even when __builtin_alloca
   is called within a nested block."

Any ideas (before I go on to rewrite the malloca module)?

Bruno

[1] https://gcc.gnu.org/onlinedocs/gcc-7.2.0/gcc/Other-Builtins.html
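
For illustration, the two-marker header idea above could be sketched roughly
as follows. This is only a sketch under stated assumptions, not the gnulib
malloca module: all sk_* names, the 16-byte header size, the 4000-byte
threshold and the marker values are hypothetical, and it assumes GCC (or a
compatible compiler) for the statement expression and __builtin_alloca, with
both __builtin_alloca and malloc returning storage aligned to at least
16 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* All names below are hypothetical; they are not part of gnulib.  */
enum { SK_HEADER = 16 };        /* assumed header size, keeps payload aligned */
enum { SK_MAX_ALLOCA = 4000 };  /* assumed threshold for stack allocation */
#define SK_MAGIC_ALLOCA 0x5A    /* marker stored in the alloca case */
#define SK_MAGIC_MALLOC 0xA5    /* marker stored in the malloc case */

/* Counts heap releases; exists only so the sketch can be checked.  */
static int sk_heap_frees;

/* Store MAGIC in the last header byte and return the payload address.  */
static void *
sk_tag (void *block, unsigned char magic)
{
  if (block == NULL)
    return NULL;
  ((unsigned char *) block)[SK_HEADER - 1] = magic;
  return (char *) block + SK_HEADER;
}

/* Heap path: allocate header + payload, guarding against size overflow.  */
static void *
sk_heap_alloc (size_t n)
{
  if (n > SIZE_MAX - SK_HEADER)
    return NULL;
  return sk_tag (malloc (n + SK_HEADER), SK_MAGIC_MALLOC);
}

/* Statement expression: the alloca call expands inside the caller, so the
   stack storage lives until the calling function returns, even though the
   call sits in a nested block.  */
#define sk_malloca(n) \
  ({ size_t sk_n_ = (n);                                        \
     void *sk_p_;                                               \
     if (sk_n_ <= SK_MAX_ALLOCA)                                \
       {                                                        \
         void *sk_raw_ = __builtin_alloca (sk_n_ + SK_HEADER);  \
         sk_p_ = sk_tag (sk_raw_, SK_MAGIC_ALLOCA);             \
       }                                                        \
     else                                                       \
       sk_p_ = sk_heap_alloc (sk_n_);                           \
     sk_p_; })

/* Release P: only the malloc marker leads to free(); the alloca marker
   means there is nothing to do; anything else indicates misuse.  */
static void
sk_freea (void *p)
{
  if (p == NULL)
    return;
  unsigned char magic = ((unsigned char *) p)[-1];
  if (magic == SK_MAGIC_MALLOC)
    {
      free ((char *) p - SK_HEADER);
      sk_heap_frees++;
    }
  else if (magic != SK_MAGIC_ALLOCA)
    abort ();   /* not obtained from sk_malloca, or header overwritten */
}
```

With this layout the check on free reads a single byte just before the
payload and needs no hash table. The cost is SK_HEADER extra bytes per
allocation in both cases, and a caller that underruns its buffer can
overwrite the marker.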