On 06/19/2017 12:02 PM, Richard Biener wrote:
> On June 19, 2017 8:00:19 PM GMT+02:00, Richard Biener
> <richard.guent...@gmail.com> wrote:
>> On June 19, 2017 7:29:32 PM GMT+02:00, Jakub Jelinek <ja...@redhat.com>
>> wrote:
>>> On Mon, Jun 19, 2017 at 11:07:06AM -0600, Jeff Law wrote:
>>>> After much poking around I concluded that we really need to implement
>>>> allocation and probing via a "moving sp" strategy. Probing into
>>>> unallocated areas runs afoul of valgrind, so that's a non-starter.
>>>>
>>>> Allocating stack space, then probing the pages within the space is
>>>> vulnerable to async signal delivery between the allocation point and the
>>>> probe point. If that occurs the signal handler could end up running on
>>>> a stack that has collided with the heap.
>>>>
>>>> Ideally we would allocate and probe a page as an atomic unit (which is
>>>> feasible on PPC). Alternatively, due to ISA restrictions, allocate a
>>>> page, then probe the page as distinct instructions. The latter still
>>>> has a race, but we'd have to take the async signal in a single
>>>> instruction window.
>>>
>>> And if the allocation is only a page at a time, the single insn race
>>> window can be mitigated in the kernel (probe (read-only is fine) the
>>> word at the stack when setting up a signal frame for async signal).
>>>
>>>> So, time to open the discussion to questions & comments.
>>>>
>>>> I've got patches I need to cleanup and post for comments that implement
>>>> this for x86, ppc, aarch64 and s390. x86 and ppc are IMHO in good
>>>> shape. There's an unhandled case for s390. I've got evaluation still
>>>> to do on aarch64.
>>>
>>> In the patches Jeff is going to post, we have (at least for
>>> -fasynchronous-unwind-tables which is on by default on e.g. x86)
>>> precise unwind info even with the new stack check mode.
>>> ira.c currently has:
>>>        /* We need the frame pointer to catch stack overflow exceptions if
>>>           the stack pointer is moving (as for the alloca case just above).  */
>>>        || (STACK_CHECK_MOVING_SP
>>>            && flag_stack_check
>>>            && flag_exceptions
>>>            && cfun->can_throw_non_call_exceptions)
>>> For alloca we have a frame pointer for other reasons, the question is
>>> if we really need this hunk even if we provided proper unwind info
>>> even for the Ada -fstack-check mode. Or, if we provide proper unwind info
>>> for -fasynchronous-unwind-tables, if the above could not be also
>>> && !flag_asynchronous_unwind_tables. Eric, what exactly is the reason
>>> for the above, is it just lack of proper CFI notes, or something
>>> different?
>>>
>>> Also, on i?86 orq $0, (%rsp) or orl $0, (%esp) is used to probe stack,
>>> while it is shorter, is it actually faster or as slow as movq $0, (%rsp)
>>> or movl $0, (%esp) ?
>>
>> It at least has the chance of bypassing all of the store queue in CPUs
>> and thus cause no cacheline allocation or trigger prefetching.
>>
>> Not sure if any of that is done though.
>>
>> Performance counters might tell.
>>
>> Otherwise incrementing SP by 4095 and then pushing al would work as
>> well (and be similarly short as the or).
>
> Oh, and using push intelligently with first bumping to SP & 4096-1 + 4095
> would solve the signal atomicity as well. Might be larger and somewhat
> interfere with CPUs stack engine. Who knows...

Happy to rely on Honza or Uros for guidance on that. Though we do have
to maintain proper stack alignment, right?

jeff
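
For illustration only, the "moving sp" strategy discussed in this thread
can be modelled in plain C. This is a sketch under stated assumptions, not
the code from the patches: PROBE_INTERVAL, struct probe_log and
allocate_and_probe are hypothetical names, the 4096-byte interval is
assumed, and probes are recorded in a log instead of touching memory.
Whether the residual allocation needs its own probe is left open here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed probe interval (one page); the real value is target specific.  */
#define PROBE_INTERVAL 4096u
#define MAX_PROBES 64

/* Hypothetical model: record each probe address instead of storing to it.  */
struct probe_log
{
  uintptr_t addr[MAX_PROBES];
  size_t count;
};

/* Allocate SIZE bytes below SP one interval at a time, probing the new
   top of stack right after each step, so that the stack pointer never
   moves more than one interval past the last probed address.
   Returns the final stack pointer.  */
static uintptr_t
allocate_and_probe (uintptr_t sp, size_t size, struct probe_log *log)
{
  while (size >= PROBE_INTERVAL)
    {
      sp -= PROBE_INTERVAL;           /* allocate one interval ...  */
      assert (log->count < MAX_PROBES);
      log->addr[log->count++] = sp;   /* ... then probe it, e.g. orq $0, (%rsp)  */
      size -= PROBE_INTERVAL;
    }

  /* Residual allocation, smaller than one interval; unprobed in this sketch.  */
  sp -= size;
  return sp;
}
```

Starting from a 16-byte aligned sp, every intermediate sp in this model
stays 16-byte aligned because 4096 is a multiple of 16; the final sp is
aligned whenever the residual size is. Bumping by 4095 instead would not
have that property, which is the alignment concern raised above.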