Re: RFC: stack/heap collision vulnerability and mitigation with GCC

Jeff Law Mon, 19 Jun 2017 15:10:01 -0700

On 06/19/2017 12:02 PM, Richard Biener wrote:
> On June 19, 2017 8:00:19 PM GMT+02:00, Richard Biener <[email protected]> wrote:
>> On June 19, 2017 7:29:32 PM GMT+02:00, Jakub Jelinek <[email protected]> wrote:
>>> On Mon, Jun 19, 2017 at 11:07:06AM -0600, Jeff Law wrote:
>>>> After much poking around I concluded that we really need to implement
>>>> allocation and probing via a "moving sp" strategy.  Probing into
>>>> unallocated areas runs afoul of valgrind, so that's a non-starter.
>>>>
>>>> Allocating stack space, then probing the pages within the space is
>>>> vulnerable to async signal delivery between the allocation point and
>>>> the probe point.  If that occurs the signal handler could end up
>>>> running on a stack that has collided with the heap.
>>>>
>>>> Ideally we would allocate and probe a page as an atomic unit (which is
>>>> feasible on PPC).  Alternatively, where ISA restrictions prevent that,
>>>> allocate a page, then probe the page, as distinct instructions.  The
>>>> latter still has a race, but we'd have to take the async signal in a
>>>> single-instruction window.
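A minimal model of the two orderings described above, assuming 4 KiB
pages and a downward-growing stack. It is a hypothetical sketch, not GCC
code; `max_unprobed_gap` and `PAGE` are illustrative names. It reports the
largest distance, at any instruction boundary, between the stack pointer
and the lowest address already probed, i.e. how far into unprobed memory a
signal handler could start running. Probes only touch the word at sp, never
anything below it, which keeps clear of the valgrind problem.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE 4096u /* assumed page size */

/* Hypothetical model, not GCC code: addresses are plain integers and the
   stack grows downward.  Returns the largest gap between sp and the lowest
   probed address at any instruction boundary, i.e. how much unprobed stack
   an async signal handler could start running on. */
static uint64_t max_unprobed_gap(uint64_t sp, uint64_t frame,
                                 bool page_at_a_time)
{
    assert(frame <= sp);              /* the model does not wrap around 0 */

    if (!page_at_a_time)
        return frame;                 /* one sub for the whole frame: until the
                                         probe loop runs, all of it is unprobed */

    uint64_t probed = sp;             /* page holding the initial sp is touched */
    uint64_t end = sp - frame;
    uint64_t worst = 0;
    while (sp > end) {
        uint64_t step = (sp - end > PAGE) ? PAGE : sp - end;
        sp -= step;                   /* insn 1: allocate at most one page */
        if (probed - sp > worst)      /* an async signal could arrive here */
            worst = probed - sp;
        probed = sp;                  /* insn 2: probe the word at sp */
    }
    return worst;
}
```

With page-at-a-time allocation the window never exceeds one page, whatever
the frame size; allocating the whole frame first leaves all of it unprobed
until the probe loop catches up.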
>>>
>>> And if the allocation is only a page at a time, the single-insn race
>>> window can be mitigated in the kernel (probe the word at the stack
>>> pointer, read-only is fine, when setting up a signal frame for an
>>> async signal).
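The kernel-side mitigation can be modelled the same way. Assume,
hypothetically, a single guard page [guard_lo, guard_lo + PAGE) between the
stack and the heap, and that `probed`, the lowest address touched so far,
lies above it (touching the guard would already have faulted). If each
allocation step is at most one page, an sp caught between its `sub` and its
probe lies either above the guard page or inside it, so a kernel read of
the word at sp during signal delivery faults instead of the signal frame
landing in the heap. A larger step can carry sp clean past the guard.
`struct layout` and `signal_frame_safe` are illustrative names only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE 4096u /* assumed page size */

/* Hypothetical layout: one guard page [guard_lo, guard_lo + PAGE) between
   the downward-growing stack (above) and the heap (below). */
struct layout {
    uint64_t guard_lo;
};

/* An async signal arrives right after "sub alloc, sp" and before the probe;
   the kernel reads the word at sp before writing the signal frame.  Returns
   true when that is harmless: sp is still above the guard page, or the read
   lands in the guard page and faults.  Returns false when sp has skipped
   the guard page and points into the heap. */
static bool signal_frame_safe(const struct layout *l, uint64_t probed,
                              uint64_t alloc)
{
    uint64_t guard_hi = l->guard_lo + PAGE;
    assert(probed >= guard_hi);   /* touching the guard would have faulted */

    uint64_t sp = probed - alloc;
    return sp >= l->guard_lo;     /* above the guard page or inside it */
}
```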
>>>
>>>> So, time to open the discussion to questions & comments.
>>>>
>>>> I've got patches I need to clean up and post for comments that
>>>> implement this for x86, ppc, aarch64 and s390.  x86 and ppc are IMHO
>>>> in good shape.  There's an unhandled case for s390.  I've still got
>>>> evaluation to do on aarch64.
>>>
>>> In the patches Jeff is going to post, we have (at least for
>>> -fasynchronous-unwind-tables which is on by default on e.g. x86)
>>> precise unwind info even with the new stack check mode.
>>> ira.c currently has:
>>>       /* We need the frame pointer to catch stack overflow exceptions if
>>>          the stack pointer is moving (as for the alloca case just above).  */
>>>       || (STACK_CHECK_MOVING_SP
>>>           && flag_stack_check
>>>           && flag_exceptions
>>>           && cfun->can_throw_non_call_exceptions)
>>> For alloca we have a frame pointer for other reasons; the question is
>>> whether we really need this hunk if we provided proper unwind info even
>>> for the Ada -fstack-check mode.  Or, if we provide proper unwind info
>>> for -fasynchronous-unwind-tables, whether the above could also get
>>> && !flag_asynchronous_unwind_tables.  Eric, what exactly is the reason
>>> for the above?  Is it just the lack of proper CFI notes, or something
>>> different?
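The ira.c condition and the variant being asked about can be written as two
predicates. This is a hypothetical sketch: `struct fp_inputs` and the
function names are illustrative, and the fields merely stand in for the
real GCC target macro, flags and cfun field. Whether dropping the frame
pointer is actually safe is the question put to Eric, not something the
sketch settles.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the GCC inputs named in the ira.c hunk.  In GCC
   these are a target macro, command-line flags and a cfun field, not
   members of a struct. */
struct fp_inputs {
    bool stack_check_moving_sp;          /* STACK_CHECK_MOVING_SP */
    bool stack_check;                    /* flag_stack_check != 0 */
    bool exceptions;                     /* flag_exceptions */
    bool can_throw_non_call_exceptions;  /* cfun->can_throw_non_call_exceptions */
    bool async_unwind_tables;            /* flag_asynchronous_unwind_tables */
};

/* The condition as ira.c has it today. */
static bool fp_needed_current(const struct fp_inputs *in)
{
    return in->stack_check_moving_sp && in->stack_check && in->exceptions
           && in->can_throw_non_call_exceptions;
}

/* The proposed variant: no frame pointer when precise asynchronous unwind
   info already describes every sp adjustment. */
static bool fp_needed_proposed(const struct fp_inputs *in)
{
    return fp_needed_current(in) && !in->async_unwind_tables;
}

/* Example with every input set. */
static void fp_examples(void)
{
    struct fp_inputs in = { true, true, true, true, true };
    assert(fp_needed_current(&in));
    assert(!fp_needed_proposed(&in));   /* unwind tables cover the sp moves */
    in.async_unwind_tables = false;
    assert(fp_needed_proposed(&in));    /* no CFI to rely on: keep the FP */
}
```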
>>>
>>> Also, on i?86, orq $0, (%rsp) or orl $0, (%esp) is used to probe the
>>> stack.  While it is shorter, is it actually faster than movq $0, (%rsp)
>>> or movl $0, (%esp), or just as slow?
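To put numbers on "shorter": the byte encodings below are, to my
knowledge, the standard ones for these four instructions, recorded as data
so the sizes can be checked at compile time. `probe_stack_top` is a
hypothetical helper that issues the OR form through GCC inline asm on x86
and does nothing on other targets. Which form is faster is left open; the
sketch only shows size.

```c
#include <assert.h>

/* Encodings of the probe forms discussed above (AT&T syntax). */
static const unsigned char orq_probe[]  = { 0x48, 0x83, 0x0c, 0x24, 0x00 };
                                           /* orq  $0, (%rsp) */
static const unsigned char movq_probe[] = { 0x48, 0xc7, 0x04, 0x24,
                                            0x00, 0x00, 0x00, 0x00 };
                                           /* movq $0, (%rsp) */
static const unsigned char orl_probe[]  = { 0x83, 0x0c, 0x24, 0x00 };
                                           /* orl  $0, (%esp) */
static const unsigned char movl_probe[] = { 0xc7, 0x04, 0x24,
                                            0x00, 0x00, 0x00, 0x00 };
                                           /* movl $0, (%esp) */

/* OR accepts a sign-extended imm8; MOV to memory needs a full imm32. */
static_assert(sizeof orq_probe == 5 && sizeof movq_probe == 8, "64-bit");
static_assert(sizeof orl_probe == 4 && sizeof movl_probe == 7, "32-bit");

/* Touch the word at the current stack pointer with the OR form.  OR with 0
   rewrites the value already there, so it is harmless even though (%rsp)
   may hold live data here; a mov $0 would clobber that word, which is why
   it is only usable right after the prologue has allocated the slot. */
static void probe_stack_top(void)
{
#if defined(__x86_64__)
    __asm__ volatile ("orq $0, (%%rsp)" ::: "memory", "cc");
#elif defined(__i386__)
    __asm__ volatile ("orl $0, (%%esp)" ::: "memory", "cc");
#endif
}
```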
>>
>> It at least has a chance of bypassing the store queue entirely and thus
>> causing no cacheline allocation and no prefetch trigger.
>>
>> Not sure if any of that is actually done, though.
>>
>> Performance counters might tell.
>>
>> Otherwise incrementing SP by 4095 and then pushing al would work as
>> well (and be about as short as the or).
> 
> Oh, and using push intelligently with first bumping to SP & 4096-1 + 4095
> would solve the signal atomicity problem as well.  It might be larger and
> somewhat interfere with the CPU's stack engine.  Who knows...

Happy to rely on Honza or Uros for guidance on that, though we do have
to maintain proper stack alignment, right?
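The push-based idea and the alignment question can be checked on a small
integer model. The reading is an assumption: "SP & 4096-1 + 4095" is taken
to mean first dropping sp to a page boundary, after which each new page is
entered by a push (which decrements sp and writes in one instruction) and
finished with a sub of 4096 - 8. x86-64 has no 8-bit push, so an 8-byte
push is assumed. `push_probe_pages`, `sub_then_probe_ok`, `PAGE` and `SLOT`
are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE 4096u /* assumed page size */
#define SLOT 8u    /* x86-64 push size; x86 has no 8-bit push */

static uint64_t page_base(uint64_t addr)
{
    return addr & ~(uint64_t)(PAGE - 1);
}

/* One hypothetical reading of the push-based scheme.  Returns true if sp
   never points into an untouched page at any instruction boundary. */
static bool push_probe_pages(uint64_t sp, unsigned pages, uint64_t *final_sp)
{
    assert(sp % 16 == 0);             /* ABI alignment assumed on entry */
    uint64_t touched = page_base(sp); /* lowest page known to be mapped */
    bool ok = true;

    sp -= sp & (PAGE - 1);            /* sub $(sp & 4095): stays in touched page */
    for (unsigned i = 0; i < pages; i++) {
        /* push: moves sp into the next page and writes it in one insn;
           from here until the sub, sp is only 8-byte aligned */
        sp -= SLOT;
        touched = page_base(sp);
        sp -= PAGE - SLOT;            /* sub $4088: back on a page boundary */
        ok = ok && page_base(sp) >= touched;
    }
    *final_sp = sp;
    return ok;
}

/* For comparison, "sub $4096, %rsp; orq $0, (%rsp)": between the two
   instructions sp already sits in a page nothing has touched. */
static bool sub_then_probe_ok(uint64_t sp)
{
    uint64_t touched = page_base(sp);
    sp -= PAGE;
    return page_base(sp) >= touched;
}
```

In this model sp is 16-byte aligned again after every push/sub pair and
only 8-byte aligned in between, and the initial rounding allocates up to
4095 extra bytes that the frame layout would have to absorb.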


jeff
