On 19/06/17 18:07, Jeff Law wrote:
> As some of you are likely aware, Qualys has just published fairly
> detailed information on using stack/heap clashes as an attack vector.
> Eric B, Michael M -- sorry I couldn't say more when I contacted you about
> -fstack-check and some PPC specific stuff.  This has been under embargo
> for the last month.
> 
> 
> --
> 
> 
> http://www.openwall.com/lists/oss-security/2017/06/19/1
> 
[...]
> aarch64 is significantly worse.  There are no implicit probes we can
> exploit.  Furthermore, the prologue may allocate stack space 3-4 times.
> So we have to track the distance to the most recent probe and when that
> distance grows too large, we have to emit a probe.  Of course we have to
> make worst case assumptions at function entry.
> 

I'm not sure I understand what you're saying here.  According to the
comment above aarch64_expand_prologue, the stack frame looks like:

+-------------------------------+
|                               |
|  incoming stack arguments     |
|                               |
+-------------------------------+
|                               | <-- incoming stack pointer (aligned)
|  callee-allocated save area   |
|  for register varargs         |
|                               |
+-------------------------------+
|  local variables              | <-- frame_pointer_rtx
|                               |
+-------------------------------+
|  padding0                     | \
+-------------------------------+  |
|  callee-saved registers       |  | frame.saved_regs_size
+-------------------------------+  |
|  LR'                          |  |
+-------------------------------+  |
|  FP'                          | / <- hard_frame_pointer_rtx (aligned)
+-------------------------------+
|  dynamic allocation           |
+-------------------------------+
|  padding                      |
+-------------------------------+
|  outgoing stack arguments     | <-- arg_pointer
|                               |
+-------------------------------+
|                               | <-- stack_pointer_rtx (aligned)

Now for the majority of frames the local variable area is small and
there is neither dynamic allocation nor any need for outgoing stack
arguments.  In this case the first instruction in the function is

        stp     fp, lr, [sp, #-FrameSize]!

So this instruction allocates all the stack needed and stores the
required registers.  That acts as an implicit probe as far as I can tell.


If the locals area gets slightly larger (>= 512 bytes) then the sequence
becomes
        sub     sp, sp, #FrameSize
        stp     fp, lr, [sp]

But again this acts as a sufficient implicit probe provided that
FrameSize does not exceed the probe interval.
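
To make the two cases above concrete, here is a rough sketch in C of the
decision I have in mind.  It is not GCC code: the names classify_frame,
PROBE_INTERVAL and STP_WRITEBACK_LIMIT are made up for illustration, and
the 4096-byte interval is only an assumed value.  The 512-byte threshold
is the one quoted above.

```c
#include <assert.h>

/* Hypothetical constants, for illustration only (not taken from GCC). */
#define PROBE_INTERVAL      4096L  /* assumed probe interval in bytes */
#define STP_WRITEBACK_LIMIT  512L  /* threshold quoted in the text above */

enum prologue_form {
    STP_WRITEBACK,         /* stp fp, lr, [sp, #-FrameSize]!             */
    SUB_THEN_STP,          /* sub sp, sp, #FrameSize ; stp fp, lr, [sp]  */
    NEEDS_EXPLICIT_PROBES  /* frame larger than the probe interval       */
};

/* Pick the prologue form for a frame of frame_size bytes.  The first two
   forms store at the new stack pointer, so they double as an implicit
   probe; the third needs extra probes between entry and the new sp.  */
static enum prologue_form classify_frame(long frame_size)
{
    assert(frame_size >= 0);
    if (frame_size < STP_WRITEBACK_LIMIT)
        return STP_WRITEBACK;
    if (frame_size <= PROBE_INTERVAL)
        return SUB_THEN_STP;
    return NEEDS_EXPLICIT_PROBES;
}
```

Under these assumptions a 64-byte frame takes the single stp form, a
2048-byte frame takes the sub/stp form, and only frames above the
interval need anything extra.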

Yes, we need more probes if the local variable space becomes large, and
we need additional probes for the outgoing area and the dynamic area,
but again, if those are small (< 512) we could replace the existing
        sub     sp, sp, #n
with
        str     xzr, [sp, #-n]!

and thus the explicit probe now becomes the stack allocation operation.
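
For the case where the prologue allocates in several steps, the
bookkeeping Jeff describes (tracking the distance to the most recent
probe) can be modelled roughly as below.  Again this is only a sketch
under my own assumptions: struct probe_state and allocate are invented
names, and the starting value of unprobed at function entry is whatever
worst case one chooses to assume.

```c
#include <stdbool.h>

/* Hypothetical model, not GCC code.  All sizes are in bytes. */
struct probe_state {
    long interval;        /* assumed probe interval, must be > 0        */
    long unprobed;        /* stack allocated since the most recent probe */
    int  explicit_probes; /* explicit probes the model had to emit       */
};

/* Account for one allocation of n bytes.  stores_at_new_sp is true when
   the allocating sequence also stores at the new stack pointer (an stp
   or str with writeback), which then counts as an implicit probe.  */
static void allocate(struct probe_state *s, long n, bool stores_at_new_sp)
{
    /* Emit an explicit probe each time the unprobed distance would
       otherwise exceed the interval.  */
    while (s->unprobed + n > s->interval) {
        long step = s->interval - s->unprobed;
        n -= step;
        s->unprobed = 0;
        s->explicit_probes++;
    }
    s->unprobed += n;
    if (stores_at_new_sp)
        s->unprobed = 0;  /* the store itself probed the new sp */
}
```

With a 4096-byte interval and nothing assumed at entry, a 400-byte
allocation that stores at the new sp costs no explicit probe, and a
following 300-byte sub followed by a 4000-byte sub costs exactly one.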
