Hi Jeff,

On 11 July 2017 at 23:19, Jeff Law <l...@redhat.com> wrote:
> This patch series is designed to mitigate the problems exposed by the
> stack-clash exploits.  As I've noted before, the way to address this
> class of problems is via a good stack probing strategy.
>
> This has taken much longer than expected to pull together for
> submission.  Sorry about that.  However, the delay has led to some clear
> improvements on ppc, aarch64 and s390 as well as tests which aren't
> eyeballed, but instead are part of the testsuite.
>
> This series introduces -fstack-check=clash which is a variant of
> -fstack-check designed to prevent "jumping the stack" as seen in the
> stack-clash exploits.
>
>
>
> The key ideas here:
>
> Individual stack allocations are never more than PROBE_INTERVAL in size
> (4k by default).  Larger allocations are broken up into PROBE_INTERVAL
> chunks and each chunk is probed as it is allocated.
>
> No combination of stack allocations can exceed PROBE_INTERVAL bytes
> without probing.  ie, if we have an allocation of 2k and a later
> allocation of 3k, then there must be a stack probe into the first 4k of
> allocated space that executes between the two allocations.
>
> We must consider an environment where code compiled without stack
> probing is linked statically or dynamically with code that is compiled
> with stack probing.  That is actually the most likely scenario for an
> indefinite period of time.  Thus we have to consider the possibility of
> a hostile caller in the call stack.
>
> We need not guarantee enough stack space to handle a signal if a probe
> hits the guard page.
>
> --
>
>
> Probes come in two forms.  They can be explicit or implicit.
>
> Explicit probes are emitted by prologue generation or dynamic stack
> allocation routines.  These are net new code and avoiding them when it
> is safe to do so helps reduce the overhead of stack probing.
>
> Implicit probes are "probes" that occur as a natural side effect of the
> existing code or guarantees provided by the ABI.  They are essentially
> free and may allow the compiler to avoid some explicit probes.
>
> Examples of implicit probes include
>
>   1. ISA which pushes the return address onto the stack in a call
>      instruction (x86)
>
>   2. ABI mandates that *sp always contain a backchain pointer (ppc)
>
>   3. Prologue stores a register into the stack.  We exploit this on
>      aarch64 and s390.  On s390 register saves go into the caller's
>      stack frame, on aarch64 register saves hit newly allocated
>      space in the callee's frame.  We can exploit both to avoid
>      some explicit probing.
>
> I've done implementations for x86, ppc, aarch64 and s390 and the
> included tests have been checked against those targets
> ($arch-unknown-linux).
>
> This patch does not change the probing insn itself.  We've had various
> discussions on-list on a better probe insn for x86.  I think the
> consensus is to avoid read-modify-write insns.  A testb may ultimately
> be best.  This is IMHO an independent implementation detail for each
> target and should be handled as a follow-up.  But if folks insist, it's
> a trivial change to make as it doesn't fundamentally affect how all this
> stuff works.
>
> Other targets that have an existing -fstack-check=specific, but for
> which I have not added a -fstack-check=clash implementation get partial
> protection against stack clash as well.  This is a side effect of
> keeping some of the early code we'd hoped to use to avoid writing a new
> probe implementation for each target.
>
> --
>
> To get a sense of overhead, just 1.5% of routines in glibc need probing
> in their prologues (x86) in the testing I performed.  IIRC each and
> every one of those routines needed just 1-4 inlined probes.
>
> Significantly more functions need alloca space probed (IIRC ~5%), but
> given the amazingly inefficient alloca code, I can't believe anyone will
> ever notice the probing overhead.
>
> --
>
>
> Patch #1 contains the new option -fstack-check=clash and some dejagnu
> infrastructure  (most of which is unused until later patches)
>
> Patch #2 adds the new style probing support to the alloca/vla area and
> indirects uses of STACK_CHECK_PROTECT through get_stack_check_protect.
>
> Patch #3 Add some generic dumping support for use by the target prologue
> expanders
>
> Patch #4 introduces the x86 specific bits
>
> Patch #5 addresses combine-stack-adjustments interactions with
> -fstack-check=clash
>
> Patch #6 adds PPC support
>
> Patch #7 adds aarch64 support
>
> Patch #8 adds s390 support
>
> The patch series has been bootstrapped and regression tested on
> x86_64-linux-gnu
> ppc64-linux-gnu
> ppc64le-linux-gnu
> aarch64-linux-gnu
> s390x-linux-gnu (another respin of this is still in-progress)
>
> Additionally, each target has been bootstrapped with -fstack-check=clash
> enabled by default, the testsuite run and checked for glaring errors.
>
> Earlier versions have also bootstrapped on 32bit PPC and 32bit s390.
>
> Earlier versions have also been used to build and regression test.
> glibc-2.17 with -fstack-check=clash enabled by default.  The resulting
> x86 and x86_64 libraries also were scanned to verify proper probing.
> Similarly for x86_64 builds with the trunk glibc.
>
>
> An earlier version was also used to build a particular application (that
> shall remain nameless) that had the most large frames in RHEL 7.  When
> -fstack-check=clash was enabled the scanner was able to verify all the
> large frames had been broken down in PROBE_INTERVAL chunks with proper
> probing on x86_64.
>
> There are some dependencies between patches.  Prior versions were
> bootstrapped and regression tested at each step.  I have not gone
> through that rigor after the various rebases and tweaks over the few
> weeks.  If partial patches are approved for installing on the trunk I
> will verify they work across the set of targets noted above before
> installing.
>
> Let the comments, questions, flames and suggestions fly :-)
>

I have executed a validation of your patch series on aarch64 and arm
targets, and I have minor comments.

On arm, all new tests are unsupported, as expected.
On aarch64-linux, the new tests pass, but they fail on aarch64-elf:
  - FAIL appears              [     => FAIL]:

  Executed from: gcc.dg/dg.exp
    gcc.dg/stack-check-10.c scan-rtl-dump-times pro_and_epilogue
"Stack clash frame pointer needed" 2
    gcc.dg/stack-check-10.c scan-rtl-dump-times pro_and_epilogue
"Stack clash inline probe" 1
    gcc.dg/stack-check-10.c scan-rtl-dump-times pro_and_epilogue
"Stack clash not noreturn" 2
    gcc.dg/stack-check-10.c scan-rtl-dump-times pro_and_epilogue
"Stack clash residual allocation in prologue" 2
    gcc.dg/stack-check-3.c scan-rtl-dump-times expand "allocation and
probing in loop" 5
    gcc.dg/stack-check-3.c scan-rtl-dump-times expand "allocation and
probing residuals" 6
    gcc.dg/stack-check-3.c scan-rtl-dump-times expand "skipped dynamic
allocation and probing loop" 1
    gcc.dg/stack-check-4.c scan-rtl-dump-times pro_and_epilogue "Stack
clash noreturn" 1
    gcc.dg/stack-check-4.c scan-rtl-dump-times pro_and_epilogue "Stack
clash not noreturn" 1
    gcc.dg/stack-check-5.c scan-rtl-dump-times pro_and_epilogue "Stack
clash frame pointer needed" 2
    gcc.dg/stack-check-5.c scan-rtl-dump-times pro_and_epilogue "Stack
clash inline probes " 2
    gcc.dg/stack-check-5.c scan-rtl-dump-times pro_and_epilogue "Stack
clash no frame pointer needed" 2
    gcc.dg/stack-check-5.c scan-rtl-dump-times pro_and_epilogue "Stack
clash no probe" 2
    gcc.dg/stack-check-6.c scan-rtl-dump-times pro_and_epilogue "Stack
clash frame pointer needed" 2
    gcc.dg/stack-check-6.c scan-rtl-dump-times pro_and_epilogue "Stack
clash inline probes" 2
    gcc.dg/stack-check-6.c scan-rtl-dump-times pro_and_epilogue "Stack
clash no frame pointer needed" 2
    gcc.dg/stack-check-6.c scan-rtl-dump-times pro_and_epilogue "Stack
clash probe loop" 2
    gcc.dg/stack-check-9.c (test for excess errors)
    gcc.dg/stack-check-9.c scan-rtl-dump-times pro_and_epilogue "Stack
clash frame pointer needed" 1
    gcc.dg/stack-check-9.c scan-rtl-dump-times pro_and_epilogue "Stack
clash inline probes" 1
    gcc.dg/stack-check-9.c scan-rtl-dump-times pro_and_epilogue "Stack
clash not noreturn" 1
    gcc.dg/stack-check-9.c scan-rtl-dump-times pro_and_epilogue "Stack
clash residual allocation in prologue" 1


As I noticed that you used dg-require-effective-target
stack_clash_protected instead of
dg-require-stack-check "clash" that I recently committed, I also tried
with the later.

And in this case, since -fstack-check=clash is accepted on arm, the
new testcases have
failures on arm like:
PASS: gcc.dg/stack-check-10.c (test for excess errors)
FAIL: gcc.dg/stack-check-10.c scan-rtl-dump-times pro_and_epilogue
"Stack clash inline probe" 1
FAIL: gcc.dg/stack-check-10.c scan-rtl-dump-times pro_and_epilogue
"Stack clash no probe" 1
FAIL: gcc.dg/stack-check-10.c scan-rtl-dump-times pro_and_epilogue
"Stack clash not noreturn" 2
FAIL: gcc.dg/stack-check-10.c scan-rtl-dump-times pro_and_epilogue
"Stack clash residual allocation in prologue" 2
FAIL: gcc.dg/stack-check-10.c scan-rtl-dump-times pro_and_epilogue
"Stack clash no frame pointer needed" 2

Christophe

>
> Jeff

Reply via email to