Hi Jeff,
On 11 July 2017 at 23:19, Jeff Law <l...@redhat.com> wrote: > This patch series is designed to mitigate the problems exposed by the > stack-clash exploits. As I've noted before, the way to address this > class of problems is via a good stack probing strategy. > > This has taken much longer than expected to pull together for > submission. Sorry about that. However, the delay has led to some clear > improvements on ppc, aarch64 and s390 as well as tests which aren't > eyeballed, but instead are part of the testsuite. > > This series introduces -fstack-check=clash which is a variant of > -fstack-check designed to prevent "jumping the stack" as seen in the > stack-clash exploits. > > > > The key ideas here: > > Individual stack allocations are never more than PROBE_INTERVAL in size > (4k by default). Larger allocations are broken up into PROBE_INTERVAL > chunks and each chunk is probed as it is allocated. > > No combination of stack allocations can exceed PROBE_INTERVAL bytes > without probing. ie, if we have an allocation of 2k and a later > allocation of 3k, then there must be a stack probe into the first 4k of > allocated space that executes between the two allocations. > > We must consider an environment where code compiled without stack > probing is linked statically or dynamically with code that is compiled > with stack probing. That is actually the most likely scenario for an > indefinite period of time. Thus we have to consider the possibility of > a hostile caller in the call stack. > > We need not guarantee enough stack space to handle a signal if a probe > hits the guard page. > > -- > > > Probes come in two forms. They can be explicit or implicit. > > Explicit probes are emitted by prologue generation or dynamic stack > allocation routines. These are net new code and avoiding them when it > is safe to do so helps reduce the overhead of stack probing. > > Implicit probes are "probes" that occur as a natural side effect of the > existing code or guarantees provided by the ABI. They are essentially > free and may allow the compiler to avoid some explicit probes. > > Examples of implicit probes include > > 1. ISA which pushes the return address onto the stack in a call > instruction (x86) > > 2. ABI mandates that *sp always contain a backchain pointer (ppc) > > 3. Prologue stores a register into the stack. We exploit this on > aarch64 and s390. On s390 register saves go into the caller's > stack frame, on aarch64 register saves hit newly allocated > space in the callee's frame. We can exploit both to avoid > some explicit probing. > > I've done implementations for x86, ppc, aarch64 and s390 and the > included tests have been checked against those targets > ($arch-unknown-linux). > > This patch does not change the probing insn itself. We've had various > discussions on-list on a better probe insn for x86. I think the > consensus is to avoid read-modify-write insns. A testb may ultimately > be best. This is IMHO an independent implementation detail for each > target and should be handled as a follow-up. But if folks insist, it's > a trivial change to make as it doesn't fundamentally affect how all this > stuff works. > > Other targets that have an existing -fstack-check=specific, but for > which I have not added a -fstack-check=clash implementation get partial > protection against stack clash as well. This is a side effect of > keeping some of the early code we'd hoped to use to avoid writing a new > probe implementation for each target. > > -- > > To get a sense of overhead, just 1.5% of routines in glibc need probing > in their prologues (x86) in the testing I performed. IIRC each and > every one of those routines needed just 1-4 inlined probes. > > Significantly more functions need alloca space probed (IIRC ~5%), but > given the amazingly inefficient alloca code, I can't believe anyone will > ever notice the probing overhead. > > -- > > > Patch #1 contains the new option -fstack-check=clash and some dejagnu > infrastructure (most of which is unused until later patches) > > Patch #2 adds the new style probing support to the alloca/vla area and > indirects uses of STACK_CHECK_PROTECT through get_stack_check_protect. > > Patch #3 Add some generic dumping support for use by the target prologue > expanders > > Patch #4 introduces the x86 specific bits > > Patch #5 addresses combine-stack-adjustments interactions with > -fstack-check=clash > > Patch #6 adds PPC support > > Patch #7 adds aarch64 support > > Patch #8 adds s390 support > > The patch series has been bootstrapped and regression tested on > x86_64-linux-gnu > ppc64-linux-gnu > ppc64le-linux-gnu > aarch64-linux-gnu > s390x-linux-gnu (another respin of this is still in-progress) > > Additionally, each target has been bootstrapped with -fstack-check=clash > enabled by default, the testsuite run and checked for glaring errors. > > Earlier versions have also bootstrapped on 32bit PPC and 32bit s390. > > Earlier versions have also been used to build and regression test. > glibc-2.17 with -fstack-check=clash enabled by default. The resulting > x86 and x86_64 libraries also were scanned to verify proper probing. > Similarly for x86_64 builds with the trunk glibc. > > > An earlier version was also used to build a particular application (that > shall remain nameless) that had the most large frames in RHEL 7. When > -fstack-check=clash was enabled the scanner was able to verify all the > large frames had been broken down in PROBE_INTERVAL chunks with proper > probing on x86_64. > > There are some dependencies between patches. Prior versions were > bootstrapped and regression tested at each step. I have not gone > through that rigor after the various rebases and tweaks over the few > weeks. If partial patches are approved for installing on the trunk I > will verify they work across the set of targets noted above before > installing. > > Let the comments, questions, flames and suggestions fly :-) > I have executed a validation of your patch series on aarch64 and arm targets, and I have minor comments. On arm, all new tests are unsupported, as expected. On aarch64-linux, the new tests pass, but they fail on aarch64-elf: - FAIL appears [ => FAIL]: Executed from: gcc.dg/dg.exp gcc.dg/stack-check-10.c scan-rtl-dump-times pro_and_epilogue "Stack clash frame pointer needed" 2 gcc.dg/stack-check-10.c scan-rtl-dump-times pro_and_epilogue "Stack clash inline probe" 1 gcc.dg/stack-check-10.c scan-rtl-dump-times pro_and_epilogue "Stack clash not noreturn" 2 gcc.dg/stack-check-10.c scan-rtl-dump-times pro_and_epilogue "Stack clash residual allocation in prologue" 2 gcc.dg/stack-check-3.c scan-rtl-dump-times expand "allocation and probing in loop" 5 gcc.dg/stack-check-3.c scan-rtl-dump-times expand "allocation and probing residuals" 6 gcc.dg/stack-check-3.c scan-rtl-dump-times expand "skipped dynamic allocation and probing loop" 1 gcc.dg/stack-check-4.c scan-rtl-dump-times pro_and_epilogue "Stack clash noreturn" 1 gcc.dg/stack-check-4.c scan-rtl-dump-times pro_and_epilogue "Stack clash not noreturn" 1 gcc.dg/stack-check-5.c scan-rtl-dump-times pro_and_epilogue "Stack clash frame pointer needed" 2 gcc.dg/stack-check-5.c scan-rtl-dump-times pro_and_epilogue "Stack clash inline probes " 2 gcc.dg/stack-check-5.c scan-rtl-dump-times pro_and_epilogue "Stack clash no frame pointer needed" 2 gcc.dg/stack-check-5.c scan-rtl-dump-times pro_and_epilogue "Stack clash no probe" 2 gcc.dg/stack-check-6.c scan-rtl-dump-times pro_and_epilogue "Stack clash frame pointer needed" 2 gcc.dg/stack-check-6.c scan-rtl-dump-times pro_and_epilogue "Stack clash inline probes" 2 gcc.dg/stack-check-6.c scan-rtl-dump-times pro_and_epilogue "Stack clash no frame pointer needed" 2 gcc.dg/stack-check-6.c scan-rtl-dump-times pro_and_epilogue "Stack clash probe loop" 2 gcc.dg/stack-check-9.c (test for excess errors) gcc.dg/stack-check-9.c scan-rtl-dump-times pro_and_epilogue "Stack clash frame pointer needed" 1 gcc.dg/stack-check-9.c scan-rtl-dump-times pro_and_epilogue "Stack clash inline probes" 1 gcc.dg/stack-check-9.c scan-rtl-dump-times pro_and_epilogue "Stack clash not noreturn" 1 gcc.dg/stack-check-9.c scan-rtl-dump-times pro_and_epilogue "Stack clash residual allocation in prologue" 1 As I noticed that you used dg-require-effective-target stack_clash_protected instead of dg-require-stack-check "clash" that I recently committed, I also tried with the later. And in this case, since -fstack-check=clash is accepted on arm, the new testcases have failures on arm like: PASS: gcc.dg/stack-check-10.c (test for excess errors) FAIL: gcc.dg/stack-check-10.c scan-rtl-dump-times pro_and_epilogue "Stack clash inline probe" 1 FAIL: gcc.dg/stack-check-10.c scan-rtl-dump-times pro_and_epilogue "Stack clash no probe" 1 FAIL: gcc.dg/stack-check-10.c scan-rtl-dump-times pro_and_epilogue "Stack clash not noreturn" 2 FAIL: gcc.dg/stack-check-10.c scan-rtl-dump-times pro_and_epilogue "Stack clash residual allocation in prologue" 2 FAIL: gcc.dg/stack-check-10.c scan-rtl-dump-times pro_and_epilogue "Stack clash no frame pointer needed" 2 Christophe > > Jeff