On 07/13/2017 05:37 PM, Segher Boessenkool wrote: > On Thu, Jul 13, 2017 at 04:38:00PM -0600, Jeff Law wrote: >> On 07/13/2017 03:32 PM, Segher Boessenkool wrote: >>>>> -fstack-check=clash is itself not such a super name either. It's not >>>>> checking stack, and it isn't clashing: it just does a store to every >>>>> page the stack will touch (minus the first and last or something). >>>> Yea. I don't particularly like it either. Suggestions? I considered >>>> "probe" as well, but "specific" also does probing. In the end I used >>>> the part of the marketing name of the exploits. >>> >>> I don't think it should be inside -fstack-check at all. Sure, the >>> mechanisms implementing it overlap a bit (more on some targets, less >>> on others), but how will a user ask for clash protection _and_ for >>> stack checking? >> The biggest problem with separating them is we would end up with a fair >> amount code that ultimately looks like >> >> if (flag_stack_check == STATIC_BUILTIN_STACK_CHECK >> || flag_whatever_the_new_thing_is) >> { >> probe the stack >> } > > But only in backends, right? And some backends will be actually > simpler, afaics. I suspect we might end up with one in explow.c as well. But yes, it'd primarily be buried in the targets.
>> Yup. It gets baked in even faster than normal in this case I suspect. >> Red Hat will almost certainly pick up the bits and start backporting >> them to RHEL 7, RHEL 6 and RHEL 5. So the flag will be out in the wild >> before gcc-8 hits the streets. > > Are you planning to backport it for mainline GCC as well? The hope is we reach a consensus for mainline GCC, then we backport whatever goes into mainline to the relevant RHEL releases. THe backports into RHEL would likely happen before gcc-8 hits the streets. I've got a little time before I have to invert that process and go into RHEL first. I *really* want to avoid that for a multitude of reasons. > >> There's -fstack-check and -fstack-clash-protection. I think with the >> direction we're going they are fundamentally incompatible because >> neither the compiler nor kernel do anything to guarantee enough stack is >> available after hitting the guard for -fstack-clash-protection. > > Hrm, I have to think about that. Even if the kernel implements the reserved page stuff mentioned earlier in the thread, I'm not sure when we'd be able to reliably depend on that capability (or even check for it). ISTM that -fstack-check continues to be Ada centric and we have a new option to deal with stack-clash protection and the two simply just aren't allowed to be enabled together. > >> And there's shrink wrapping. This was originally raised by the aarch64 >> guys and deserves some thought. I'm particularly worried about aarch64 >> if it was to shrink-wrap some particular register saves that we wanted >> to use as an implicit probe. > > For normal shrink-wrapping, the volatile reg stores will always happen > in the same prologue that sets up the stack frame as well, so there > won't be any problem (except the usual stack ties you might need in the > prologue, etc.) (*) For separate shrink-wrapping, yeah you'll need any > bb that could potentially store to the stack require the component for > the register you use as implicit probe. This is pretty nasty. I'm going to try and look at this tomorrow to get a feel for the aarch64 implementation separate shrink wrapping. For ppc I'm not too worried -- the implicit backchain probes are so effective at eliminating explicit probes that I didn't bother writing any code to track the register saves as implicit probes. I'm pretty sure it just works on ppc with separate shrink wrapping. > >>>> We created that alloca'd object at the wrong lexical scope which mucked >>>> up its expected persistence. I'm sure I'd spot it trivially once I set >>>> up the test again. >>> >>> That sounds like it might be a bug even. >> That was my thought as well. It's c-torture/execute/20071029-1.c. > > Ah cool, thanks for digging it up. NP. As expected, it was trivial to slam in the initializer and rerun the testsuite. As soon as I saw the failure and looked at the source I knew I'd found it again. The other thing to remember about -fstack-check=generic is that once your frame gets big, it just gives up. That's why it takes large automatic objects and turns them into alloca'd objects -- to reduce the size of the prologue allocated frame. I looked at -fstack-check=generic just long enough to realize it wasn't really an option to cover s390. Then I promptly moved onto more useful pursuits. > (*) Related, I know you don't want to scan generated code to see if > all probes are in place, but it seems you pretty much have to if you > want to make sure no later pass deletes the probes. Well, something > for targets to worry about :-) Right. The dumping that's currently done just gives us a view into what the prologue code wanted to emit. Passes after prologue generation can get in the way -- combine-stack-adjustments comes immediately to mind. end-to-end testing absolutely requires scanning the final rtl dump or the assembly code or DSOs/executables. My plan was to fault that in as-needed. For c-s-a, we've introduced a new note that gets attached to the allocation step when protecting against stack-clash. If c-s-a sees that note it will refuse to combine the stack adjustment with any others. I wrote a quick and dirty x86 specific test for that. You'll find those bits in patch #5. FOr some architectures you can get a good sense of problematical sequences with objdump and some greps. It's far from automated and nowhere near something I'd submit into GCC :-) Jeff