Jeff Law wrote:
> But the stack pointer might have already been advanced into the guard
> page by the caller.  For the sake of argument assume the guard page is
> 0xf1000 and assume that our stack pointer at entry is 0xf1010 and that
> the caller hasn't touched the 0xf1000 page.
>
> If FrameSize >= 32, then the stores are going to hit the 0xf0000 page
> rather than the 0xf1000 page.  That's jumping the guard.  Thus we have
> to emit a probe prior to this stack allocation.
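To make the numbers in that scenario concrete, here is a minimal sketch in C. It assumes 4KB pages and a stack that grows downwards; the names (PAGE_BYTES, page_of, jumps_guard) are made up for illustration and are not GCC code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4KB pages, as in the quoted example. */
enum { PAGE_BYTES = 0x1000 };

/* Base address of the page containing addr. */
static uintptr_t page_of(uintptr_t addr)
{
    return addr & ~(uintptr_t)(PAGE_BYTES - 1);
}

/* True if dropping sp by frame_size puts the new sp in a page below
   the guard page, so stores at the new sp would miss the guard.  */
static bool jumps_guard(uintptr_t sp, uintptr_t frame_size,
                        uintptr_t guard_page)
{
    return page_of(sp - frame_size) < guard_page;
}
```

With sp = 0xf1010 and the guard page at 0xf1000, a 16-byte frame still lands in the guard page, while a 32-byte frame lands at 0xf0ff0, which is in the 0xf0000 page.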

That's an incorrect ABI: it allows adjusting the frame by 4080+32 bytes!
A correct one might allow, say, 1024 bytes for outgoing arguments.  That
means that when you call a function, there are still guard-page-size - 1024
bytes left that you can use to allocate locals.  With a 4KB guard page,
that allows leaf functions with frames of up to 3KB and, depending on the
frame, non-leaf functions with 2-3KB of locals plus up to 1024 bytes of
outgoing arguments, all without inserting any probes beyond the normal
frame stores.

This design means almost no functions need additional probes.  Assuming
we're also increasing the guard page size to 64KB, it's cheap even for
large functions.

Wilco
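
The budget arithmetic above can be sketched in C as follows. This is only an illustration under the assumptions stated in this mail (a 1024-byte limit on outgoing arguments, and a guard of 4KB or 64KB); the names MAX_OUTGOING_ARGS, unprobed_budget and needs_extra_probe are hypothetical, and the check covers only the simple leaf-function case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption from the discussion, not from any ABI document: a caller
   may leave at most this many bytes unprobed below its last store.  */
enum { MAX_OUTGOING_ARGS = 1024 };

/* Bytes a callee can allocate without an extra probe: whatever is left
   of the guard after the caller's outgoing-argument area.  */
static size_t unprobed_budget(size_t guard_size)
{
    assert(guard_size > MAX_OUTGOING_ARGS);
    return guard_size - MAX_OUTGOING_ARGS;
}

/* Simplified check for a leaf function: an extra probe is needed only
   if the frame is larger than the remaining budget.  */
static bool needs_extra_probe(size_t frame_size, size_t guard_size)
{
    return frame_size > unprobed_budget(guard_size);
}
```

For a 4KB guard the budget is 4096 - 1024 = 3072 bytes, which is the 3KB leaf-function limit mentioned above; for a 64KB guard it is 64512 bytes.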