I've finally built up enough courage to start getting my head around this...
I see one outstanding issue sitting on this patch version:
On Sat, Oct 28, 2017 at 05:08:54AM +0100, Jeff Law wrote:
> On 10/13/2017 02:26 PM, Wilco Dijkstra wrote:
> > --param=stack-clash-protection-probe-interval=13
> > --param=stack-clash-protection-guard-size=12
> >
> > So if there is a good reason to continue with 2 separate values, we must
> > force probe interval <= guard size!
> The param code really isn't designed to enforce values that are
> inter-dependent. It has a min, max & default values. No more, no less.
> If you set up something inconsistent with the params, it's simply not
> going to work.
>
>
> >
> > Also on AArch64 --param=stack-clash-protection-probe-interval=16 causes
> > crashes due to the offsets used in the probes - we don't need large offsets
> > as we want to probe close to the bottom of the stack.
> Not a surprise. While I tried to handle larger intervals, I certainly
> didn't test them. Given the ISA I wouldn't expect an interval > 12 to
> be useful or necessarily even work correctly.
Understood - weird behaviour with weird params don't concern me.
> > Functions with a large stack emit like alloca a lot of code, here I used
> > --param=stack-clash-protection-probe-interval=15:
> >
> > int f1(int x)
> > {
> > char arr[128*1024];
> > return arr[x];
> > }
> >
> > f1:
> > mov x16, 64512
> > sub sp, sp, x16
> > .cfi_def_cfa_offset 64512
> > mov x16, -32768
> > add sp, sp, x16
> > .cfi_def_cfa_offset -1024
> > str xzr, [sp, 32760]
> > add sp, sp, x16
> > .cfi_def_cfa_offset -66560
> > str xzr, [sp, 32760]
> > sub sp, sp, #1024
> > .cfi_def_cfa_offset -65536
> > str xzr, [sp, 1016]
> > ldrb w0, [sp, w0, sxtw]
> > .cfi_def_cfa_offset 131072
> > add sp, sp, 131072
> > .cfi_def_cfa_offset 0
> > ret
> >
> > Note the cfa offsets are wrong.
> Yes. They definitely look wrong. There's a clear logic error in
> setting up the ADJUST_CFA note when the probing interval is larger than
> 2**12. That should be easily fixed. Let me poke at it.
This one does concern me, how did you get on? Did it respond well to
prodding?
> > There is an odd mix of a big initial adjustment, then some
> > probes+adjustments and
> > then a final adjustment and probe for the remainder. I can't see the point
> > of having
> > both an initial and remainder adjustment. I would expect this:
> >
> > sub sp, sp, 65536
> > str xzr, [sp, 1024]
> > sub sp, sp, 65536
> > str xzr, [sp, 1024]
> > ldrb w0, [sp, w0, sxtw]
> > add sp, sp, 131072
> > ret
> I'm really not able to justify spending further time optimizing the
> aarch64 implementation. I've done the best I can. You can take the
> work as-is or improve it, but I really can't justify further time
> investment on that architecture.
Makes sense. Understood. And certainly not required to land this patch.
> > int f2(int x)
> > {
> > char arr[128*1024];
> > return arr[x];
> > }
> >
> > f2:
> > mov x16, 64512
> > sub sp, sp, x16
> > mov x16, -65536
> > movk x16, 0xfffd, lsl 16
> > add x16, sp, x16
> > .LPSRL0:
> > sub sp, sp, 4096
> > str xzr, [sp, 4088]
> > cmp sp, x16
> > b.ne .LPSRL0
> > sub sp, sp, #1024
> > str xzr, [sp, 1016]
> > ldrb w0, [sp, w0, sxtw]
> > add sp, sp, 262144
> > ret
> >
> > The cfa entries are OK for this case. There is a mix of positive/negative
> > offsets which
> > makes things confusing. Again there are 3 kinds of adjustments when for
> > this size we
> > only need the loop.
> >
> > Reusing the existing gen_probe_stack_range code appears a bad idea since
> > it ignores the probe interval and just defaults to 4KB. I don't see why it
> > should be
> > any more complex than this:
> >
> > sub x16, sp, 262144 // only need temporary if > 1MB
> > .LPSRL0:
> > sub sp, sp, 65536
> > str xzr, [sp, 1024]
> > cmp sp, x16
> > b.ne .LPSRL0
> > ldrb w0, [sp, w0, sxtw]
> > add sp, sp, 262144
> > ret
> >
> > Probe insertion if final adjustment >= 1024 also generates a lot of
> > redundant
> > code - although this is more a theoretical issue given this is so rare.
> Again, if ARM wants this optimized, then ARM's engineers are going to
> have to take the lead here. I've invested all I can reasonably invest
> in terms of trying optimize the probing for this target.
Likewise here - thanks for your work so far, I have no expectation of this
being fully optimized before I OK it to land.
Sorry for the big delay getting round to this patch, I hope to get serious
time to put in to it later this week, and it would be helpful to close out
the few remaining issues before I do.
Thanks,
James