https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70232
Ramana Radhakrishnan <ramana at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |law at redhat dot com,
                   |                            |vmakarov at redhat dot com

--- Comment #3 from Ramana Radhakrishnan <ramana at gcc dot gnu.org> ---
(In reply to Ramana Radhakrishnan from comment #2)
> (In reply to Andrew Pinski from comment #1)
> > >I have only tested on 32-bit ARM,
> >
> > Looks only to be an 32bit ARM issue.  AARCH64 both LP64 and ILP32 does not
> > have a stack size issue.
> > For both of those we get:
> >         stp     x29, x30, [sp, -128]!
> >
> > So only 128 bytes.
>
> Confirmed.
>
> GCC 5 uses about 136 bytes
> GCC 6 uses 1160 bytes at O2 for the reduced testcase.

A few differences noted:

- AArch32 ends up inlining the memcpy.
- However, -fno-builtin makes very little difference to the stack size
  used, so that's really not an issue here.
- AArch64 does not inline the memcpy at all.

It does appear, though, that the register allocator manages much better on
AArch64 than on AArch32 with the shape of the CFG. That is possibly one
difference. The other difference, which I've noticed between GCC 5 and
GCC 6, is that dom2 / jump threading looks like it duplicates a lot more
blocks in GCC 6 compared to GCC 5, and I need to go back and look into
that one.

The problem appears to stem from --param fsm-maximum-phi-arguments=7, but
that's just a diagnostic tool to see what the trigger point is.

CCing law and vmakarov to see if they have any insights in this case.