https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97473
            Bug ID: 97473
           Summary: Spilled function parameters not aligned properly on multiple non-x86 targets
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Keywords: wrong-code
          Severity: normal
          Priority: P3
         Component: other
          Assignee: unassigned at gcc dot gnu.org
          Reporter: nate at thatsmathematics dot com
  Target Milestone: ---

Created attachment 49394
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49394&action=edit
Test case

Suppose we have a type V requiring alignment, such as with __attribute__((aligned(N))). In current versions of gcc, both 10.2 and recent trunk, it appears that local (auto) variables of this type are properly aligned on the stack, at least on all the targets I tested. However, on many targets other than x86, alignment is apparently not respected for function parameters of this type when their address is taken.

The function parameter may actually be passed in a register, in which case when its address is taken, it must be spilled to the stack. But on the failing targets, the spilled copy is not sufficiently aligned, and so for instance, other functions which receive a pointer to this variable will find it does not have the alignment that it should.

I'm not sure if this is a bug or a limitation, but it's quite counterintuitive, since function parameters generally can be treated like local variables for most other purposes. I couldn't find any mention of this in the documentation or past bug reports.

This can be reproduced by a very short C example like the following:

    typedef int V __attribute__((aligned(64)));

    void g(V *);

    void f(V x) {
        g(&x);
    }

The function g can get a pointer that is not aligned to 64 bytes.

A more complete test case is attached, which I tested mainly on ARM and AArch64 with gcc 10.2 and also trunk. It seems to happen with or without optimization, so long as one prevents IPA of g.
Inspection of the assembly shows gcc does not generate any code to align the objects beyond the stack alignment guaranteed by the ABI (8 bytes for ARM, 16 bytes for AArch64).

It fails on (complete gcc -v output below):

- aarch64-linux-gnu 10.2.0 and trunk from today
- arm-linux-gnueabihf 10.2.0 and trunk from last week
- alpha-linux-gnu 10.2.0
- sparc64-linux-gnu 10.2.0
- mips-linux-gnu 10.2.0

It succeeds on:

- x86_64-linux-gnu 10.2.0, also with -m32

On x86_64-linux-gnu, gcc generates instructions to align the stack and place the spilled copy of x at an aligned address, and the testcase passes there. (Perhaps this was implemented to support AVX?) With -m32 it copies x from its original unaligned position on the stack into an aligned stack slot.

As noted, auto variables of the same type do get proper alignment on all the platforms I tested, so one can work around this with `V tmp = x; g(&tmp);`.

For what it's worth, clang on ARM and AArch64 does align the spilled copies.

I was not sure which pass of the compiler is responsible for this, so I just chose component "other". I didn't think "target" was appropriate as this affects many targets, though not all.

This issue was brought to my attention by StackOverflow user Alf (thanks!), see https://stackoverflow.com/questions/64287587/memory-alignment-issues-with-gcc-vector-extension-and-arm-neon. Alf's original program was in C++ for ARM32 with NEON and the hard-float ABI, and involved mixing functions that passed vector types (like int32x4_t) either by value or by reference. In this setting they can be passed by value in SIMD registers, but in memory they require 16-byte alignment. This was violated, resulting in bus errors at runtime. So there is "real life" code affected by this.

I tried including full `gcc -v` output from all versions tested, but it seems to be triggering the bugzilla spam filter, so I'm omitting it. Hopefully it isn't needed, but let me know if it is.
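
For illustration, here is a minimal sketch of how the parameter case and the `V tmp = x;` workaround could be compared at run time. It is not the attached test case. The helper names (`is_aligned_64`, `param_is_aligned`, `copy_is_aligned`, `report`) are hypothetical, and it assumes that keeping the functions non-static and `noinline` is enough to stop the compiler from folding the check away.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

typedef int V __attribute__((aligned(64)));

/* Takes an untyped pointer, so the check cannot be folded using V's
   declared alignment. noinline keeps it a real call (assumption: this
   is sufficient to defeat interprocedural analysis here). */
__attribute__((noinline))
int is_aligned_64(const volatile void *p) {
    return ((uintptr_t)p % 64) == 0;
}

/* Takes the address of a by-value parameter. Per the report, this
   address may be under-aligned on several non-x86 targets. */
__attribute__((noinline))
int param_is_aligned(V x) {
    return is_aligned_64(&x);
}

/* Workaround from the report: copy the parameter into a local (auto)
   variable and take the address of the copy instead. */
__attribute__((noinline))
int copy_is_aligned(V x) {
    V tmp = x;
    return is_aligned_64(&tmp);
}

/* Prints both results. Only the workaround is asserted, because the
   parameter case is exactly what differs between targets. */
void report(void) {
    printf("parameter address aligned: %d\n", param_is_aligned(1));
    printf("local copy aligned:        %d\n", copy_is_aligned(1));
    assert(copy_is_aligned(1));
}
```

Calling `report()` from a `main` should show whether the two cases differ on a given target; going by the results listed above, the first line would be expected to print 0 at least some of the time on the failing targets, while the second stays 1.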