https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103100
Bug ID: 103100
Summary: unaligned access generated when zero-initializing large locals with SIMD-instructions and -O2 -mstrict-align
Product: gcc
Version: 11.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: felix at breitweiser dot de
Target Milestone: ---
Created attachment 51738
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51738&action=edit
source code that generates the faulty assembly
When zero-initializing large local variables, gcc 11.2 (with -O2 and -O3) uses
SIMD registers to store a pair of 16-byte registers into memory at once. When
doing so, gcc can generate code that does not access memory on a 16-byte
aligned boundary, even though the aarch64 architecture requires memory accesses
to be 16-byte aligned when using the full 16-byte SIMD registers. This happens
even with -mstrict-align enabled.
For example:
static void (*use)(unsigned char*); // to suppress optimizations
extern "C" void _start() {
    unsigned char t2[216]={};
    use(t2);
}
when compiled with "gcc -save-temps -O2 -mstrict-align" generates the following
assembly:
_start:
    stp x29, x30, [sp, #-240]! // assuming sp is aligned to 16-bytes here
    mov x1, #0x0
    movi v0.4s, #0x0
    add x2, sp, #0x28 // the value in x2 is 8-byte aligned, but not 16-byte aligned
    mov x29, sp
    stp xzr, xzr, [sp, #24]
    add x0, sp, #0x18
    stp q0, q0, [x2] // x2 is not 16-byte aligned, so the store is not aligned
    add x2, sp, #0x48
    str xzr, [sp, #232]
    stp q0, q0, [x2]
    add x2, sp, #0x68
    stp q0, q0, [x2]
    add x2, sp, #0x88
    stp q0, q0, [x2]
    add x2, sp, #0xa8
    stp q0, q0, [x2]
    add x2, sp, #0xc8
    stp q0, q0, [x2]
    blr x1
    ldp x29, x30, [sp], #240
    ret
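The offsets in the listing above can be checked mechanically. The following is
only a sketch of that arithmetic, not part of the attached test case: it
assumes sp is 16-byte aligned after the initial stp (240 is a multiple of 16),
and the names kQPairOffsets, is_16_byte_aligned, misaligned_store_count and
bytes_zeroed are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Offsets from sp used by the six "stp q0, q0, [x2]" stores in the listing.
// Assumption: sp itself is 16-byte aligned at this point.
constexpr std::size_t kQPairOffsets[] = {0x28, 0x48, 0x68, 0x88, 0xa8, 0xc8};
constexpr std::size_t kQPairCount = sizeof(kQPairOffsets) / sizeof(kQPairOffsets[0]);

// True if an address at `offset` from a 16-byte-aligned base is 16-byte aligned.
constexpr bool is_16_byte_aligned(std::size_t offset) {
    return offset % 16 == 0;
}

// Number of q-register pair stores whose target address is not 16-byte aligned.
constexpr std::size_t misaligned_store_count() {
    std::size_t count = 0;
    for (std::size_t i = 0; i < kQPairCount; ++i) {
        if (!is_16_byte_aligned(kQPairOffsets[i])) {
            ++count;
        }
    }
    return count;
}

// Bytes written by the zeroing sequence:
// "stp xzr, xzr" (16) + six "stp q0, q0" (32 each) + "str xzr" (8).
constexpr std::size_t bytes_zeroed() {
    return 16 + kQPairCount * 32 + 8;
}

// Every q-pair store lands at an offset that is 8 modulo 16.
static_assert(misaligned_store_count() == 6, "all six q-pair stores are misaligned");
// The stores together cover exactly the 216 bytes of t2 (sp+24 up to sp+240).
static_assert(bytes_zeroed() == 216, "stores cover the whole array");
```

Under that assumption, all six q-register pair stores target addresses that are
8 bytes off a 16-byte boundary, because t2 starts at sp+24 and only the first
16 bytes are cleared with x-register stores.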
I have seen https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71727 and, even though
that is marked as fixed, this issue persists in gcc 11.2.