https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63530
Bug ID: 63530
Summary: GCC generates incorrect aligned store on ARM after the loop is unrolled.
Product: gcc
Version: 5.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: congh at google dot com

Created attachment 33710
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33710&action=edit
assembly

When compiling the code shown below using GCC 5.0 for ARM with the following options:

-O2 -ftree-vectorize -march=armv7-a -mfpu=neon -funroll-loops --param=max-completely-peeled-insns=400

// The code:

typedef struct {
  unsigned char map[256];
  int i;
} A, *AP;

void* calloc(int, int);

AP foo(int n)
{
  AP b = calloc(1, sizeof(A));
  int i;
  for (i = n; i < 256; i++)
    b->map[i] = i;
  return b;
}

the instruction

vst1.64 {d0-d1}, [r2:64]

is generated, which is an aligned store with an 8-byte alignment requirement. However, this requirement cannot be satisfied: the loop is not peeled for alignment, and the start address within the array (b->map + n) is unknown at compile time.

I have attached the generated assembly code here.
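To illustrate the alignment argument, here is a minimal sketch, not taken from GCC or from the attached assembly. The helpers is_aligned and peel_count are hypothetical names introduced only for this example. It assumes that converting a pointer to uintptr_t yields the byte address, which holds on common ARM targets but is implementation-defined in C. peel_count gives the number of leading byte stores that would have to run as scalar iterations before a store like the one above could legitimately claim 8-byte alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same layout as the struct in the report. */
typedef struct { unsigned char map[256]; int i; } A;

/* Nonzero if p is a multiple of align (align must be a power of two). */
static int is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Hypothetical helper: number of leading one-byte iterations that must
   run before p reaches an align-byte boundary (0 if already aligned). */
static size_t peel_count(const void *p, size_t align)
{
    size_t misalign = (size_t)((uintptr_t)p & (align - 1));
    return (align - misalign) & (align - 1);
}
```

For example, with A a; the address a.map + n satisfies is_aligned(a.map + n, 8) for only one value of n in any run of eight consecutive values, so a store that asserts 64-bit alignment is only safe after peel_count(a.map + n, 8) scalar iterations.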