https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97400
Bug ID: 97400
Summary: [10/11 Regression] SVE: wrong code since r10-3906-g96eb7d7a64
Product: gcc
Version: 11.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: acoplan at gcc dot gnu.org
Target Milestone: ---

Created attachment 49364
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49364&action=edit
assembly generated at r10-3906

AArch64 GCC miscompiles the following testcase:

int a[256];
int c, f;
int *d[10];
int main(void) {
  for (; f < 256; f++) {
    a[f] = c = 9;
    for (; c >= 0; c--) {
      d[c] = 0;
    }
  }
  return a[255];
}

with -O3 -march=armv8.2-a+sve since
r10-3906-g96eb7d7a642085f651e9940f0ee75568d7c4441d7. The program should exit
with status code 9 but instead exits with status code 0.

The program produces the correct result at
r10-3681-g3faf75d458529592007436a0972f44e14ebf46f6, but between these two
revisions, GCC ICEs on this input, so the bad commit lies somewhere in between
these.

To reproduce the issue back to r10-3906, you need to add -fno-common to the
command line (this became the default in GCC 10).

Examining the broken assembly code, it appears that the scalar epilogue for the
inner loop tramples backwards through d into the end of a:

.L8:
        add     x1, x4, 1032            // x1 = &d[0]
        sub     w5, w0, #1              // w5 = -1
        sub     w13, w0, #2
        sub     w12, w0, #3
        sub     w11, w0, #4
        sub     w9, w0, #5
        str     xzr, [x1, w0, sxtw 3]
        sub     w8, w0, #6
        str     xzr, [x1, w5, sxtw 3]   // incorrectly sets a[255] = 0
        [...]

The layout of .bss here is:

f (4 bytes) | padding (4 bytes) | a (1024 bytes) | d (80 bytes) | c (4 bytes)

I've attached the broken assembly code generated by GCC at
r10-3906-g96eb7d7a642085f651e9940f0ee75568d7c4441d7.
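
For anyone checking the arithmetic, here is a rough sketch in C of how the .bss layout above maps the store through d[-1] onto a. It is not part of the testcase or the attachment. The names OFF_F, OFF_A, OFF_D, OFF_C, store_offset and a_index_at are made up for illustration, and the offsets assume 4-byte int and 8-byte pointers, which is what the sizes quoted in the layout imply.

```c
#include <assert.h>

/* Byte offsets within .bss, derived from the layout quoted in the report:
   f (4) | padding (4) | a (1024) | d (80) | c (4).
   Assumption: offsets are measured from the start of f. */
enum {
    OFF_F = 0,
    OFF_A = OFF_F + 4 + 4,     /* 8: a follows f and 4 bytes of padding */
    OFF_D = OFF_A + 256 * 4,   /* 1032: matches "add x1, x4, 1032" */
    OFF_C = OFF_D + 10 * 8     /* 1112 */
};

/* Byte offset written by "str xzr, [x1, wN, sxtw 3]" when wN holds the
   signed index d_index: the base is &d[0] and the index is scaled by 8. */
long store_offset(int d_index)
{
    return OFF_D + (long)d_index * 8;
}

/* Index of the element of a that contains byte offset off,
   or -1 if off lies outside a. */
int a_index_at(long off)
{
    if (off < OFF_A || off >= OFF_D)
        return -1;
    return (int)((off - OFF_A) / 4);
}
```

Under those assumptions, store_offset(-1) is 1024, so the 8-byte store covers bytes 1024 to 1031, which are a[254] and a[255]. That would be consistent with a[255] reading back as 0.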