https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102139
Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Target Milestone|---                         |11.3
     Ever confirmed|0                           |1
            Summary|-O3 miscompile due to       |[11/12 Regression] -O3
                   |slp-vectorize on strict     |miscompile due to
                   |align target                |slp-vectorize on strict
                   |                            |align target
   Last reconfirmed|                            |2021-08-31

--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
fre is fine here.  The problem is SLP.
Here is why take:
typedef decltype(sizeof(0)) size_t;
typedef unsigned short uint16_t;
typedef unsigned uint32_t;

void zero_two_uint16(uint16_t* ptr)
{
    ptr[0] = 0;
    ptr[1] = 0;
}

#define vector __attribute__((vector_size(sizeof(uint16_t)*4)))

void f(uint16_t *a)
{
    vector uint16_t *b = (vector uint16_t *)a;
    *b = (vector uint16_t){};
}

void g(uint16_t *a)
{
    size_t t = (size_t)a;
    if ((t & 0x7)==0)
    {
        for(int i = 0;i < 8;i++)
            f((a + i*4));
    }
    else
    {
        for(int i = 0;i < 16;i++)
            zero_two_uint16((a + i*2));
    }
}
---- CUT ----
Compile it on aarch64 with -O3 -mgeneral-regs-only
-fno-tree-loop-distribute-patterns -mstrict-align -fno-tree-loop-vectorize
and you will produce the same result for SLP.
There is no PHI for a for FRE to merge even.
And there is no alignment information on the pointers assignments either.
As you can see by the dump (-fdump-tree-*-all):
  # PT = nonlocal null
  uint16_tD.1724 * a_15(D) = aD.1733;