https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99407
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2021-03-08
     Ever confirmed|0                           |1
           Keywords|                            |missed-optimization
             Blocks|                            |53947
            Summary|s243 benchmark of TSVC is   |s243 benchmark of TSVC is
                   |vectorized by clang and not |vectorized by clang and not
                   |by gcc                      |by gcc, missed DSE
          Component|middle-end                  |tree-optimization
             Status|UNCONFIRMED                 |NEW

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
Hmm, wonder why DSE didn't remove the first a[i] store.  Ah, because
DSE doesn't use data-ref analysis and thus cannot disambiguate the
variable offset.

Manually applying DSE produces

.L4:
        vmovaps c(%rax), %ymm1
        vaddps  e(%rax), %ymm1, %ymm0
        addq    $32, %rax
        vmovups a-28(%rax), %ymm1
        vmulps  d-32(%rax), %ymm1, %ymm1
        vmulps  d-32(%rax), %ymm0, %ymm0
        vaddps  b-32(%rax), %ymm0, %ymm0
        vmovaps %ymm0, b-32(%rax)
        vaddps  %ymm0, %ymm1, %ymm0
        vmovaps %ymm0, a-32(%rax)
        cmpq    $127968, %rax
        jne     .L4

manually DSEd loop:

    for (int nl = 0; nl < iterations; nl++) {
        for (int i = 0; i < LEN_1D-1; i++) {
            real_t tem = b[i] + c[i] * d[i];
            b[i] = tem + d[i] * e[i];
            a[i] = b[i] + a[i+1] * d[i];
        }
    }

Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations
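For reference, the manual transformation from comment #2 can be checked with a small self-contained sketch. The two-store form of the kernel below is the s243 loop as recalled from TSVC and is not quoted in this report, so treat it as an assumption; the array length `LEN` and the helper names `s243_original`, `s243_dse` and `s243_variants_agree` are hypothetical and exist only for this illustration. The inputs are small integers so every intermediate value is exactly representable in `float`, and the comparison does not depend on rounding or FMA contraction.

```c
#include <assert.h>
#include <string.h>

#define LEN 64            /* small stand-in for TSVC's LEN_1D (assumption) */
typedef float real_t;

/* s243 as recalled from TSVC (assumption): a[i] is stored twice per iteration. */
static void s243_original(real_t *a, real_t *b, const real_t *c,
                          const real_t *d, const real_t *e, int n)
{
    assert(n >= 1);
    for (int i = 0; i < n - 1; i++) {
        a[i] = b[i] + c[i] * d[i];       /* first store: overwritten below */
        b[i] = a[i] + d[i] * e[i];       /* only reader of the first store */
        a[i] = b[i] + a[i + 1] * d[i];   /* reads a[i+1], never a[i] */
    }
}

/* Manually DSEd loop from comment #2: the first store becomes a temporary. */
static void s243_dse(real_t *a, real_t *b, const real_t *c,
                     const real_t *d, const real_t *e, int n)
{
    assert(n >= 1);
    for (int i = 0; i < n - 1; i++) {
        real_t tem = b[i] + c[i] * d[i];
        b[i] = tem + d[i] * e[i];
        a[i] = b[i] + a[i + 1] * d[i];
    }
}

/* Returns 1 if both variants leave a[] and b[] bit-identical, else 0. */
int s243_variants_agree(void)
{
    real_t a1[LEN], b1[LEN], a2[LEN], b2[LEN], c[LEN], d[LEN], e[LEN];

    for (int i = 0; i < LEN; i++) {
        a1[i] = a2[i] = (real_t)(i % 7);
        b1[i] = b2[i] = (real_t)(i % 5);
        c[i] = (real_t)(i % 3);
        d[i] = (real_t)(i % 4);
        e[i] = (real_t)(i % 6);
    }

    s243_original(a1, b1, c, d, e, LEN);
    s243_dse(a2, b2, c, d, e, LEN);

    return memcmp(a1, a2, sizeof a1) == 0 &&
           memcmp(b1, b2, sizeof b1) == 0;
}
```

Calling `s243_variants_agree()` from a test driver should return 1. The first store is removable because the only intervening read of `a` is at index `i+1`; proving that `a[i+1]` never overlaps `a[i]` is the variable-offset disambiguation that, per the comment, DSE cannot do without data-ref analysis.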