https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99407

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2021-03-08
     Ever confirmed|0                           |1
           Keywords|                            |missed-optimization
             Blocks|                            |53947
            Summary|s243 benchmark of TSVC is   |s243 benchmark of TSVC is
                   |vectorized by clang and not |vectorized by clang and not
                   |by gcc                      |by gcc, missed DSE
          Component|middle-end                  |tree-optimization
             Status|UNCONFIRMED                 |NEW

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
Hmm, I wonder why DSE didn't remove the first a[i] store.  Ah, because DSE
doesn't use data-ref analysis and thus cannot disambiguate the a[i] store from
the a[i+1] load: both have a variable offset (depending on i).
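To make the missing reasoning step concrete, here is a simplified sketch, not GCC's actual data structures: each access is modelled as base + step*i + offset with a byte size, and the struct and function names are hypothetical. When two accesses share the base and the step, the variable part step*i cancels and only the constant offsets need comparing. That cancellation is the step DSE cannot take without data-ref analysis. Every other case is conservatively answered as "may overlap".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one memory access in a loop body:
   address = base + step * i + offset, touching `size` bytes.  */
struct access {
    const void *base;  /* underlying object, e.g. the array a */
    int64_t step;      /* bytes advanced per iteration (the variable part) */
    int64_t offset;    /* constant byte offset added to base + step * i */
    int64_t size;      /* number of bytes accessed */
};

/* Returns true only when the two accesses provably never overlap within the
   same iteration i.  With equal base and step the step * i term cancels, so
   comparing the constant byte ranges is enough.  Any other combination is
   answered conservatively with false ("may overlap").  */
static bool same_iter_disjoint(struct access x, struct access y)
{
    if (x.base != y.base || x.step != y.step)
        return false;
    /* Disjoint iff [x.offset, x.offset + x.size) and
       [y.offset, y.offset + y.size) do not intersect.  */
    return x.offset + x.size <= y.offset || y.offset + y.size <= x.offset;
}
```

For the s243 store to a[i] and load of a[i+1] with 4-byte floats, the descriptors are { a, 4, 0, 4 } and { a, 4, 4, 4 }, and same_iter_disjoint returns true, which is exactly the fact needed to see that the first a[i] store is dead.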

Manually applying DSE produces the following vectorized inner loop:

.L4:
        vmovaps c(%rax), %ymm1
        vaddps  e(%rax), %ymm1, %ymm0
        addq    $32, %rax
        vmovups a-28(%rax), %ymm1
        vmulps  d-32(%rax), %ymm1, %ymm1
        vmulps  d-32(%rax), %ymm0, %ymm0
        vaddps  b-32(%rax), %ymm0, %ymm0
        vmovaps %ymm0, b-32(%rax)
        vaddps  %ymm0, %ymm1, %ymm0
        vmovaps %ymm0, a-32(%rax)
        cmpq    $127968, %rax
        jne     .L4
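A rough reading of this loop, assuming %rax is zero on entry (the preheader is not shown) and TSVC's usual LEN_1D of 32000 (neither is stated in this report): each iteration handles one ymm register of 8 floats and advances %rax by 32 bytes. Because the addq comes before the a load, a-28(%rax) is a plus the old %rax plus 4, i.e. a[i+1], which is why that load is the unaligned vmovups. The trip count then works out as

$$
\frac{127968}{32} = 3999 \text{ iterations}, \qquad 3999 \times 8 = 31992 \text{ elements}, \qquad (32000 - 1) - 31992 = 7
$$

so the last 7 of the LEN_1D-1 elements are left to code outside this loop.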


The manually DSE'd loop:

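    for (int nl = 0; nl < iterations; nl++) {
        for (int i = 0; i < LEN_1D-1; i++) {
            /* tem replaces the first, dead store to a[i]: a[i] is
               overwritten two lines below, and the a[i+1] load in
               between can never read it.  */
            real_t tem = b[i] + c[i  ] * d[i];
            b[i] = tem + d[i  ] * e[i];
            a[i] = b[i] + a[i+1] * d[i];
        }
    }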
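A quick way to check that the rewrite preserves behaviour is to run it next to the loop it replaces. The sketch below reconstructs the original body from the report (the first statement stores to a[i] and the second reads it back). It uses a small hypothetical size N instead of LEN_1D, drops the outer nl loop, and assumes real_t is float to match the single-precision vaddps/vmulps code. The inputs are small multiples of powers of two, so every operation is exact and the comparison can use !=.

```c
#include <stdbool.h>

#define N 64           /* hypothetical small size standing in for LEN_1D */
typedef float real_t;  /* assumed, matching the single-precision asm above */

/* Original s243 body as implied by the report: the first store to a[i]
   is overwritten by the third statement, so it is dead.  */
static void s243_orig(real_t *a, real_t *b, const real_t *c,
                      const real_t *d, const real_t *e, int n)
{
    for (int i = 0; i < n - 1; i++) {
        a[i] = b[i] + c[i] * d[i];
        b[i] = a[i] + d[i] * e[i];
        a[i] = b[i] + a[i+1] * d[i];
    }
}

/* Manually DSE'd body from the report: the dead store becomes a temporary.  */
static void s243_dse(real_t *a, real_t *b, const real_t *c,
                     const real_t *d, const real_t *e, int n)
{
    for (int i = 0; i < n - 1; i++) {
        real_t tem = b[i] + c[i] * d[i];
        b[i] = tem + d[i] * e[i];
        a[i] = b[i] + a[i+1] * d[i];
    }
}

/* Runs both versions on identical inputs and compares a and b element-wise.
   All inputs are small multiples of powers of two, so the arithmetic is
   exact and exact equality is a fair test.  */
static bool s243_versions_agree(void)
{
    real_t a1[N], b1[N], a2[N], b2[N], c[N], d[N], e[N];
    for (int i = 0; i < N; i++) {
        a1[i] = a2[i] = (real_t)(i % 7) * 0.5f;
        b1[i] = b2[i] = (real_t)(i % 5) + 1.0f;
        c[i] = (real_t)(i % 3) * 0.25f;
        d[i] = (real_t)(i % 11) * 0.125f;
        e[i] = (real_t)(i % 13) - 6.0f;
    }
    s243_orig(a1, b1, c, d, e, N);
    s243_dse(a2, b2, c, d, e, N);
    for (int i = 0; i < N; i++)
        if (a1[i] != a2[i] || b1[i] != b2[i])
            return false;
    return true;
}
```

s243_versions_agree() returns true. This only shows the rewrite is valid for this loop; it says nothing about why GCC's DSE misses it.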


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations
