https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113827

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |WAITING
   Last reconfirmed|                            |2024-02-09

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
>a redundant scalar load 

I don't see any redundant load in that loop.


```
.L3:
        movq    (%rdi), %rax   ;; load a[i] from rdi
        vmovups (%rax), %xmm1  ;; load rax[0-3] into vector
        vdivps  %xmm0, %xmm1, %xmm1 ;; divide
        vmovups %xmm1, (%rax)  ;; store result back into rax[0-3]
        addq    $16, %rax   ;; add 4*4 to rax
        movq    %rax, (%rdi) ;; store rax back into a[i] (the memory at rdi)
        addq    $8, %rdi     ;; add 8 to rdi
        cmpq    %rdi, %rdx
        jne     .L3          ;; compare and loop back
```

That is, a[i] is a different memory location on each iteration, so the scalar load at the top of the loop is not redundant.
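
For reference, here is a rough C sketch of what I read the loop as doing. It is reconstructed from the assembly only, not taken from the testcase; the function name `scale_rows`, the parameter names, and the element count `n` are assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical reconstruction from the assembly above; the names
   scale_rows, a, n and d are assumptions, not from the testcase.  */
void scale_rows(float **a, size_t n, float d)
{
    for (size_t i = 0; i < n; i++) {
        float *p = a[i];            /* movq (%rdi), %rax: a new pointer each time */
        for (int k = 0; k < 4; k++)
            p[k] /= d;              /* vmovups, vdivps, vmovups on four floats */
        a[i] = p + 4;               /* addq $16, %rax; movq %rax, (%rdi) */
    }
}
```

Each iteration reads a different `a[i]`, so the load cannot be hoisted or dropped.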

Maybe you reduced this code too much?
