https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113827
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |WAITING
   Last reconfirmed|                            |2024-02-09
--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
>a redundant scalar load
I don't see any redundant load in that loop.
```
.L3:
movq (%rdi), %rax ;; load a[i] from rdi
vmovups (%rax), %xmm1 ;; load rax[0-3] into vector
vdivps %xmm0, %xmm1, %xmm1 ;; divide
vmovups %xmm1, (%rax) ;; store result back into rax[0-3]
addq $16, %rax ;; add 4*4 to rax
movq %rax, (%rdi) ;; store rax back into a[i] (the memory at rdi)
addq $8, %rdi ;; add 8 to rdi
cmpq %rdi, %rdx
jne .L3 ;; compare and loop back
```
That is, a[i] is different in each iteration, so the load from (%rdi) fetches a new pointer every time.
Maybe you reduced this code too much?
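
For reference, here is a rough C sketch of what that loop appears to compute. It is only my reading of the assembly, not the reporter's testcase: the function name `scale_rows`, the parameters `n` and `d`, and the fixed count of 4 floats per pointer are all assumptions.

```c
#include <stddef.h>

/* Hypothetical reconstruction from the assembly above; the names and the
   signature are assumptions, not the original source. */
void scale_rows(float **a, size_t n, float d)
{
    for (size_t i = 0; i < n; i++) {
        float *p = a[i];             /* movq (%rdi), %rax: one load per a[i] */
        for (int j = 0; j < 4; j++)  /* vmovups / vdivps / vmovups on 4 floats */
            p[j] /= d;
        a[i] = p + 4;                /* addq $16, %rax; movq %rax, (%rdi) */
    }
}
```

Written this way, the pointer load happens once per iteration and each a[i] is a distinct element, so there is nothing to hoist or merge.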