https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54939
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED
                 CC|                            |crazylht at gmail dot com

--- Comment #14 from Richard Biener <rguenth at gcc dot gnu.org> ---
With SSE4.2 we now get

.L3:
        movupd  (%rdx,%rax), %xmm0
        movupd  (%rcx,%rax), %xmm4
        movapd  %xmm0, %xmm1
        palignr $8, %xmm0, %xmm0
        mulpd   %xmm3, %xmm1
        mulpd   %xmm2, %xmm0
        addpd   %xmm4, %xmm1
        addsubpd        %xmm0, %xmm1
        movups  %xmm1, (%rcx,%rax)
        addq    $16, %rax
        cmpq    %rsi, %rax
        jne     .L3

with AVX and FMA

.L4:
        vmovupd (%rdx,%rax), %ymm0
        vmovapd %ymm4, %ymm1
        vfmadd213pd     (%rcx,%rax), %ymm0, %ymm1
        vpermilpd       $5, %ymm0, %ymm0
        vmulpd  %ymm3, %ymm0, %ymm0
        vaddsubpd       %ymm0, %ymm1, %ymm1
        vmovupd %ymm1, (%rcx,%rax)
        addq    $32, %rax
        cmpq    %rsi, %rax
        jne     .L4

so I'd say fixed.  But.  With AVX512 we now get

.L4:
        vmovupd (%rdi,%rax), %zmm0
        vmovapd %zmm7, %zmm2
        vmovapd %zmm4, %zmm6
        vfmadd213pd     (%rcx,%rax), %zmm0, %zmm2
        vpermilpd       $85, %zmm0, %zmm0
        vfmadd132pd     %zmm0, %zmm2, %zmm6
        vfnmadd132pd    %zmm4, %zmm2, %zmm0
        vmovapd %zmm6, %zmm0{%k1}
        vmovupd %zmm0, (%rcx,%rax)
        addq    $64, %rax
        cmpq    %rax, %rsi
        jne     .L4

it's odd that this only happens with -mprefer-vector-width=512 though.
Do we possibly miss vec_{fm,}{addsub,subadd} for those?  Looks like so.
Tracking in PR110767.

The vectorizer side is fixed.
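
For readers without the testcase at hand, here is a minimal C sketch of the kind of loop the listings above appear to come from. It is an assumption inferred from the instruction sequence (load, multiply by two broadcast values, swap halves, addsub), not the PR's actual testcase. The names cmla and cmla_lanes are hypothetical. The second function is a scalar model of what the SSE4.2 loop body seems to compute per complex element, assuming one broadcast register holds the real part of k and the other the imaginary part.

```c
#include <complex.h>
#include <stddef.h>

/* Hypothetical kernel (assumed, not taken from the PR): c[i] += k * b[i]. */
void cmla(double complex *restrict c, const double complex *restrict b,
          double complex k, size_t n)
{
    for (size_t i = 0; i < n; i++)
        c[i] += k * b[i];
}

/* Scalar model of the SSE4.2 loop body on interleaved (re, im) doubles.
   Which register holds k_re and which holds k_im is an assumption. */
void cmla_lanes(double *restrict c, const double *restrict b,
                double k_re, double k_im, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        double b_re = b[2 * i];
        double b_im = b[2 * i + 1];

        /* mulpd by broadcast k_re, then addpd with the loaded c. */
        double t_re = c[2 * i]     + k_re * b_re;
        double t_im = c[2 * i + 1] + k_re * b_im;

        /* palignr $8 swaps the two halves of b; mulpd by broadcast k_im. */
        double s_even = k_im * b_im;
        double s_odd  = k_im * b_re;

        /* addsubpd: subtract in the even lane, add in the odd lane. */
        c[2 * i]     = t_re - s_even;
        c[2 * i + 1] = t_im + s_odd;
    }
}
```

To compare against the listings, something like gcc -O3 -ffast-math -msse4.2 -S on cmla should be a reasonable starting point, adding -mavx512f -mprefer-vector-width=512 for the last case. Complex multiplication most likely needs -ffast-math or -fcx-limited-range to vectorize at all, and the exact output will depend on the GCC version.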