https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44551
--- Comment #20 from Matthias Kretz <kretz at kde dot org> ---
The original issue I meant to report is fixed. There are many more missed
optimizations in the original example, though.

I.e. https://godbolt.org/z/7P1o3O should compile to:

use_insert_extract():
        vmovdqu DATA+4(%rip), %xmm2
        vmovdqu DATA+20(%rip), %xmm4
        vpaddd  DATA(%rip), %xmm2, %xmm0
        vpaddd  DATA+16(%rip), %xmm4, %xmm1
        vpaddd  %xmm2, %xmm0, %xmm0
        vpaddd  %xmm4, %xmm1, %xmm1
        vmovups %xmm0, DATA(%rip)
        vmovups %xmm1, DATA+16(%rip)
        ret
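
For readers who do not want to decode the listing, here is a scalar sketch of what that expected assembly computes. It is reconstructed from the instructions above only, not from the source behind the Godbolt link, so treat everything in it as an assumption: the 32-bit element type is inferred from vpaddd, the array length of 9 from the highest byte read (DATA+20 plus a 16-byte load), and the names Data and expected_update are hypothetical. Unsigned elements are used so the wrap-around matches vpaddd without signed-overflow issues.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for DATA: 9 elements of 32 bits, inferred from the listing.
using Data = std::array<std::uint32_t, 9>;

// Scalar equivalent of the expected assembly (assumed semantics):
// every load happens before any store, so each result uses the old
// value of its right-hand neighbour, which is added twice.
void expected_update(Data& data) {
    std::array<std::uint32_t, 8> next{};
    for (std::size_t i = 0; i < 8; ++i) {
        next[i] = data[i + 1];              // vmovdqu DATA+4 / DATA+20
    }
    for (std::size_t i = 0; i < 8; ++i) {
        data[i] = data[i] + next[i]         // vpaddd with the DATA memory operand
                          + next[i];        // second vpaddd, register only
    }
    // data[8] is read but never written.
}
```

In other words, the eight stores should only need two unaligned loads, four adds and two stores in total, which is the point of the listing.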