https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44551

--- Comment #20 from Matthias Kretz <kretz at kde dot org> ---
The original issue I meant to report is fixed. There are many more missed
optimizations in the original example, though.

That is, the example at https://godbolt.org/z/7P1o3O should compile to:
use_insert_extract():
  vmovdqu DATA+4(%rip), %xmm2
  vmovdqu DATA+20(%rip), %xmm4
  vpaddd DATA(%rip), %xmm2, %xmm0
  vpaddd DATA+16(%rip), %xmm4, %xmm1
  vpaddd %xmm2, %xmm0, %xmm0
  vpaddd %xmm4, %xmm1, %xmm1
  vmovups %xmm0, DATA(%rip)
  vmovups %xmm1, DATA+16(%rip)
  ret
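The following is not the source from the godbolt link. It is a scalar reading of what the expected listing above computes, meant to make the target semantics easy to check. It assumes that `DATA` is a global array of at least nine 32-bit ints, since `vpaddd` works on 32-bit lanes and the loads touch `DATA[0..8]`. The array size and the name `use_insert_extract_reference` are hypothetical. Each 128-bit half of the listing computes `DATA[i] + 2 * DATA[i + 1]` for four consecutive `i`, and both stores come after all the loads.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the global DATA array in the godbolt example:
// the expected code reads DATA[0..8] and writes DATA[0..7].
std::array<std::int32_t, 9> DATA{};

// Scalar equivalent of the expected vpaddd sequence (hypothetical name).
void use_insert_extract_reference() {
    for (std::size_t i = 0; i < 8; ++i) {
        // DATA[i + 1] is still the original value here: it is only
        // overwritten in the next iteration. So this in-place loop matches
        // the asm, which issues all loads before the two stores.
        DATA[i] = DATA[i] + 2 * DATA[i + 1];
    }
}
```

Written this way, the loop is a natural candidate for two 4-lane vector adds per half. The listing above is exactly that: one add against the shifted load and one more add of the same shifted load.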
