On Fri, May 8, 2020 at 7:22 PM Uros Bizjak <ubiz...@gmail.com> wrote:
>
> Attached WIP patch enables auto-vectorization of basic V2SF operations
> (plus, minus, mult, min/max). The compiler takes care that everything
> is loaded from memory via movq insn, so the top two elements of the
> register always remain zero.

This example:

--cut here--
float r[2], a[2], b[2], c[2];

void
foo (void)
{
  for (int i = 0; i < 2; i++)
    r[i] = 0.0f + a[i] - b[i] * c[i] + -1.0f;
}
--cut here--

compiles (-O3) to:

foo:
        movq    a(%rip), %xmm0
        xorps   %xmm1, %xmm1
        movq    c(%rip), %xmm2
        addps   %xmm1, %xmm0
        movq    b(%rip), %xmm1
        mulps   %xmm2, %xmm1
        subps   %xmm1, %xmm0
        movq    .LC0(%rip), %xmm1
        addps   %xmm1, %xmm0
        movlps  %xmm0, r(%rip)
        ret

Uros.
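
For anyone who wants to trace the lane behaviour by hand, here is a minimal sketch in plain C. It is not taken from the patch. It models an XMM register as four floats and replays the instruction sequence above, so one can see that the low two lanes match the scalar loop while the upper two lanes stay zero. The names v4sf_model, load_low2, store_low2, add4, sub4, mul4, foo_scalar and foo_model are hypothetical, and the assumption that .LC0 holds { -1.0f, -1.0f } is inferred from the source loop, not shown in the listing.

```c
/* Hypothetical 4-lane model of an XMM register (illustration only). */
typedef struct { float lane[4]; } v4sf_model;

/* Models a movq load: two floats go to lanes 0-1, lanes 2-3 are zeroed. */
static v4sf_model load_low2(const float *src)
{
    v4sf_model v = { { src[0], src[1], 0.0f, 0.0f } };
    return v;
}

/* Models a movlps store: only lanes 0-1 are written to memory. */
static void store_low2(float *dst, v4sf_model v)
{
    dst[0] = v.lane[0];
    dst[1] = v.lane[1];
}

/* Model addps, subps and mulps: the operation is applied to all four lanes. */
static v4sf_model add4(v4sf_model x, v4sf_model y)
{
    for (int i = 0; i < 4; i++) x.lane[i] += y.lane[i];
    return x;
}

static v4sf_model sub4(v4sf_model x, v4sf_model y)
{
    for (int i = 0; i < 4; i++) x.lane[i] -= y.lane[i];
    return x;
}

static v4sf_model mul4(v4sf_model x, v4sf_model y)
{
    for (int i = 0; i < 4; i++) x.lane[i] *= y.lane[i];
    return x;
}

/* Scalar reference: the loop from the example, with arrays as parameters. */
void foo_scalar(float r[2], const float a[2], const float b[2], const float c[2])
{
    for (int i = 0; i < 2; i++)
        r[i] = 0.0f + a[i] - b[i] * c[i] + -1.0f;
}

/* Replays the listed instruction sequence on the model; returns the final
   register so the caller can inspect the upper lanes as well. */
v4sf_model foo_model(float r[2], const float a[2], const float b[2], const float c[2])
{
    const float lc0[2] = { -1.0f, -1.0f };      /* assumed contents of .LC0 */
    v4sf_model xmm0 = load_low2(a);              /* movq a, %xmm0 */
    v4sf_model xmm1 = { { 0.0f, 0.0f, 0.0f, 0.0f } }; /* xorps %xmm1, %xmm1 */
    v4sf_model xmm2 = load_low2(c);              /* movq c, %xmm2 */
    xmm0 = add4(xmm0, xmm1);                     /* addps %xmm1, %xmm0 */
    xmm1 = load_low2(b);                         /* movq b, %xmm1 */
    xmm1 = mul4(xmm1, xmm2);                     /* mulps %xmm2, %xmm1 */
    xmm0 = sub4(xmm0, xmm1);                     /* subps %xmm1, %xmm0 */
    xmm1 = load_low2(lc0);                       /* movq .LC0, %xmm1 */
    xmm0 = add4(xmm0, xmm1);                     /* addps %xmm1, %xmm0 */
    store_low2(r, xmm0);                         /* movlps %xmm0, r */
    return xmm0;
}
```

Calling foo_scalar and foo_model on the same inputs should give identical r[0] and r[1], and lanes 2 and 3 of the value returned by foo_model should be zero, because every operand reached the register through a zero-extending two-element load.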