[Bug tree-optimization/89582] Suboptimal code generated for floating point struct in -O3 compare to -O2

yyc1992 at gmail dot com Thu, 04 Apr 2019 05:27:12 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89582


--- Comment #6 from Yichao Yu <yyc1992 at gmail dot com> ---
For the vfloat test case, isn't the optimum code just

```
        addps   %xmm2, %xmm0
        addps   %xmm3, %xmm1
        retq
```

It doesn't make full use of the vector registers, but I assume not having to
spill is a win? This is what clang produces.
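For reference, here is a minimal C sketch of the kind of function under discussion. The actual vfloat test case is in an earlier comment that isn't quoted here, so the `float4` type and the `add4` name are assumptions made for illustration only. On the x86-64 SysV ABI a 16-byte struct of four floats is split into two SSE eightbytes, so each argument should arrive as two floats per register (`%xmm0`/`%xmm1` and `%xmm2`/`%xmm3`) and the result goes back in `%xmm0`/`%xmm1`, which is why two `addps` instructions would be enough.

```c
/* Hypothetical stand-in for the test case: four floats in a struct.
   The names float4 and add4 are not from the bug report. */
typedef struct {
    float x, y, z, w;
} float4;

/* Element-wise add. With the SysV x86-64 calling convention, a and b
   are each passed as two SSE eightbytes (two floats per register). */
float4 add4(float4 a, float4 b)
{
    float4 r = { a.x + b.x, a.y + b.y, a.z + b.z, a.w + b.w };
    return r;
}
```

Compiling this with `-O2 -S` and `-O3 -S` should make it easy to compare the output against the two-instruction sequence above.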

And for the LLVM early lowering of the calling convention, a less awkward way
is:

```
define { <2 x float>, <2 x float> } @f2({<2 x float>, <2 x float>}, {<2 x float>, <2 x float>}) {
  %v0 = extractvalue { <2 x float>, <2 x float> } %0, 0
  %v1 = extractvalue { <2 x float>, <2 x float> } %0, 1
  %v2 = extractvalue { <2 x float>, <2 x float> } %1, 0
  %v3 = extractvalue { <2 x float>, <2 x float> } %1, 1
  %v5 = fadd <2 x float> %v0, %v2
  %v6 = fadd <2 x float> %v1, %v3
  %v7 = insertvalue { <2 x float>, <2 x float> } undef, <2 x float> %v5, 0
  %v8 = insertvalue { <2 x float>, <2 x float> } %v7, <2 x float> %v6, 1
  ret { <2 x float>, <2 x float> } %v8
}
```
