https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89582
--- Comment #6 from Yichao Yu <yyc1992 at gmail dot com> ---
For the vfloat test case, isn't the optimum code just

```
addps %xmm2, %xmm0
addps %xmm3, %xmm1
retq
```

It's not making full use of the vector but I assume not having to spill is a win? This is what clang produces.

And for the LLVM early lowering of the calling convention, a less awkward way is:

```
define { <2 x float>, <2 x float> } @f2({<2 x float>, <2 x float>}, {<2 x float>, <2 x float>}) {
  %v0 = extractvalue { <2 x float>, <2 x float> } %0, 0
  %v1 = extractvalue { <2 x float>, <2 x float> } %0, 1
  %v2 = extractvalue { <2 x float>, <2 x float> } %1, 0
  %v3 = extractvalue { <2 x float>, <2 x float> } %1, 1
  %v5 = fadd <2 x float> %v0, %v2
  %v6 = fadd <2 x float> %v1, %v3
  %v7 = insertvalue { <2 x float>, <2 x float> } undef, <2 x float> %v5, 0
  %v8 = insertvalue { <2 x float>, <2 x float> } %v7, <2 x float> %v6, 1
  ret { <2 x float>, <2 x float> } %v8
}
```
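
For reference, a C function of roughly the following shape would match the register usage above. This is only a sketch: the struct layout, the field names and the function name `vfloat_add` are assumptions, not the test case attached to the bug. Under the System V x86-64 ABI a 16-byte struct of four floats is classified as two SSE eightbytes, so the two arguments arrive in %xmm0..%xmm3 (two floats in the low half of each register) and the result is returned in %xmm0 and %xmm1.

```c
/* Hypothetical stand-in for the vfloat test case; the field names and
 * the function name are assumptions made for illustration. */
struct vfloat {
    float a, b, c, d;
};

/* Element-wise sum of two structs passed and returned by value.
 * On x86-64 SysV, x occupies %xmm0/%xmm1 and y occupies %xmm2/%xmm3,
 * each register carrying two floats in its low 64 bits. */
struct vfloat vfloat_add(struct vfloat x, struct vfloat y)
{
    struct vfloat r = { x.a + y.a, x.b + y.b, x.c + y.c, x.d + y.d };
    return r;
}
```

Compiling this with `-O2 -S` under both gcc and clang should make it easy to compare the generated code against the two-addps sequence quoted above.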