https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94908
Bug ID: 94908
Summary: Failure to optimally optimize certain shuffle patterns
Product: gcc
Version: 10.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: gabravier at gmail dot com
Target Milestone: ---
typedef float v4sf __attribute__((vector_size(16)));
v4sf g();
v4sf f(v4sf a, v4sf b)
{
return (v4sf){g()[1], a[1], a[2], a[3]};
}
With -O3, LLVM outputs this:
f(float __vector(4), float __vector(4)): # @f(float __vector(4), float __vector(4))
sub rsp, 24
movaps xmmword ptr [rsp], xmm0 # 16-byte Spill
call g()
movaps xmm1, xmmword ptr [rsp] # 16-byte Reload
shufps xmm0, xmm1, 17 # xmm0 = xmm0[1,0],xmm1[1,0]
shufps xmm0, xmm1, 232 # xmm0 = xmm0[0,2],xmm1[2,3]
add rsp, 24
ret
GCC outputs this:
f(float __vector(4), float __vector(4)):
sub rsp, 24
movaps XMMWORD PTR [rsp], xmm0
call g()
movaps xmm1, XMMWORD PTR [rsp]
add rsp, 24
shufps xmm0, xmm0, 85
movaps xmm2, xmm1
shufps xmm2, xmm1, 85
movaps xmm3, xmm2
movaps xmm2, xmm1
unpckhps xmm2, xmm1
unpcklps xmm0, xmm3
shufps xmm1, xmm1, 255
unpcklps xmm2, xmm1
movlhps xmm0, xmm2
ret
This also seems to occur on powerpc64le, so I haven't marked it as
target-specific.