https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64477
Bug ID: 64477
Summary: x86 sse unnecessary GPR spill
Product: gcc
Version: 4.9.2
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: zoltan at hidvegi dot com
typedef signed char v16si __attribute__ ((vector_size (16)));
v16si ary(signed char a)
{
    return v16si{a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a};
}
Compiled with g++-4.9 -m64 -O2 -fomit-frame-pointer -Wall -I$HOME/dev/common
-mssse3 -std=gnu++11 -S xmm_test.C
I get
pxor %xmm1, %xmm1
movd %edi, %xmm0
movl %edi, -12(%rsp)
pshufb %xmm1, %xmm0
ret
Note the unnecessary spill of %edi to the stack (movl %edi, -12(%rsp)). This
does not happen with gcc-4.8, so it may be considered a regression. My guess is
that the compiler first plans to move the value from the GPR to the xmm
register via the stack and later optimizes that into a direct GPR-to-xmm move,
but the stack store is left behind.
With -march=corei7-avx and a 4 x int vector (four 32-bit ints), gcc-4.9 stores
the value to the stack and uses vbroadcastss, whereas gcc-4.8 uses movd
followed by pshufd $0, %xmm1, %xmm0. Again, the gcc-4.8 code seems better to
me. However, even gcc-4.8 goes through the stack in that case when
-mtune=generic is used.
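The source for the int-vector case is not included above. A minimal sketch of
what it presumably looks like, modelled on the signed char test case, is shown
here. The typedef name v4si and the function name ary4 are assumptions made
for illustration, not taken from the original test file.

```cpp
// Assumed reconstruction of the 4 x int case; the report does not show it.
// Uses the same GNU vector extension as the signed char test case.
typedef int v4si __attribute__ ((vector_size (16)));

// Broadcast one int into all four lanes (the name "ary4" is hypothetical).
v4si ary4(int a)
{
    return v4si{a, a, a, a};
}
```

Compiling this with -O2 -march=corei7-avx -S, with and without
-mtune=generic, should show whether the value reaches the xmm register
directly or through a stack slot.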