https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78954
Bug ID: 78954
Summary: optimization: broadcast of non-constant scalar into SSE2 register
Product: gcc
Version: 6.3.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jens.maurer at gmx dot net
Target Milestone: ---
For the following code, GCC routes "x" through the stack instead of moving it directly from its register into (the low part of) "v":
#pragma GCC target ("sse2")
typedef unsigned int V __attribute__((vector_size(16)));
V f(int x)
{
    V v = { x, x, x, x };
    return v;
}
$ gcc -v -O3 -S x.cc
Target: x86_64-pc-linux-gnu
gcc version 6.3.0 (GCC)
Snippet from the generated assembly:
movl %edi, -12(%rsp)
movd -12(%rsp), %xmm1
pshufd $0, %xmm1, %xmm0
ret
Why do we move through the stack instead of using a simple register move, like this?
movd %edi, %xmm1
pshufd $0, %xmm1, %xmm0
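For comparison, the register-only sequence above can be spelled out with the SSE2 intrinsics from <emmintrin.h>: _mm_cvtsi32_si128 corresponds to movd and _mm_shuffle_epi32 with immediate 0 corresponds to pshufd $0. This is only a sketch assuming an x86 target with SSE2; the helper names broadcast_movd_pshufd and lanes are hypothetical. Intrinsics do not pin down instruction selection, so GCC is still free to lower them differently; this illustrates what the expected code computes and is not a workaround verified to avoid the stack round-trip. The usual single-call way to request the same broadcast is _mm_set1_epi32(x).

```cpp
#include <emmintrin.h>  // SSE2: __m128i, _mm_cvtsi32_si128, _mm_shuffle_epi32, _mm_storeu_si128
#include <array>
#include <cassert>      // for the usage check shown in the comment at the end

// Hypothetical helper: the movd + pshufd sequence written out with intrinsics.
// Assumes an x86 target with SSE2 enabled (always the case on x86-64).
inline __m128i broadcast_movd_pshufd(int x)
{
    __m128i lo = _mm_cvtsi32_si128(x);  // movd: x into lane 0, upper three lanes zeroed
    return _mm_shuffle_epi32(lo, 0);    // pshufd $0: replicate lane 0 into all four lanes
}

// Hypothetical helper: copy the four 32-bit lanes out so they can be inspected.
inline std::array<unsigned int, 4> lanes(__m128i v)
{
    std::array<unsigned int, 4> out;
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out.data()), v);  // unaligned 16-byte store
    return out;
}

// Usage, e.g. inside main():
//   std::array<unsigned int, 4> a = lanes(broadcast_movd_pshufd(42));
//   assert(a[0] == 42 && a[1] == 42 && a[2] == 42 && a[3] == 42);
```

Each lane ends up holding x converted to unsigned int, which matches what f() returns for the vector initializer { x, x, x, x }.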