http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54592
Bug #: 54592
Summary: [4.8 Regression] [missed-optimization] Cannot fuse SSE move and add together
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
AssignedTo: [email protected]
ReportedBy: [email protected]
Hi,
I have, on x86-64,
gcc version 4.7.1 (Debian 4.7.1-9)
gcc version 4.8.0 20120820 (experimental) [trunk revision 190537] (Debian 20120820-1)
Given the following test program:
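#include <emmintrin.h>
#include <stddef.h>   /* size_t */

void func(__m128i *foo, size_t a, size_t b, int *dst)
{
    __m128i x = foo[a];                  /* aligned 128-bit load */
    __m128i y = foo[b];                  /* aligned 128-bit load */
    __m128i sum = _mm_add_epi32(x, y);   /* paddd */
    *dst = _mm_cvtsi128_si32(sum);       /* movd: store the low 32-bit lane */
}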
GCC 4.8 with -O2 compiles it to
   0:   48 c1 e6 04             shl    $0x4,%rsi
   4:   48 c1 e2 04             shl    $0x4,%rdx
   8:   66 0f 6f 0c 17          movdqa (%rdi,%rdx,1),%xmm1
   d:   66 0f 6f 04 37          movdqa (%rdi,%rsi,1),%xmm0
  12:   66 0f fe c1             paddd  %xmm1,%xmm0
  16:   66 0f 7e 01             movd   %xmm0,(%rcx)
  1a:   c3                      retq
The movdqa into %xmm1 here looks redundant; paddd could take the second
operand directly from memory. And indeed, GCC 4.7 with -O2 gets this right:
   0:   48 c1 e6 04             shl    $0x4,%rsi
   4:   48 c1 e2 04             shl    $0x4,%rdx
   8:   66 0f 6f 04 37          movdqa (%rdi,%rsi,1),%xmm0
   d:   66 0f fe 04 17          paddd  (%rdi,%rdx,1),%xmm0
  12:   66 0f 7e 01             movd   %xmm0,(%rcx)
  16:   c3                      retq
This would seem like a regression to me.
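
For anyone who wants to reproduce this, here is a minimal sketch of the
test case with a small sanity check. Both instruction sequences above
should compute the same result, so the difference is only in code quality
(one extra movdqa and one extra register). The helper names
check_low_lane and self_check are hypothetical and not part of the
original test case; the sketch assumes an x86-64 target with SSE2.

```c
#include <assert.h>
#include <emmintrin.h>
#include <stddef.h>

/* Same function as in the report. */
void func(__m128i *foo, size_t a, size_t b, int *dst)
{
    __m128i x = foo[a];
    __m128i y = foo[b];
    __m128i sum = _mm_add_epi32(x, y);
    *dst = _mm_cvtsi128_si32(sum);
}

/* Hypothetical helper: puts lo0 and lo1 in the low 32-bit lanes of two
   vectors, runs func() on them, and returns what func() stored. */
int check_low_lane(int lo0, int lo1)
{
    __m128i v[2];   /* __m128i objects are 16-byte aligned, as movdqa needs */
    int out = 0;

    /* _mm_set_epi32 takes lanes from highest to lowest, so the last
       argument ends up in the low lane. */
    v[0] = _mm_set_epi32(0, 0, 0, lo0);
    v[1] = _mm_set_epi32(0, 0, 0, lo1);
    func(v, 0, 1, &out);
    return out;
}

/* Hypothetical sanity check: the low lane of the sum must be lo0 + lo1. */
void self_check(void)
{
    assert(check_low_lane(40, 2) == 42);
    assert(check_low_lane(-5, 5) == 0);
}
```

Compiling only func() with "gcc -O2 -c" and disassembling the object file
with "objdump -d" should show which of the two sequences a given compiler
version emits.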