http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54593
Bug #: 54593
Summary: [missed-optimization] Move from SSE to integer
register goes through the stack without -march=native
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
AssignedTo: [email protected]
ReportedBy: [email protected]
Hi,
I have reproduced this on 4.4, 4.6, 4.7 and 4.8 (Debian 20120820-1, trunk
version 190537). Given the following code:
#include <x86intrin.h>

int test1(__m128i v) {
    return _mm_cvtsi128_si32(v);
}
GCC generates
0: 66 0f 7e 44 24 f4 movd %xmm0,-0xc(%rsp)
6: 8b 44 24 f4 mov -0xc(%rsp),%eax
a: c3 retq
Shouldn't it go directly to %eax instead of through the stack? Granted, a direct
movd between an SSE register and an integer register takes ten cycles or so on
Netburst, but this is x86-64. It appears to be some sort of tuning issue, since
with -mtune=native (I am on an Atom) I get:
0: 66 0f 7e c0 movd %xmm0,%eax
4: 90 nop
5: 90 nop
6: 90 nop
7: 90 nop
8: 90 nop
9: 90 nop
a: c3 retq
which is sort of what I expect. The NOPs are a bit weird, but... :-)
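
For comparison, here is a minimal sketch of the single-instruction form I would
expect, written with GCC inline asm (the "x" SSE-register and "r"
integer-register constraints are the standard ones; the function name is just
for illustration):

#include <x86intrin.h>

/* Illustration only: force the register-to-register form that generic
   tuning currently avoids.  This emits a single movd %xmm0,%eax followed
   by ret, with no stack round trip. */
int test1_direct(__m128i v) {
    int r;
    __asm__ ("movd %1, %0" : "=r"(r) : "x"(v));
    return r;
}

That is the code I would expect GCC to generate for test1 itself under the
default (generic) tuning.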