http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59657
Bug ID: 59657
Summary: SSE intrinsics translate to AVX instructions
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: oystein at gnubg dot org
Created attachment 31558
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=31558&action=edit
Example source code file
Happy new year!
I am writing code which should run on both SSE and AVX machines. I have
manually vectorized the code for SSE and AVX in two different functions, and I
use a function pointer to select the right function for the CPU at startup.
The two functions are in the same translation unit.
(See attached code)
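The attachment is not reproduced here. A minimal sketch of this kind of setup
might look like the following. It is not the attached file: apart from
calculate_sse, every name (calculate_avx, calculate_fn, calculate,
init_dispatch) is hypothetical. The sketch also makes an assumption that
differs from the build command in this report: it enables AVX for the AVX
function only, through GCC's target function attribute, instead of passing
-mavx for the whole file. That relies on a compiler that allows the
immintrin.h intrinsics inside functions marked this way, and on SSE being
enabled by default (as on x86-64).

```c
#include <immintrin.h>  /* SSE and AVX intrinsics */

/* SSE path: scales `size` floats in place, 4 per iteration.
   Assumes `data` is 16-byte aligned and `size` is a multiple of 4. */
static void calculate_sse(float *data, float scale, int size)
{
    int count = size >> 2;
    float *p = data;
    __m128 s = _mm_set1_ps(scale);
    for (; count--; p += 4) {
        __m128 d = _mm_load_ps(p);
        _mm_store_ps(p, _mm_mul_ps(d, s));
    }
}

/* AVX path (hypothetical): scales 8 floats per iteration.
   Assumes `data` is 32-byte aligned and `size` is a multiple of 8.
   Assumption: the target attribute enables AVX for this function only,
   so the file itself is compiled without -mavx. */
static __attribute__((target("avx")))
void calculate_avx(float *data, float scale, int size)
{
    int count = size >> 3;
    float *p = data;
    __m256 s = _mm256_set1_ps(scale);
    for (; count--; p += 8) {
        __m256 d = _mm256_load_ps(p);
        _mm256_store_ps(p, _mm256_mul_ps(d, s));
    }
}

/* Function pointer set once at startup; defaults to the SSE path. */
typedef void (*calculate_fn)(float *, float, int);
static calculate_fn calculate = calculate_sse;

/* Picks the implementation according to what the running CPU reports. */
static void init_dispatch(void)
{
    __builtin_cpu_init();
    calculate = __builtin_cpu_supports("avx") ? calculate_avx : calculate_sse;
}
```

A caller would run init_dispatch() once and then call calculate(data, scale,
size) with a buffer aligned for the wider of the two paths.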
compiled with:
gcc -Wall -O3 -g -mavx sse_test.c -o sse_test
The problem is that the SSE intrinsics in the SSE function get translated to
AVX instructions. This will of course cause an illegal instruction fault on all
non-AVX machines.
My gdb session:
Program received signal SIGILL, Illegal instruction.
0x08048452 in calculate_sse (data=data@entry=0xbffff5e0, scale=scale@entry=0.5,
size=size@entry=256)
at sse_test.c:33
33 for ( ; count-- ; p += 4 ){
(gdb) list
28
29 static void calculate_sse(float *data, float scale, int size )
30 {
31 int count = size >> 2;
32 float *p = data;
33 for ( ; count-- ; p += 4 ){
34 __m128 d = _mm_load_ps( p );
35 __m128 s = _mm_set1_ps( scale );
36 _mm_store_ps( p, _mm_mul_ps( d, s ));
37 }
(gdb) disassemble
Dump of assembler code for function calculate_sse:
0x08048440 <+0>: mov 0xc(%esp),%ecx
0x08048444 <+4>: mov 0x4(%esp),%eax
0x08048448 <+8>: sar $0x2,%ecx
0x0804844b <+11>: test %ecx,%ecx
0x0804844d <+13>: lea -0x1(%ecx),%edx
0x08048450 <+16>: je 0x8048474 <calculate_sse+52>
=> 0x08048452 <+18>: vbroadcastss 0x8(%esp),%xmm1
0x08048459 <+25>: lea 0x0(%esi,%eiz,1),%esi
0x08048460 <+32>: vmulps (%eax),%xmm1,%xmm0
0x08048464 <+36>: sub $0x1,%edx
0x08048467 <+39>: add $0x10,%eax
0x0804846a <+42>: vmovaps %xmm0,-0x10(%eax)
0x0804846f <+47>: cmp $0xffffffff,%edx
0x08048472 <+50>: jne 0x8048460 <calculate_sse+32>
0x08048474 <+52>: repz ret
End of assembler dump.
(Arch linux)
[oystein@oysteins-laptop ~]$ gcc --version
gcc (GCC) 4.8.2 20131219 (prerelease)
Bug or feature? I'm not sure if this is the expected way for the intrinsics to
translate, but it was not what I expected. If it is supposed to be like this,
can I get out of my problem without splitting the two functions into two
translation units and using two different sets of compile options?
Thanks,
Øystein