http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59657
Bug ID: 59657
Summary: SSE intrinsics translates to AVX instructions
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: oystein at gnubg dot org

Created attachment 31558
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=31558&action=edit
Example source code file

Happy new year!

I am writing code that should run on both SSE and AVX machines. I have manually vectorized the code for SSE and AVX in two different functions, and I use a function pointer to select the right function for the CPU at startup. The two functions are in the same translation unit (see the attached code), compiled with:

    gcc -Wall -O3 -g -mavx sse_test.c -o sse_test

The problem is that the SSE intrinsics in the SSE function get translated to AVX instructions. This will of course give an illegal instruction on all non-AVX machines.

My gdb session:

    Program received signal SIGILL, Illegal instruction.
    0x08048452 in calculate_sse (data=data@entry=0xbffff5e0, scale=scale@entry=0.5, size=size@entry=256) at sse_test.c:33
    33          for ( ; count-- ; p += 4 ){
    (gdb) list
    28
    29      static void calculate_sse(float *data, float scale, int size )
    30      {
    31          int count = size >> 2;
    32          float *p = data;
    33          for ( ; count-- ; p += 4 ){
    34              __m128 d = _mm_load_ps( p );
    35              __m128 s = _mm_set1_ps( scale );
    36              _mm_store_ps( p, _mm_mul_ps( d, s ));
    37          }
    (gdb) disassemble
    Dump of assembler code for function calculate_sse:
       0x08048440 <+0>:     mov    0xc(%esp),%ecx
       0x08048444 <+4>:     mov    0x4(%esp),%eax
       0x08048448 <+8>:     sar    $0x2,%ecx
       0x0804844b <+11>:    test   %ecx,%ecx
       0x0804844d <+13>:    lea    -0x1(%ecx),%edx
       0x08048450 <+16>:    je     0x8048474 <calculate_sse+52>
    => 0x08048452 <+18>:    vbroadcastss 0x8(%esp),%xmm1
       0x08048459 <+25>:    lea    0x0(%esi,%eiz,1),%esi
       0x08048460 <+32>:    vmulps (%eax),%xmm1,%xmm0
       0x08048464 <+36>:    sub    $0x1,%edx
       0x08048467 <+39>:    add    $0x10,%eax
       0x0804846a <+42>:    vmovaps %xmm0,-0x10(%eax)
       0x0804846f <+47>:    cmp    $0xffffffff,%edx
       0x08048472 <+50>:    jne    0x8048460 <calculate_sse+32>
       0x08048474 <+52>:    repz ret
    End of assembler dump.

(Arch Linux)

    [oystein@oysteins-laptop ~]$ gcc --version
    gcc (GCC) 4.8.2 20131219 (prerelease)

Bug or feature? I'm not sure whether this is the intended way for the intrinsics to translate, but it was not what I expected. If it is supposed to work like this, is there a way around my problem that does not involve splitting the two functions into two translation units and compiling them with different options?

Thanks,
Øystein
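
For reference, one possible way to keep both functions in a single translation unit is sketched below. It rests on several assumptions: the file is compiled without -mavx, the compiler is GCC 4.9 or later (where the intrinsics headers can be used from functions carrying a target attribute, which is not the case for the 4.8.2 shown above), and the CPU check uses GCC's __builtin_cpu_supports. The name select_calculate and the unaligned loads and stores are illustrative choices and do not come from the attached sse_test.c.

```c
#include <immintrin.h>

typedef void (*calc_fn)(float *data, float scale, int size);

/* Compiled for SSE only: no -mavx on the command line, so the
 * compiler has no reason to emit VEX-encoded instructions here. */
__attribute__((target("sse")))
static void calculate_sse(float *data, float scale, int size)
{
    int i;
    __m128 s = _mm_set1_ps(scale);
    for (i = 0; i + 4 <= size; i += 4)
        _mm_storeu_ps(data + i, _mm_mul_ps(_mm_loadu_ps(data + i), s));
    for (; i < size; i++)   /* scalar tail for leftover elements */
        data[i] *= scale;
}

/* AVX code generation is enabled for this one function only. */
__attribute__((target("avx")))
static void calculate_avx(float *data, float scale, int size)
{
    int i;
    __m256 s = _mm256_set1_ps(scale);
    for (i = 0; i + 8 <= size; i += 8)
        _mm256_storeu_ps(data + i, _mm256_mul_ps(_mm256_loadu_ps(data + i), s));
    for (; i < size; i++)   /* scalar tail for leftover elements */
        data[i] *= scale;
}

/* Hypothetical helper: choose the implementation once at startup. */
static calc_fn select_calculate(void)
{
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx") ? calculate_avx : calculate_sse;
}
```

With this layout the build line would be the original one minus -mavx, and the caller stores the result of select_calculate() in its function pointer. Whether this fits the real code depends on the compiler version available.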