http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59657

            Bug ID: 59657
           Summary: SSE intrinsics translates to AVX instructions
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: oystein at gnubg dot org

Created attachment 31558
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=31558&action=edit
Example source code file

Happy new year!

I am writing code that should run on both SSE and AVX machines. I have
manually vectorized the code for SSE and AVX in two different functions and
use a function pointer to select the right function for the CPU at startup.
The two functions are in the same translation unit.

(See attached code)

compiled with: 
gcc -Wall -O3 -g -mavx sse_test.c -o sse_test

The problem is that the SSE intrinsics in the SSE function get translated to
AVX instructions (vbroadcastss, vmulps and vmovaps in the disassembly below).
This will of course give an illegal instruction on all non-AVX machines.

My gdb session:

Program received signal SIGILL, Illegal instruction.
0x08048452 in calculate_sse (data=data@entry=0xbffff5e0, scale=scale@entry=0.5,
size=size@entry=256)
    at sse_test.c:33
33        for ( ; count-- ; p += 4 ){
(gdb) list
28    
29    static void calculate_sse(float *data, float scale, int size )
30    {
31        int count = size >> 2;
32        float *p = data;
33        for ( ; count-- ; p += 4 ){
34            __m128 d = _mm_load_ps( p );
35            __m128 s = _mm_set1_ps( scale );
36            _mm_store_ps( p, _mm_mul_ps( d, s ));
37        }
(gdb) disassemble 
Dump of assembler code for function calculate_sse:
   0x08048440 <+0>:    mov    0xc(%esp),%ecx
   0x08048444 <+4>:    mov    0x4(%esp),%eax
   0x08048448 <+8>:    sar    $0x2,%ecx
   0x0804844b <+11>:    test   %ecx,%ecx
   0x0804844d <+13>:    lea    -0x1(%ecx),%edx
   0x08048450 <+16>:    je     0x8048474 <calculate_sse+52>
=> 0x08048452 <+18>:    vbroadcastss 0x8(%esp),%xmm1
   0x08048459 <+25>:    lea    0x0(%esi,%eiz,1),%esi
   0x08048460 <+32>:    vmulps (%eax),%xmm1,%xmm0
   0x08048464 <+36>:    sub    $0x1,%edx
   0x08048467 <+39>:    add    $0x10,%eax
   0x0804846a <+42>:    vmovaps %xmm0,-0x10(%eax)
   0x0804846f <+47>:    cmp    $0xffffffff,%edx
   0x08048472 <+50>:    jne    0x8048460 <calculate_sse+32>
   0x08048474 <+52>:    repz ret 
End of assembler dump.

(Arch Linux)
[oystein@oysteins-laptop ~]$ gcc --version 
gcc (GCC) 4.8.2 20131219 (prerelease)

Bug or feature? I'm not sure if this is the expected way the intrinsics should
translate, but it was not what I expected. If it is supposed to be like this,
can I get out of my problem without splitting the two functions into two
translation units and using two different sets of compile options?
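
One candidate for keeping both functions in a single translation unit is
sketched below. It is not the attached file and has not been tried on the
4.8.2 compiler above. It assumes the file is built without -mavx (on 32-bit
x86, -msse may be needed instead) and that the compiler accepts intrinsics
inside a function carrying a per-function target attribute, which the GCC 4.9
release notes describe as newly supported. The names calculate_avx,
select_calculate and scale_buffer are hypothetical, as is the use of
__builtin_cpu_supports for the startup check.

```c
#include <assert.h>
#include <stdint.h>
#include <immintrin.h>

/* SSE path: same loop as calculate_sse in the gdb listing above, with the
   broadcast hoisted out of the loop. With no -mavx on the command line the
   compiler is expected to use the legacy SSE encodings here. */
static void calculate_sse(float *data, float scale, int size)
{
    int count = size >> 2;              /* 4 floats per __m128 */
    float *p = data;
    __m128 s = _mm_set1_ps(scale);
    for ( ; count-- ; p += 4 )
        _mm_store_ps(p, _mm_mul_ps(_mm_load_ps(p), s));
}

/* Hypothetical AVX counterpart. The target attribute enables AVX code
   generation for this one function only (assumption: GCC 4.9 or later). */
__attribute__((target("avx")))
static void calculate_avx(float *data, float scale, int size)
{
    int count = size >> 3;              /* 8 floats per __m256 */
    float *p = data;
    __m256 s = _mm256_set1_ps(scale);
    for ( ; count-- ; p += 8 )
        _mm256_store_ps(p, _mm256_mul_ps(_mm256_load_ps(p), s));
}

typedef void (*calculate_fn)(float *, float, int);

/* Hypothetical startup check: pick the implementation for the running CPU. */
static calculate_fn select_calculate(void)
{
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx") ? calculate_avx : calculate_sse;
}

/* Hypothetical wrapper: the aligned loads and stores above need the buffer
   on a 32-byte boundary for the AVX path (16 bytes suffice for SSE). */
static void scale_buffer(float *data, float scale, int size)
{
    assert(((uintptr_t)data & 31) == 0);
    select_calculate()(data, scale, size);
}
```

Both loops ignore any tail elements beyond a multiple of the vector width, as
the original calculate_sse does. Whether the SSE function then really stays
free of VEX-encoded instructions would still have to be checked in the
disassembly.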

Thanks,
Øystein
