http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59697

            Bug ID: 59697
           Summary: Function attribute __target__ ("no-avx") works on
                    Windows/mingw but fails on Linux.
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: oystein at gnubg dot org

Created attachment 31754
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=31754&action=edit
Example source code file

Hi,

Last week I tried to compile a file in which one function should use SSE
instructions and another AVX (see invalid bug report #59657). I then tried to
solve this with function attributes:

static void calculate_sse(float *data, float scale, int size ) __attribute__
((__target__ ("no-avx")));
static void calculate_avx(float *data, float scale, int size );

(Full example code attached)
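
Since the attachment is not reproduced in this message, here is a minimal
sketch of the kind of layout I mean. The function bodies, the calculate()
dispatcher, the helper macros and calculate_self_check() are hypothetical
stand-ins, not the contents of sse_test.c. The sketch also marks the AVX
variant explicitly with target("avx"), which the declarations above do not,
and it assumes a GCC that provides __builtin_cpu_supports (4.8 or later).

```c
#include <assert.h>

/* Hypothetical helper macros: only apply the x86 attributes on x86. */
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#define TARGET_NO_AVX __attribute__((__target__("no-avx")))
#define TARGET_AVX    __attribute__((__target__("avx")))
#define HAVE_AVX()    __builtin_cpu_supports("avx")
#else
#define TARGET_NO_AVX
#define TARGET_AVX
#define HAVE_AVX()    0
#endif

/* Variant that is supposed to be compiled without AVX instructions. */
TARGET_NO_AVX
static void calculate_sse(float *data, float scale, int size)
{
    int i;
    for (i = 0; i < size; i++)
        data[i] *= scale;
}

/* Variant that the compiler is allowed to vectorize with AVX. */
TARGET_AVX
static void calculate_avx(float *data, float scale, int size)
{
    int i;
    for (i = 0; i < size; i++)
        data[i] *= scale;
}

/* Hypothetical dispatcher: pick a variant from a run-time CPU check. */
static void calculate(float *data, float scale, int size)
{
    assert(size >= 0);
    if (HAVE_AVX())
        calculate_avx(data, scale, size);
    else
        calculate_sse(data, scale, size);
}

/* Returns 1 if the no-AVX variant and the dispatcher agree, 0 otherwise. */
static int calculate_self_check(void)
{
    float a[8], b[8];
    int i;
    for (i = 0; i < 8; i++)
        a[i] = b[i] = (float)i;
    calculate_sse(a, 2.0f, 8);
    calculate(b, 2.0f, 8);
    for (i = 0; i < 8; i++)
        if (a[i] != 2.0f * (float)i || b[i] != a[i])
            return 0;
    return 1;
}
```

Note that when the whole file is built with -mavx, every function that is not
marked "no-avx" (here the dispatcher and the self-check) may itself contain AVX
instructions, so on a non-AVX machine this layout is only safe if the attribute
is honored and the unmarked code is built without -mavx.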

When I compile this on my mingw system, the generated code is as expected, and
the program runs fine on my non-AVX Windows machine.

Compiled with:
 gcc -Wall -O3 -g -mavx sse_test.c -o sse_test

[10:52:28,97 C:\APPL\ssetest]# gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=c:/appl/mingw/bin/../libexec/gcc/mingw32/4.7.2/lto-wrapper.exe
Target: mingw32
Configured with: ../gcc-4.7.2/configure
--enable-languages=c,c++,ada,fortran,objc,obj-c++
--disable-sjlj-exceptions --with-dwarf2 --enable-shared --enable-libgomp
--disable-win32-registry --enable-libstdcxx-debug
--disable-build-poststage1-with-cxx --enable-version-specific-runtime-libs
--build=mingw32 --prefix=/mingw
Thread model: win32
gcc version 4.7.2 (GCC)

The disassembly of the calculate_sse function looks like this:
(gdb) disassemble  calculate_sse
Dump of assembler code for function calculate_sse:
   0x0040138c <+0>:     mov    0xc(%esp),%eax
   0x00401390 <+4>:     sar    $0x2,%eax
   0x00401393 <+7>:     lea    -0x1(%eax),%edx
   0x00401396 <+10>:    test   %eax,%eax
   0x00401398 <+12>:    je     0x4013be <calculate_sse+50>
   0x0040139a <+14>:    mov    0x4(%esp),%eax
   0x0040139e <+18>:    movss  0x8(%esp),%xmm0
   0x004013a4 <+24>:    shufps $0x0,%xmm0,%xmm0
   0x004013a8 <+28>:    movaps %xmm0,%xmm1
   0x004013ab <+31>:    nop
   0x004013ac <+32>:    movaps (%eax),%xmm0
   0x004013af <+35>:    mulps  %xmm1,%xmm0
   0x004013b2 <+38>:    movaps %xmm0,(%eax)
   0x004013b5 <+41>:    add    $0x10,%eax
   0x004013b8 <+44>:    dec    %edx
   0x004013b9 <+45>:    cmp    $0xffffffff,%edx
   0x004013bc <+48>:    jne    0x4013ac <calculate_sse+32>
   0x004013be <+50>:    ret
End of assembler dump.

All nice and as expected on Windows/mingw!

I then tried the same code with the same function attribute on my Arch Linux
laptop (also a non-AVX CPU). On this system, however, the calculate_sse
function is compiled with AVX instructions despite the function attribute.

I compile with the same command line:
 gcc -Wall -O3 -g -mavx sse_test.c -o sse_test

[oystein@oysteins-laptop ~]$ gcc --version
gcc (GCC) 4.8.2 20131219 (prerelease)

The resulting disassembly of the calculate_sse function:
(gdb) disassemble 
Dump of assembler code for function calculate_sse:
   0x08048440 <+0>:    mov    0xc(%esp),%ecx
   0x08048444 <+4>:    mov    0x4(%esp),%eax
   0x08048448 <+8>:    sar    $0x2,%ecx
   0x0804844b <+11>:    test   %ecx,%ecx
   0x0804844d <+13>:    lea    -0x1(%ecx),%edx
   0x08048450 <+16>:    je     0x8048474 <calculate_sse+52>
=> 0x08048452 <+18>:    vbroadcastss 0x8(%esp),%xmm1
   0x08048459 <+25>:    lea    0x0(%esi,%eiz,1),%esi
   0x08048460 <+32>:    vmulps (%eax),%xmm1,%xmm0
   0x08048464 <+36>:    sub    $0x1,%edx
   0x08048467 <+39>:    add    $0x10,%eax
   0x0804846a <+42>:    vmovaps %xmm0,-0x10(%eax)
   0x0804846f <+47>:    cmp    $0xffffffff,%edx
   0x08048472 <+50>:    jne    0x8048460 <calculate_sse+32>
   0x08048474 <+52>:    repz ret 
End of assembler dump.

I'm never sure what to expect, but at least I did not expect the behavior to
differ between the two systems. It might not be the Windows/Linux difference
that matters, though, but rather the GCC version (4.7.x versus 4.8.x). What do
I know?

Thanks,
-Øystein
