[Bug c/23909] New: Incorrect code generated for SSE2 based xor routine when compiled with -O2 -fomit-frame-pointer

jeff at panasas dot com Thu, 15 Sep 2005 22:38:10 -0700

We use some optimized XOR routines for software RAID.  Unfortunately, the 
compiler generates incorrect code when they are compiled on Red Hat 7.3 + 
2.4.24 (this is normally kernel code).  I later found that every version of 
gcc I tested (up to FC4 - 4.0.0 20050519 (Red Hat 4.0.0-8)) has this 
issue.


gcc -v on RH 7.3:

build-lin3> gcc -v
Reading specs from /usr/lib/gcc-lib/i386-redhat-linux/2.96/specs
gcc version 2.96 20000731 (Red Hat Linux 7.3 2.96-110)

build-lin3> uname -a
Linux build-lin3 2.4.21-kdb #2 SMP Tue Apr 6 12:52:57 EDT 2004 i686 unknown

I've also tested on gcc 4.0.0:

rack-lin9$ gcc -v
Using built-in specs.
Target: i386-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-libgcj-multifile --enable-languages=c,c++,objc,java,f95,ada --enable-java-awt=gtk --with-java-home=/usr/lib/jvm/java-1.4.2-gcj-1.4.2.0/jre --host=i386-redhat-linux
Thread model: posix
gcc version 4.0.0 20050519 (Red Hat 4.0.0-8)

rack-lin9$ uname -a
Linux rack-lin9 2.6.11-1.1369_FC4smp #1 SMP Thu Jun 2 23:08:39 EDT 2005 i686 i686 i386 GNU/Linux


Compile command line when test fails: 

gcc -o xor_fail -fomit-frame-pointer -O2 xor.c

Compile command line when test PASSES:

gcc -o xor xor.c

I'll attach the test program to the bug.

The generated code runs into problems in the loop:


    /* now perform the xor across a stride */
    for (offset = stride; offset < maxoffs; offset += 32) {
      /* load first strip unit */
      __asm__ __volatile__(
                           "add     %1,       %0\n"
                           "movaps   0(%0),   %%xmm0\n"
                           "movaps  16(%0),   %%xmm1\n"
                           : : "r" (bptr[0]), "r" (offset));

      /* now xor the next N-1 strip units */
      for (j = 1; j < num_of_buffers; j++){
        __asm__ __volatile__(
                             "add    %1,     %0\n"
                             "xorps  0(%0),  %%xmm0\n"
                             "xorps 16(%0),  %%xmm1\n"
                             : :  "r" (bptr[j]), "r" (offset) );
      }
      /* now write out the result */
      __asm__ __volatile__(
                           "add     %1, %0\n"
                           "movntps %%xmm0,  0(%0)\n"
                           "movntps %%xmm1, 16(%0)\n"
                           : : "r" (dest), "r"  (offset) );

    }

Specifically, in first loading the data:
      __asm__ __volatile__(
                           "add     %1,       %0\n"
                           "movaps   0(%0),   %%xmm0\n"
                           "movaps  16(%0),   %%xmm1\n"
                           : : "r" (bptr[0]), "r" (offset));

We end up referencing memory past the end of the array that bptr[0] points 
to.  This is because %ebx is not reloaded inside the loop: every pass adds 
%edx to it again, so it keeps growing until it points beyond the array.  The 
loop branches back to .L261, but .L261 is below movl (%ebp), %ebx, so that 
load only runs once.

        movl    (%ebp), %ebx
        .p2align 2
.L261:
        .stabn 68,0,168,.LM68-sse_multi_xor_gen
.LM68:
#APP
        add     %edx,       %ebx
movaps   0(%ebx),   %xmm0
movaps  16(%ebx),   %xmm1

        .stabn 68,0,175,.LM69-sse_multi_xor_gen
.LM69:
#NO_APP
        movl    $1, %ecx
        cmpl    %edi, %ecx
        jge     .L273
        .p2align 2
.L265:
        .stabn 68,0,176,.LM70-sse_multi_xor_gen
.LM70:
        movl    (%ebp,%ecx,4), %eax
#APP
        add    %edx,     %eax
xorps  0(%eax),  %xmm0
xorps 16(%eax),  %xmm1

        .stabn 68,0,175,.LM71-sse_multi_xor_gen
.LM71:
#NO_APP
        incl    %ecx
        cmpl    %edi, %ecx
        jl      .L265
.L273:
        .stabn 68,0,183,.LM72-sse_multi_xor_gen
.LM72:
        movl    88(%esp), %eax
#APP
        add     %edx, %eax
movntps %xmm0,  0(%eax)
movntps %xmm1, 16(%eax)

        .stabn 68,0,166,.LM73-sse_multi_xor_gen
.LM73:
#NO_APP
        addl    $32, %edx
        cmpl    %esi, %edx
        jb      .L261
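
To make the effect concrete, here is a small model in plain C of the register 
arithmetic in the listing above.  It is only a sketch: the function names are 
made up for illustration, and it assumes that %edx starts at stride and that 
%ebx is loaded exactly once before .L261, as the listing suggests.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers, not part of xor.c. */

/* Address the first movaps should use on pass n (0-based) when the base
 * pointer is reloaded on every pass, as the C source intends. */
static size_t addr_reloaded(size_t base, size_t stride, size_t n)
{
    return base + (stride + 32 * n);
}

/* Address it uses when the base is loaded once before the loop and the
 * add then keeps accumulating into the same register. */
static size_t addr_hoisted(size_t base, size_t stride, size_t n)
{
    size_t reg = base;       /* movl (%ebp), %ebx   (runs once)  */
    size_t offset = stride;  /* assumed initial value of %edx    */
    size_t i;

    for (i = 0; i <= n; i++) {
        reg += offset;       /* add  %edx, %ebx                  */
        if (i < n)
            offset += 32;    /* addl $32, %edx                   */
    }
    return reg;
}
```

With stride 0 the first two passes happen to agree; on the third pass the 
model reads from base + 96 rather than base + 64, and the gap keeps widening 
after that.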

The workaround is simply to drop -fomit-frame-pointer, though I'm fairly 
concerned since the Linux kernel is built with -fomit-frame-pointer.
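
One thing that may be worth ruling out: each asm statement in the loop does 
"add %1, %0" while %0 is declared as an input-only "r" operand.  GCC's 
extended asm rules say input operands must not be modified, so the compiler 
is allowed to assume the register still holds bptr[0] on the next pass.  The 
sketch below sidesteps the question by computing addresses in C and using SSE 
intrinsics instead of inline asm.  The function name and signature are 
assumptions for illustration (they are not taken from xor.c), and it assumes 
an x86 target with SSE enabled (-msse on 32-bit) and 16-byte aligned buffers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>   /* SSE: _mm_load_ps, _mm_xor_ps, _mm_stream_ps */

/* Hypothetical rewrite of the stride loop; not the code from xor.c.
 * XORs num_of_buffers source buffers into dest, 32 bytes per pass. */
static void sse_multi_xor_sketch(void *dest, void *const bptr[],
                                 size_t num_of_buffers,
                                 size_t stride, size_t maxoffs)
{
    size_t offset, j;

    assert(num_of_buffers >= 1);
    assert(((uintptr_t)dest & 15) == 0 && (stride & 15) == 0);

    for (offset = stride; offset < maxoffs; offset += 32) {
        const float *src;
        float *out;
        __m128 x0, x1;

        /* load first strip unit (movaps) */
        src = (const float *)((const char *)bptr[0] + offset);
        x0 = _mm_load_ps(src);
        x1 = _mm_load_ps(src + 4);

        /* xor in the next N-1 strip units (xorps) */
        for (j = 1; j < num_of_buffers; j++) {
            src = (const float *)((const char *)bptr[j] + offset);
            x0 = _mm_xor_ps(x0, _mm_load_ps(src));
            x1 = _mm_xor_ps(x1, _mm_load_ps(src + 4));
        }

        /* write out the result with non-temporal stores (movntps) */
        out = (float *)((char *)dest + offset);
        _mm_stream_ps(out, x0);
        _mm_stream_ps(out + 4, x1);
    }
    _mm_sfence();   /* order the non-temporal stores before returning */
}
```

Because the compiler allocates the XMM registers itself here, nothing has to 
survive in %xmm0/%xmm1 between separate asm statements, and the base pointer 
is never modified behind the compiler's back.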

-- 
           Summary: Incorrect code generated for SSE2 based xor routine when
                    compiled with -O2 -fomit-frame-pointer
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: c
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: jeff at panasas dot com
                CC: gcc-bugs at gcc dot gnu dot org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23909
