[Bug c/23909] New: Incorrect code generated for SSE2 based xor routine when compiled with -O2 -fomit-frame-pointer

2005-09-15 Thread jeff at panasas dot com
We use some optimized XOR routines for software RAID.  Unfortunately, the 
compiler generated incorrect code when this was compiled for Redhat 7.3 + 
2.4.24 (this is normally kernel code).  I later found out that all versions of 
gcc that I tested (up to FC4 - 4.0.0 20050519 (Red Hat 4.0.0-8)) had this 
issue.

gcc -v on RH 7.3:

build-lin3> gcc -v
Reading specs from /usr/lib/gcc-lib/i386-redhat-linux/2.96/specs
gcc version 2.96 2731 (Red Hat Linux 7.3 2.96-110)

build-lin3> uname -a
Linux build-lin3 2.4.21-kdb #2 SMP Tue Apr 6 12:52:57 EDT 2004 i686 unknown

I've also tested on gcc 4.0.0:

rack-lin9$ gcc -v
Using built-in specs.
Target: i386-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --
infodir=/usr/share/info --enable-shared --enable-threads=posix --enable-
checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-
exceptions --enable-libgcj-multifile --enable-
languages=c,c++,objc,java,f95,ada --enable-java-awt=gtk --with-java-
home=/usr/lib/jvm/java-1.4.2-gcj-1.4.2.0/jre --host=i386-redhat-linux
Thread model: posix
gcc version 4.0.0 20050519 (Red Hat 4.0.0-8)

rack-lin9$ uname -a
Linux rack-lin9 2.6.11-1.1369_FC4smp #1 SMP Thu Jun 2 23:08:39 EDT 2005 i686 
i686 i386 GNU/Linux


Compile command line when test fails: 

gcc -o xor_fail -fomit-frame-pointer -O2 xor.c

Compile command line when test PASSES:

gcc -o xor xor.c

I'll attach the test program to the bug.

The generated code runs into problems in the loop:


/* now perform the xor across a stride */
for (offset = stride; offset < maxoffs; offset += 32) {
  /* load first strip unit */
  __asm__ __volatile__(
   "add %1,   %0\n"
   "movaps   0(%0),   %%xmm0\n"
   "movaps  16(%0),   %%xmm1\n"
   : : "r" (bptr[0]), "r" (offset));

  /* now xor the next N-1 strip units */
  for (j = 1; j < num_of_buffers; j++){
__asm__ __volatile__(
 "add%1, %0\n"
 "xorps  0(%0),  %%xmm0\n"
 "xorps 16(%0),  %%xmm1\n"
 : :  "r" (bptr[j]), "r" (offset) );
  }
  /* now write out the result */
  __asm__ __volatile__(
   "add %1, %0\n"
   "movntps %%xmm0,  0(%0)\n"
   "movntps %%xmm1, 16(%0)\n"
   : : "r" (dest), "r"  (offset) );

}

Specifically, in first loading the data:
  __asm__ __volatile__(
   "add %1,   %0\n"
   "movaps   0(%0),   %%xmm0\n"
   "movaps  16(%0),   %%xmm1\n"
   : : "r" (bptr[0]), "r" (offset));

We end up referencing memory off the end of the array bptr[0].  This is 
because the loop doesn't initialize %ebx and %ebx ends up being too large to 
access this array.  The loop jumps to .L261, but .L261 is below movl (%ebp), %
ebx.

movl(%ebp), %ebx
.p2align 2
.L261:
.stabn 68,0,168,.LM68-sse_multi_xor_gen
.LM68:
#APP
add %edx,   %ebx
movaps   0(%ebx),   %xmm0
movaps  16(%ebx),   %xmm1

.stabn 68,0,175,.LM69-sse_multi_xor_gen
.LM69:
#NO_APP
movl$1, %ecx
cmpl%edi, %ecx
jge .L273
.p2align 2
.L265:
.stabn 68,0,176,.LM70-sse_multi_xor_gen
.LM70:
movl(%ebp,%ecx,4), %eax
#APP
add%edx, %eax
xorps  0(%eax),  %xmm0
xorps 16(%eax),  %xmm1

.stabn 68,0,175,.LM71-sse_multi_xor_gen
.LM71:
#NO_APP
incl%ecx
cmpl%edi, %ecx
jl  .L265
.L273:
.stabn 68,0,183,.LM72-sse_multi_xor_gen
.LM72:
movl88(%esp), %eax
#APP
add %edx, %eax
movntps %xmm0,  0(%eax)
movntps %xmm1, 16(%eax)

.stabn 68,0,166,.LM73-sse_multi_xor_gen
.LM73:
#NO_APP
addl$32, %edx
cmpl%esi, %edx
jb  .L261

The workaround fix is to just remove -fomit-frame-pointer.  Though I'm fairly 
concerned since the Linux kernel uses -fomit-frame-pointer for the kernel 
sources.

-- 
   Summary: Incorrect code generated for SSE2 based xor routine when
compiled with -O2 -fomit-frame-pointer
       Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P2
 Component: c
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: jeff at panasas dot com
CC: gcc-bugs at gcc dot gnu dot org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23909


[Bug c/23909] Incorrect code generated for SSE2 based xor routine when compiled with -O2 -fomit-frame-pointer

2005-09-15 Thread jeff at panasas dot com

--- Additional Comments From jeff at panasas dot com  2005-09-16 05:44 
---
Created an attachment (id=9738)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=9738&action=view)
Xor test program


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23909


[Bug target/23909] Incorrect code generated for SSE2 based xor routine when compiled with -O2 -fomit-frame-pointer

2005-09-16 Thread jeff at panasas dot com

--- Additional Comments From jeff at panasas dot com  2005-09-16 14:50 
---
(In reply to comment #3)
> This works for me with 4.1.0 but I really think it is just an accident.
>   /[EMAIL PROTECTED]@*/ int cr0;   /* really, it's used, but lclint doesn't 
> "get" 
__asm__ */
> This comment does not make sense.
> Specificly since GCC also warns about it:
> t.c:110: warning: unused variable ‘cr0’
> t.c:109: warning: unused variable ‘reg_store’
> I still think you are making a mistake in your code by using inline-asm.

I pulled out some of the code required in the kernel.  When using SSE in the 
kernel you have to save and restore cr0 but you can't do that at userlevel.  
The comment only refers to lclint, a tool that we use to statically check our 
code.  lclint doesn't parse the inline asm, so we have to annotate the code.

reg_store is also another local that is used to save and restore the xmm 
registers when running in the kernel.  you can just ignore this at user-level.

I'm not sure why SSE intrinsics will help here?  This bug is will "go-away" 
when even very small changes are made to that loop, so just about any change 
will mask the bug.

The other problem that we have is that this code is compiled with several 
versions of gcc (2.95.2 -> 3.4) so the inline asm is a good common 
denominator.  I'd be willing to move to intrinisics if it solved the problem 
rather than masked it.  Are there other reasons that the inline asm is a bad 
idea?  I believe the code is completely legal inline asm.



-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23909