http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49279

           Summary: Optimization incorrectly presuming constant variable
                    inside loop in g++ 4.5 and 4.6 with -O2 and -O3 for
                    x86_64 targets
           Product: gcc
           Version: 4.6.0
            Status: UNCONFIRMED
          Severity: major
          Priority: P3
         Component: c++
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: tcmart...@gmail.com


Created attachment 24427
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=24427
Testcase (reduced automatically using multidelta)

GCC is apparently producing the wrong code with Eigen2 (a template-based linear
algebra library) with optimization levels -O3 and -O2 for
x86_64-unknown-linux-gnu targets. A reduced test case is provided that
reproduces the error.

As I understand it, the core of the problem is this loop (line 1132 of
the submitted test case):

for (; (ProcessFirstHalf ? i && i.index () < j : i); ++i)  {
   if (LhsIsSelfAdjoint) {
      int a = LhsIsRowMajor ? j : i.index ();
      int b = LhsIsRowMajor ? i.index () : j;
      Scalar v = i.value ();
      derived ().row (b) += ei_conj (v) * product.rhs ().row (a);        
  }
}

which is being translated into:

    movq    -8(%rsp), %rsi
    movq    (%rsi), %rbp
    addq    %rdx, %rbp
    movsd    0(%rbp), %xmm1   # <- %xmm1 is initialized here and 
.L5:                             #    no longer touched!
    leaq    0(,%rcx,8), %rsi
    leaq    4(%r8,%rcx,4), %r8
    movl    %r9d, %ecx
    jmp    .L8
.L13:                              # <-Loop here!!!
    movl    (%r8), %r10d
    addq    $4, %r8
.L8:
    movsd    0(%r13,%rsi), %xmm0
    movslq    %r10d, %r10
    addl    $1, %ecx
    mulsd    (%r12,%r10,8), %xmm0
    cmpl    %r11d, %ecx
    addsd    %xmm1, %xmm0
    movsd    %xmm0, 0(%rbp)   # <- shouldn't %xmm1 be updated here?
    je    .L3              
    addq    $8, %rsi
    cmpl    %ecx, %r9d
    jle    .L13              # <- Loop ends 

The sum operation on the line

 derived ().row (b) += ei_conj (v) * product.rhs ().row (a);

is apparently being performed by the instruction

 addsd    %xmm1, %xmm0

but the value of %xmm1 isn't being updated inside the loop!! Apparently the
compiler is presuming derived ().row (b) is constant inside the loop, which is
evidently *not* true. Since the value of %xmm1 is never updated, the
value of derived ().row (b) at the end of the loop is its initial value plus
only the last ei_conj (v) * product.rhs ().row (a) result, rather than the
sum over all iterations.
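To make the difference concrete, here is a minimal sketch in plain C++ that
models the two behaviours on scalars. It is not Eigen code and is not taken
from the test case; the function names accumulate_expected and
accumulate_as_observed, and the use of plain arrays in place of
derived ().row (b) and product.rhs ().row (a), are assumptions made purely for
illustration. The second function only mimics what the assembly above appears
to do (load the destination once, then reuse that stale value).

```cpp
#include <cstddef>

// Hypothetical model of the intended semantics: the destination is re-read
// on every iteration, so each product is added to the running sum.
double accumulate_expected(double dest, const double* v, const double* rhs,
                           std::size_t n)
{
    for (std::size_t k = 0; k < n; ++k)
        dest += v[k] * rhs[k];
    return dest;
}

// Hypothetical model of the generated code: the destination is loaded once
// before the loop (like "movsd 0(%rbp), %xmm1") and that stale value is
// added to each product (like "addsd %xmm1, %xmm0") before the store.
double accumulate_as_observed(double dest, const double* v, const double* rhs,
                              std::size_t n)
{
    const double initial = dest;           // loaded once, never refreshed
    for (std::size_t k = 0; k < n; ++k)
        dest = initial + v[k] * rhs[k];    // overwrites the previous result
    return dest;
}
```

With an initial value of 1.0, v = {1, 2, 3} and rhs = {1, 1, 1}, the first
function yields 7.0 while the second yields 4.0, i.e. only the last product
survives on top of the initial value.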

The bug was verified on gcc versions 4.5.2 and 4.6.0 with -O2 and -O3 switches.
The compiler produces the correct code with -O0 and -O switches.

It is *NOT* present on the 4.4 branch (that is, 4.4 compiles the code
correctly) for -O0, -O, -O2 and -O3 switches.

I suppose it is a regression of the 4.5 branch.

Command line (for gcc 4.6.0):
/opt/gnu/gcc-4.6/bin/g++ -v -save-temps -nostdinc -O3  testcase.a.cpp

Compiler output:
Using built-in specs.
COLLECT_GCC=/opt/gnu/gcc-4.6/bin/g++
COLLECT_LTO_WRAPPER=/opt/gnu/gcc-4.6/bin/../libexec/gcc/x86_64-unknown-linux-gnu/4.6.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ../configure : (reconfigured) ../configure : (reconfigured)
../configure
Thread model: posix
gcc version 4.6.0 (GCC) 
COLLECT_GCC_OPTIONS='-v' '-save-temps' '-nostdinc' '-O3' '-shared-libgcc'
'-mtune=generic' '-march=x86-64'
 /opt/gnu/gcc-4.6/bin/../libexec/gcc/x86_64-unknown-linux-gnu/4.6.0/cc1plus -E
-quiet -nostdinc -v -iprefix
/opt/gnu/gcc-4.6/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.6.0/ -D_GNU_SOURCE
testcase.a.cpp -mtune=generic -march=x86-64 -O3 -fpch-preprocess -o
testcase.a.ii
#include "..." search starts here:
#include <...> search starts here:
End of search list.
COLLECT_GCC_OPTIONS='-v' '-save-temps' '-nostdinc' '-O3' '-shared-libgcc'
'-mtune=generic' '-march=x86-64'
 /opt/gnu/gcc-4.6/bin/../libexec/gcc/x86_64-unknown-linux-gnu/4.6.0/cc1plus
-fpreprocessed testcase.a.ii -quiet -dumpbase testcase.a.cpp -mtune=generic
-march=x86-64 -auxbase testcase.a -O3 -version -o testcase.a.s
GNU C++ (GCC) version 4.6.0 (x86_64-unknown-linux-gnu)
        compiled by GNU C version 4.6.0, GMP version 4.3.2, MPFR version
3.0.0-p8, MPC version 0.9
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
GNU C++ (GCC) version 4.6.0 (x86_64-unknown-linux-gnu)
        compiled by GNU C version 4.6.0, GMP version 4.3.2, MPFR version
3.0.0-p8, MPC version 0.9
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
Compiler executable checksum: 3f3899c46d47b31a2bc0cb7f3d1408a6
COLLECT_GCC_OPTIONS='-v' '-save-temps' '-nostdinc' '-O3' '-shared-libgcc'
'-mtune=generic' '-march=x86-64'
 as --64 -o testcase.a.o testcase.a.s
COMPILER_PATH=/opt/gnu/gcc-4.6/bin/../libexec/gcc/x86_64-unknown-linux-gnu/4.6.0/:/opt/gnu/gcc-4.6/bin/../libexec/gcc/
LIBRARY_PATH=/opt/gnu/gcc-4.6/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.6.0/:/opt/gnu/gcc-4.6/bin/../lib/gcc/:/opt/gnu/gcc-4.6/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.6.0/../../../../lib64/:/lib/../lib64/:/usr/lib/../lib64/:/usr/lib/nvidia-current/:/opt/gnu/gcc-4.6/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.6.0/../../../:/lib/:/usr/lib/
COLLECT_GCC_OPTIONS='-v' '-save-temps' '-nostdinc' '-O3' '-shared-libgcc'
'-mtune=generic' '-march=x86-64'
 /opt/gnu/gcc-4.6/bin/../libexec/gcc/x86_64-unknown-linux-gnu/4.6.0/collect2
--eh-frame-hdr -m elf_x86_64 -dynamic-linker /lib64/ld-linux-x86-64.so.2
/usr/lib/../lib64/crt1.o /usr/lib/../lib64/crti.o
/opt/gnu/gcc-4.6/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.6.0/crtbegin.o
-L/opt/gnu/gcc-4.6/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.6.0
-L/opt/gnu/gcc-4.6/bin/../lib/gcc
-L/opt/gnu/gcc-4.6/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.6.0/../../../../lib64
-L/lib/../lib64 -L/usr/lib/../lib64 -L/usr/lib/nvidia-current
-L/opt/gnu/gcc-4.6/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.6.0/../../..
testcase.a.o -lstdc++ -lm -lgcc_s -lgcc -lc -lgcc_s -lgcc
/opt/gnu/gcc-4.6/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.6.0/crtend.o
/usr/lib/../lib64/crtn.o

A test case was produced with the preprocessed output (generated from Eigen
version 2.0.15) and automatically reduced using the multidelta tool. Testcase
included.
