http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49279
Summary: Optimization incorrectly presuming constant variable inside loop in g++ 4.5 and 4.6 with -O2 and -O3 for x86_64 targets
Product: gcc
Version: 4.6.0
Status: UNCONFIRMED
Severity: major
Priority: P3
Component: c++
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: tcmart...@gmail.com

Created attachment 24427
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=24427
Testcase (reduced automatically using multidelta)

GCC is apparently producing the wrong code with Eigen2 (a template-based linear algebra library) with optimization levels -O3 and -O2 for x86_64-unknown-linux-gnu targets. A reduced test case is provided that reproduces the error.

As I understand, the core of the problem is this loop (line 1132 of the submitted test case):

    for (; (ProcessFirstHalf ? i && i.index () < j : i); ++i)
      {
        if (LhsIsSelfAdjoint)
          {
            int a = LhsIsRowMajor ? j : i.index ();
            int b = LhsIsRowMajor ? i.index () : j;
            Scalar v = i.value ();
            derived ().row (b) += ei_conj (v) * product.rhs ().row (a);
          }
      }

which is being translated into:

            movq    -8(%rsp), %rsi
            movq    (%rsi), %rbp
            addq    %rdx, %rbp
            movsd   0(%rbp), %xmm1        # <- %xmm1 is initialized here and
    .L5:                                  #    no longer touched!
            leaq    0(,%rcx,8), %rsi
            leaq    4(%r8,%rcx,4), %r8
            movl    %r9d, %ecx
            jmp     .L8
    .L13:                                 # <- Loop here!!!
            movl    (%r8), %r10d
            addq    $4, %r8
    .L8:
            movsd   0(%r13,%rsi), %xmm0
            movslq  %r10d, %r10
            addl    $1, %ecx
            mulsd   (%r12,%r10,8), %xmm0
            cmpl    %r11d, %ecx
            addsd   %xmm1, %xmm0
            movsd   %xmm0, 0(%rbp)        # <- shouldn't %xmm1 be updated here?
            je      .L3
            addq    $8, %rsi
            cmpl    %ecx, %r9d
            jle     .L13                  # <- Loop ends

The sum operation on line

    derived ().row (b) += ei_conj (v) * product.rhs ().row (a);

is apparently being performed by the instruction

    addsd %xmm1, %xmm0

but the value of %xmm1 isn't being updated inside the loop!! Apparently the compiler is presuming derived ().row (b) is constant inside the loop, which is evidently *not* true.
Since the value of %xmm1 is never updated, the value of derived ().row (b) at the end of the loop depends only on its initial value and the last ei_conj (v) * product.rhs ().row (a) result, instead of accumulating every term.

The bug was verified on gcc versions 4.5.2 and 4.6.0 with the -O2 and -O3 switches. The compiler produces the correct code with the -O0 and -O switches. It is *NOT* present on the 4.4 branch (that is, 4.4 compiles the code correctly) for the -O0, -O, -O2 and -O3 switches. I suppose it is a regression of the 4.5 branch.

Command line (for gcc 4.6.0):

    /opt/gnu/gcc-4.6/bin/g++ -v -save-temps -nostdinc -O3 testcase.a.cpp

Compiler output:

    Using built-in specs.
    COLLECT_GCC=/opt/gnu/gcc-4.6/bin/g++
    COLLECT_LTO_WRAPPER=/opt/gnu/gcc-4.6/bin/../libexec/gcc/x86_64-unknown-linux-gnu/4.6.0/lto-wrapper
    Target: x86_64-unknown-linux-gnu
    Configured with: ../configure : (reconfigured) ../configure : (reconfigured) ../configure
    Thread model: posix
    gcc version 4.6.0 (GCC)
    COLLECT_GCC_OPTIONS='-v' '-save-temps' '-nostdinc' '-O3' '-shared-libgcc' '-mtune=generic' '-march=x86-64'
     /opt/gnu/gcc-4.6/bin/../libexec/gcc/x86_64-unknown-linux-gnu/4.6.0/cc1plus -E -quiet -nostdinc -v -iprefix /opt/gnu/gcc-4.6/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.6.0/ -D_GNU_SOURCE testcase.a.cpp -mtune=generic -march=x86-64 -O3 -fpch-preprocess -o testcase.a.ii
    #include "..." search starts here:
    #include <...> search starts here:
    End of search list.
    COLLECT_GCC_OPTIONS='-v' '-save-temps' '-nostdinc' '-O3' '-shared-libgcc' '-mtune=generic' '-march=x86-64'
     /opt/gnu/gcc-4.6/bin/../libexec/gcc/x86_64-unknown-linux-gnu/4.6.0/cc1plus -fpreprocessed testcase.a.ii -quiet -dumpbase testcase.a.cpp -mtune=generic -march=x86-64 -auxbase testcase.a -O3 -version -o testcase.a.s
    GNU C++ (GCC) version 4.6.0 (x86_64-unknown-linux-gnu)
            compiled by GNU C version 4.6.0, GMP version 4.3.2, MPFR version 3.0.0-p8, MPC version 0.9
    GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
    GNU C++ (GCC) version 4.6.0 (x86_64-unknown-linux-gnu)
            compiled by GNU C version 4.6.0, GMP version 4.3.2, MPFR version 3.0.0-p8, MPC version 0.9
    GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
    Compiler executable checksum: 3f3899c46d47b31a2bc0cb7f3d1408a6
    COLLECT_GCC_OPTIONS='-v' '-save-temps' '-nostdinc' '-O3' '-shared-libgcc' '-mtune=generic' '-march=x86-64'
     as --64 -o testcase.a.o testcase.a.s
    COMPILER_PATH=/opt/gnu/gcc-4.6/bin/../libexec/gcc/x86_64-unknown-linux-gnu/4.6.0/:/opt/gnu/gcc-4.6/bin/../libexec/gcc/
    LIBRARY_PATH=/opt/gnu/gcc-4.6/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.6.0/:/opt/gnu/gcc-4.6/bin/../lib/gcc/:/opt/gnu/gcc-4.6/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.6.0/../../../../lib64/:/lib/../lib64/:/usr/lib/../lib64/:/usr/lib/nvidia-current/:/opt/gnu/gcc-4.6/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.6.0/../../../:/lib/:/usr/lib/
    COLLECT_GCC_OPTIONS='-v' '-save-temps' '-nostdinc' '-O3' '-shared-libgcc' '-mtune=generic' '-march=x86-64'
     /opt/gnu/gcc-4.6/bin/../libexec/gcc/x86_64-unknown-linux-gnu/4.6.0/collect2 --eh-frame-hdr -m elf_x86_64 -dynamic-linker /lib64/ld-linux-x86-64.so.2 /usr/lib/../lib64/crt1.o /usr/lib/../lib64/crti.o /opt/gnu/gcc-4.6/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.6.0/crtbegin.o -L/opt/gnu/gcc-4.6/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.6.0 -L/opt/gnu/gcc-4.6/bin/../lib/gcc -L/opt/gnu/gcc-4.6/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.6.0/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -L/usr/lib/nvidia-current -L/opt/gnu/gcc-4.6/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.6.0/../../.. testcase.a.o -lstdc++ -lm -lgcc_s -lgcc -lc -lgcc_s -lgcc /opt/gnu/gcc-4.6/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.6.0/crtend.o /usr/lib/../lib64/crtn.o

A test case was produced with the preprocessed output (generated from Eigen version 2.0.15) and automatically reduced using the multidelta tool. Testcase included.
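
For readers who do not want to step through the assembly, the difference between the intended loop and the emitted one can be modelled in a few lines of plain C++. This is only an illustrative sketch, not code from Eigen or from the attached test case: the function names accumulate_expected and accumulate_as_miscompiled are hypothetical, and a single double stands in for derived ().row (b). The second function copies the destination into a local before the loop, mirroring the one-time load into %xmm1, and then adds each product to that stale copy.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the intended semantics of
//   derived ().row (b) += ei_conj (v) * product.rhs ().row (a);
// The destination is re-read on every iteration, so all terms accumulate.
double accumulate_expected(double dest, const std::vector<double>& v,
                           const std::vector<double>& rhs) {
    for (std::size_t k = 0; k < v.size(); ++k)
        dest += v[k] * rhs[k];
    return dest;
}

// Model of the loop as emitted at -O2/-O3 (assumption: a simplified
// reading of the assembly quoted in this report). The destination is
// loaded once before the loop (movsd 0(%rbp), %xmm1) and that stale
// value is reused in every iteration (addsd %xmm1, %xmm0).
double accumulate_as_miscompiled(double dest, const std::vector<double>& v,
                                 const std::vector<double>& rhs) {
    const double stale = dest;            // loaded once, never refreshed
    for (std::size_t k = 0; k < v.size(); ++k)
        dest = stale + v[k] * rhs[k];     // overwrites instead of accumulating
    return dest;
}
```

With dest = 5, v = {1, 2, 3} and rhs = {10, 10, 10}, the first function returns 65 (every term is added), while the second returns 35 (the initial value plus only the last product), which is the symptom described in this report.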