When compiling the following reduced code, both GCC 4.0.3 and 4.1.0 clutter the
assembly code with some strange moves through SSE registers:
typedef union {
  long long l;
  double d;
} db_number;
double test(double x[3]) {
  double th = x[1] + x[2];
  if (x[2] != th - x[1]) {
    db_number thdb;
    thdb.d = th;
    thdb.l++;
    th = thdb.d;
  }
  return x[0] + th;
}
"gcc-4.0 -S -O3 -march=pentium3" will generate:
fstpl -16(%ebp)
movlps -16(%ebp), %xmm0
je .L2
...
movlps -16(%ebp), %xmm1
movaps %xmm1, %xmm0
.L2:
movlps %xmm0, -16(%ebp)
fldl -16(%ebp)
GCC has decided that the content of "th" should be in %xmm0 (even though "th" is
a double variable and the target is an SSE1 processor, which has no
double-precision SSE instructions!) instead of in -16(%ebp), where the rest of
the code expects it to be. As a consequence, the compiler has to pessimize the
code with extra moves to cope with this. In comparison, below is what GCC 3.4
generates. This seems saner, and optimal, to me.
fstp %st(1)
je .L2
...
fldl -16(%ebp)
.L2:
With GCC 3.4, either the value is left untouched at the top of the
floating-point stack, or it is loaded once if it was modified. In both
cases, it is directly available once execution reaches .L2. No SSE register
is involved and there is no load/store/load sequence through a single stack
location.
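For readers who want to check that a given build still computes the right
value, here is a minimal sketch of what the reduced code does, independent of
the register allocation issue. It assumes IEEE 754 binary64 doubles, a 64-bit
long long with the same byte order (as on x86), and no -ffast-math. The names
bump_bits, reduced_test and self_check are hypothetical, and the inputs are
chosen only for illustration; none of this is part of the original test case.

```c
#include <assert.h>
#include <float.h>

/* Same union as in the report: views a double as a 64-bit integer. */
typedef union {
  long long l;
  double d;
} db_number;

/* Hypothetical helper: what the body of the "if" does to th.
   Incrementing the bit pattern of a positive finite double yields
   the next larger double. */
double bump_bits(double v) {
  db_number u;
  u.d = v;
  u.l++;
  return u.d;
}

/* The function from the report, rewritten on top of bump_bits. */
double reduced_test(const double x[3]) {
  double th = x[1] + x[2];
  if (x[2] != th - x[1])  /* true when x[2] cannot be recovered from th */
    th = bump_bits(th);
  return x[0] + th;
}

/* Hypothetical self-check with illustrative inputs. */
void self_check(void) {
  const double exact[3] = { 0.0, 1.0, 2.0 };        /* 1 + 2 is exact */
  const double rounded[3] = { 0.0, 1.0, 0x1p-70 };  /* 1 + 2^-70 rounds to 1 */
  assert(bump_bits(1.0) == 1.0 + DBL_EPSILON);
  assert(reduced_test(exact) == 3.0);
  assert(reduced_test(rounded) == 1.0 + DBL_EPSILON);
}
```

Calling self_check() from a small main() and building it with each compiler
should pass in all cases, since the problem reported here only affects the
quality of the generated code, not its result.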
This was tested with Debian packages for GCC 3.4, 4.0, and 4.1:
Using built-in specs.
Target: i486-linux-gnu
Configured with: ../src/configure -v
--enable-languages=c,c++,java,f95,objc,ada,treelang --prefix=/usr
--enable-shared --with-system-zlib --libexecdir=/usr/lib
--without-included-gettext --enable-threads=posix --enable-nls
--program-suffix=-4.0 --enable-__cxa_atexit --enable-clocale=gnu
--enable-libstdcxx-debug --enable-java-awt=gtk-default --enable-gtk-cairo
--with-java-home=/usr/lib/jvm/java-1.4.2-gcj-4.0-1.4.2.0/jre --enable-mpfr
--disable-werror --with-tune=i686 --enable-checking=release i486-linux-gnu
Thread model: posix
gcc version 4.0.3 (Debian 4.0.3-1)
--
Summary: GCC4 moves the result of a conditional block through
inadequate registers
Product: gcc
Version: 4.0.3
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: guillaume dot melquiond at ens-lyon dot fr
GCC host triplet: i486-linux-gnu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26778