On 11/2/06, Uros Bizjak <[EMAIL PROTECTED]> wrote:
This testcase (similar to yours, but it actually compiles):
Hello, Uros, thank you for your attention to my problem. I upgraded gcc to 4.2 and am now using -march=i686 instead of -march=pentium4 for my tests. gcc 4.2 resolved some, but not all, of my concerns. Please see below.
double test(int n, double a)
{
  double sum = 0.0;
  int i;

  for(i=0; i<n; ++i)
    {
      float x = logf((float)i);
      sum += isnan(x) ? 0 : x;
    }

  return sum;
}

produces exactly the code you are looking for (using gcc-4.2 with -march=i686):

.L5:
        pushl   %ebx
        fildl   (%esp)
        addl    $4, %esp
        fstps   (%esp)
        fstpl   -24(%ebp)
        call    logf
        fucomi  %st(0), %st
        fldz
        fcmovnu %st(1), %st
        fstp    %st(1)
        addl    $1, %ebx
        cmpl    %esi, %ebx
        fldl    -24(%ebp)
        faddp   %st, %st(1)
        jne     .L5
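The fucomi/fldz/fcmovnu/fstp sequence in that loop is a branchless form of the isnan() select. Comparing x with itself is unordered only when x is NaN, and fcmovnu copies x over the freshly loaded 0.0 only when the comparison was not unordered. Below is a minimal C sketch of the same selection. It assumes a C99 <math.h> that provides the isunordered() macro. nan_to_zero is a hypothetical helper name, not something from this thread.

```c
#include <math.h>

/* Hypothetical helper: return 0.0 when x is NaN, otherwise x unchanged.
   isunordered(x, x) is true only when x is NaN, mirroring the
   fucomi %st(0), %st self-comparison in the generated loop; the
   conditional then plays the role of fldz + fcmovnu. */
static double nan_to_zero(double x)
{
    return isunordered(x, x) ? 0.0 : x;
}
```

Note that -inf passes through unchanged, since it compares ordered with itself. This is why test(0, 2, 0) should still print -inf even with the NaN check in place.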
I was unable to replicate your results with gcc 4.0.3, so I installed gcc 4.2.0 20061103 (prerelease) from SVN. Using that, I am able to replicate the loop above exactly with -O2 -march=i686. It looks like gcc 4.2 is willing to do this optimization; gcc 4.0 would not. :-)
The logf() function will be inlined if you specify -funsafe-math-optimizations; this flag also enables implicit float->double extensions for x87 math. As you probably don't need math errno from log(), -fno-math-errno should be added as well. Those two flags produce what is, IMO, the optimal loop:

.L5:
        pushl   %eax
        fildl   (%esp)
        addl    $4, %esp
        fldln2
        fxch    %st(1)
        fyl2x
        fucomi  %st(0), %st
        fldz
        fcmovnu %st(1), %st
        fstp    %st(1)
        addl    $1, %eax
        cmpl    %edx, %eax
        faddp   %st, %st(1)
        jne     .L5
I have been unable to replicate this result. Both gcc 4.0.3 and gcc 4.2.0 still completely omit the fucomi test, and with it the NaN-testing semantics. I compiled the test case above verbatim, using these flags:

  -O2 -march=i686 -funsafe-math-optimizations -fno-math-errno

The loop I get is:

.L5:
        pushl   %eax
        addl    $1, %eax
        fildl   (%esp)
        addl    $4, %esp
        cmpl    %edx, %eax
        fldln2
        fxch    %st(1)
        fyl2x
        faddp   %st, %st(1)
        jne     .L5

For this particular code, that loop may be considered a valid optimization, because log cannot produce a NaN from a non-negative argument. To be sure, I then modified the code as follows:

double test(int i0, int n, double a)
{
  double sum = 0.0;
  int i;

  for(i=i0; i<n; ++i)
    {
      float x = logf((float)i);
      sum += isnan(x) ? 0 : x;
    }

  return sum;
}

and recompiled with the same flags. The assembly for the loop portion is identical to the loop I posted above, even though this code can now actually produce NaNs. Just to be sure, I also tested the modified loop with this driver:

int main(void)
{
  printf("test(4, 6, 0) = %f\n", test(4,6,0));
  printf("test(0, 2, 0) = %f\n", test(0,2,0));
  printf("test(-2, 3, 0) = %f\n", test(-2,3,0));
  return 0;
}

[EMAIL PROTECTED]:~/project/cf/util$ /home/james/local/gcc/bin/gcc -O2 -march=i686 -funsafe-math-optimizations -fno-math-errno uros-test.c -o test
[EMAIL PROTECTED]:~/project/cf/util$ ./test
test(4, 6, 0) = 2.995732
test(0, 2, 0) = -inf
test(-2, 3, 0) = nan
[EMAIL PROTECTED]:~/project/cf/util$ /home/james/local/gcc/bin/gcc -O2 -march=i686 uros-test.c -o test -lm
[EMAIL PROTECTED]:~/project/cf/util$ ./uros
test(4, 6, 0) = 2.995732
test(0, 2, 0) = -inf
test(-2, 3, 0) = -inf
[EMAIL PROTECTED]:~/project/cf/util$ /home/james/local/gcc/bin/gcc -v
Using built-in specs.
Target: i686-pc-linux-gnu
Configured with: ../gcc-4-2/configure --prefix=/home/james/local/gcc
Thread model: posix
gcc version 4.2.0 20061103 (prerelease)

Perhaps I have not replicated your working environment closely enough, or you have a different macro in place of the isnan call. I compiled all of the code above both with and without the <math.h> and <stdio.h> headers included, and I get the same results either way.

Again, any help is appreciated.

--
Thanks.
Regards,
Michael James