On 11/2/06, Uros Bizjak <[EMAIL PROTECTED]> wrote:

This testcase (similar to yours, but it actually compiles):


Hello,

Uros, thank you for the attention to my problem. I upgraded gcc to 4.2
and have been using -march=i686 instead of -march=pentium4 for my
tests now. gcc 4.2 resolved some but not all of my concerns. Please
see below.

double test(int n, double a)
{
  double sum = 0.0;
  int i;

  for(i=0; i<n; ++i)
    {
      float x = logf((float)i);
      sum += isnan(x) ? 0 : x;
    }

  return sum;
}

produces exactly the code you are looking for (using gcc-4.2 with -march=i686):

.L5:
        pushl   %ebx
        fildl   (%esp)
        addl    $4, %esp
        fstps   (%esp)
        fstpl   -24(%ebp)
        call    logf
        fucomi  %st(0), %st
        fldz
        fcmovnu %st(1), %st
        fstp    %st(1)
        addl    $1, %ebx
        cmpl    %esi, %ebx
        fldl    -24(%ebp)
        faddp   %st, %st(1)
        jne     .L5


I was unable to replicate your results with gcc 4.0.3, so I installed
gcc 4.2.0 20061103 (prerelease) from SVN. Using that, I am able to
replicate the loop above exactly with
-O2 -march=i686. It looks like gcc 4.2 is willing to do this
optimization; gcc 4.0 would not. :-)

The logf() function will be inlined if you specify
-funsafe-math-optimizations; this flag also enables implicit
float->double extensions for x87 math. As you probably don't need math
errno from log(), -fno-math-errno should be added.

Those two flags produce what is IMO the optimal loop:

.L5:
        pushl   %eax
        fildl   (%esp)
        addl    $4, %esp
        fldln2
        fxch    %st(1)
        fyl2x
        fucomi  %st(0), %st
        fldz
        fcmovnu %st(1), %st
        fstp    %st(1)
        addl    $1, %eax
        cmpl    %edx, %eax
        faddp   %st, %st(1)
        jne     .L5


I have been unable to replicate this result. Both gcc 4.0.3 and gcc
4.2.0 completely omit the fucomi test, and with it the NaN check that
the source asks for.

I compiled the test case above verbatim, using these flags:
-O2 -march=i686 -funsafe-math-optimizations -fno-math-errno

The loop I get is:

.L5:
       pushl   %eax
       addl    $1, %eax
       fildl   (%esp)
       addl    $4, %esp
       cmpl    %edx, %eax
       fldln2
       fxch    %st(1)
       fyl2x
       faddp   %st, %st(1)
       jne     .L5


Now, for this particular code, that loop may be considered a valid
optimization because logf cannot produce a NaN from a non-negative
argument (logf(0) is -inf, not NaN). To be sure, I then modified the
code as follows:

double test(int i0, int n, double a)
{
 double sum = 0.0;
 int i;

 for(i=i0; i<n; ++i)
   {
     float x = logf((float)i);
     sum += isnan(x) ? 0 : x;
   }

 return sum;
}

I recompiled with the same flags. The assembly code for the loop
portion is identical to the one I posted above, even though the code
is now actually capable of producing NaNs (for example, when i0 is
negative).
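
To make the NaN case concrete, here is a minimal sketch of what the
loop should compute with and without the isnan filter. It does not
call logf at all: sample_logs is a hand-written stand-in for the values
logf would return for i = -2 .. 2 (an assumption, with logf(2) rounded
to 0.693147f), and sum_filtered / sum_unfiltered are hypothetical
helper names, not anything from the test case.

```c
#include <math.h>    /* NAN, INFINITY, isnan (macros only) */
#include <stddef.h>

/* Sum with the NaN filter, mirroring the loop body of test(). */
static double sum_filtered(const float *xs, size_t n)
{
    double sum = 0.0;
    size_t i;

    for (i = 0; i < n; ++i)
        sum += isnan(xs[i]) ? 0 : xs[i];

    return sum;
}

/* Sum without the filter, i.e. a loop with no NaN test in it. */
static double sum_unfiltered(const float *xs, size_t n)
{
    double sum = 0.0;
    size_t i;

    for (i = 0; i < n; ++i)
        sum += xs[i];

    return sum;
}

/* Assumed stand-ins for logf(-2), logf(-1), logf(0), logf(1), logf(2). */
static const float sample_logs[5] = { NAN, NAN, -INFINITY, 0.0f, 0.693147f };
```

With these values, sum_filtered(sample_logs, 5) is -inf because the two
NaNs are replaced by 0 and the -inf term dominates, while
sum_unfiltered(sample_logs, 5) is NaN because a NaN term poisons the
whole sum.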

Just to be sure, I also tested this on my modified loop:

int main(void) {
       printf("test(4, 6, 0) = %f\n", test(4,6,0));
       printf("test(0, 2, 0) = %f\n", test(0,2,0));
       printf("test(-2, 3, 0) = %f\n", test(-2,3,0));
       return 0;
}

[EMAIL PROTECTED]:~/project/cf/util$ /home/james/local/gcc/bin/gcc -O2
-march=i686 -funsafe-math-optimizations -fno-math-errno uros-test.c -o
test

[EMAIL PROTECTED]:~/project/cf/util$ ./test
test(4, 6, 0) = 2.995732
test(0, 2, 0) = -inf
test(-2, 3, 0) = nan

[EMAIL PROTECTED]:~/project/cf/util$ /home/james/local/gcc/bin/gcc -O2
-march=i686 uros-test.c -o test -lm

[EMAIL PROTECTED]:~/project/cf/util$ ./uros
test(4, 6, 0) = 2.995732
test(0, 2, 0) = -inf
test(-2, 3, 0) = -inf

[EMAIL PROTECTED]:~/project/cf/util$ /home/james/local/gcc/bin/gcc -v
Using built-in specs.
Target: i686-pc-linux-gnu
Configured with: ../gcc-4-2/configure --prefix=/home/james/local/gcc
Thread model: posix
gcc version 4.2.0 20061103 (prerelease)

Perhaps I have not replicated your working environment closely enough,
or you have a different macro in place of the isnan call. I compiled
all of the code above both with and without including the headers
<math.h> and <stdio.h>. I get the same results either way.
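
One way to take the isnan macro out of the picture would be to test
for NaN on the raw bits instead. The following is only a diagnostic
sketch under stated assumptions, not something taken from this thread:
float_is_nan_bits and float_from_bits are hypothetical helper names,
and float is assumed to be a 32-bit IEEE-754 single. Because the test
is an integer comparison, it should be less exposed to floating-point
simplifications, although that has not been verified with the exact
flags used above.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: NaN test on the raw single-precision bits.
   Assumes float is a 32-bit IEEE-754 value. */
static int float_is_nan_bits(float x)
{
    uint32_t bits;

    memcpy(&bits, &x, sizeof bits);
    /* Drop the sign bit; a NaN has an all-ones exponent and a
       non-zero mantissa, so it compares above the +inf pattern. */
    return (bits & 0x7fffffffu) > 0x7f800000u;
}

/* Hypothetical helper: build a float from a bit pattern, for testing. */
static float float_from_bits(uint32_t bits)
{
    float x;

    memcpy(&x, &bits, sizeof x);
    return x;
}
```

Swapping float_is_nan_bits(x) in for isnan(x) in the loop would show
whether the check disappears because of how isnan is expanded or for
some other reason.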

Again, help is appreciated. -- Thanks.

Regards,
Michael James
