Michael James wrote:
Conceptually, the code is:
double sum = 0;
for (i = 0; i < n; ++i) {
    float x = ..computed..;
    sum += isnan(x) ? 0 : x;
}
I have tried half a dozen variants at the source level in an attempt to get gcc to do this without branching (and without calling the helper function isnan). I was not able to achieve either goal.
You need to specify an architecture that has the cmov instruction, i.e. at least -march=i686.
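A minimal sketch of one way to write the select at the source level without calling isnan: the self-comparison x == x is false only when x is NaN, and expressing the result as a conditional between two already-computed values gives GCC the opportunity to if-convert it into fcmov/cmov when -march=i686 (or later) is given. Whether a particular GCC version actually emits a branch-free loop for it is not guaranteed, and the function name sum_ignoring_nan is hypothetical. The trick relies on IEEE NaN semantics, so flags that let the compiler assume no NaNs (such as -ffinite-math-only, implied by -ffast-math) may fold the test away.

```c
#include <math.h>

/* Hypothetical helper: sums the non-NaN entries of xs[0..n-1].
   x == x is false only when x is NaN, so no call to isnan() is needed. */
double sum_ignoring_nan(const float *xs, int n)
{
    double sum = 0.0;
    int i;

    for (i = 0; i < n; ++i) {
        float x = xs[i];
        /* Select between two ready values; a candidate for fcmov/cmov
           if GCC if-converts it (not guaranteed). */
        float keep = (x == x) ? x : 0.0f;
        sum += keep;
    }
    return sum;
}
```

For example, with xs = { 1.5f, NAN, 2.0f, -0.5f } the NaN entry contributes 0 and the function returns 3.0.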
Concerning the inline evaluation of isnan, I tried using __builtin_isunordered(x, x), which either gets optimized out of existence when I specify -funsafe-math-optimizations, or causes other gcc math inlines (specifically log) not to use their inline definitions when I do not specify -funsafe-math-optimizations. For my particular problem I have a workaround for this, which nonetheless causes the result of isnan to end up as a condition flag in the EFLAGS register. (Instead of a test for NaN, I use a test for 0 in the domain of the log.)
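For comparison, C99 provides isunordered() as a standard macro in <math.h>; isunordered(x, x) is true exactly when x is NaN, and with GCC and glibc it typically expands to __builtin_isunordered rather than a library call. The sketch below wraps it in a hypothetical helper, zero_if_nan, that performs the same select as the loop body. It only holds under default IEEE semantics; as described above, some optimization flags can make GCC remove the test.

```c
#include <math.h>

/* Hypothetical helper: returns 0 for NaN and x otherwise.
   isunordered(x, x) is true exactly when x is NaN, because a NaN
   is unordered with respect to every value, including itself. */
float zero_if_nan(float x)
{
    return isunordered(x, x) ? 0.0f : x;
}
```

Infinities are ordered values, so zero_if_nan(-INFINITY) still returns -INFINITY; only NaN is mapped to 0.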
This testcase (similar to yours, but it actually compiles):

#include <math.h>

double test(int n, double a)
{
  double sum = 0.0;
  int i;

  for (i = 0; i < n; ++i)
    {
      float x = logf((float)i);
      sum += isnan(x) ? 0 : x;
    }
  return sum;
}

produces exactly the code you are looking for (using gcc-4.2 with -march=i686):

.L5:
        pushl   %ebx
        fildl   (%esp)
        addl    $4, %esp
        fstps   (%esp)
        fstpl   -24(%ebp)
        call    logf
        fucomi  %st(0), %st
        fldz
        fcmovnu %st(1), %st
        fstp    %st(1)
        addl    $1, %ebx
        cmpl    %esi, %ebx
        fldl    -24(%ebp)
        faddp   %st, %st(1)
        jne     .L5

The logf() function will be inlined if you specify -funsafe-math-optimizations; this flag also enables implicit float->double extensions for x87 math. As you probably don't need math errno from log(), -fno-math-errno should be added as well. Those two flags produce what is, IMO, an optimal loop:

.L5:
        pushl   %eax
        fildl   (%esp)
        addl    $4, %esp
        fldln2
        fxch    %st(1)
        fyl2x
        fucomi  %st(0), %st
        fldz
        fcmovnu %st(1), %st
        fstp    %st(1)
        addl    $1, %eax
        cmpl    %edx, %eax
        faddp   %st, %st(1)
        jne     .L5

Uros.
Concerning the use of an unconditional add followed by an FCMOVcc instead of a Jcc, I have had no success. I have tried code such as: