Hello,

I am trying to get gcc to optimize an inner math loop. The first part
of the loop computes a single-precision float expression (which may or
may not be NaN), and the second part sums all of the non-NaN results
into a double-precision total.

Conceptually, the code is:

double sum = 0;

for(i=0; i<n; ++i) {
   float x = ..computed..;
   sum += isnan(x)? 0: x;
}

I have tried half a dozen variants at the source level in an attempt to
get gcc to do this without branching (and without calling the helper
function isnan). I have not succeeded at either.

Concerning the inline evaluation of isnan, I tried using
__builtin_unordered(x,x), which either gets optimized out of existence
when I specify -funsafe-math-optimizations, or causes other gcc math
inlines (specifically log) to not use their inline definitions when I
do not specify -funsafe-math-optimizations. For my particular
problem I have a workaround, which nonetheless leaves the result of
the NaN test as a condition flag in the EFLAGS register. (Instead of
testing for NaN, I test for 0 in the domain of the log.)
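
For reference, a NaN test that does not rely on floating-point
comparison semantics at all can be sketched as below. This is only an
illustration, not something I have measured: the name float_is_nan_bits
is a placeholder, and it assumes that float is a 32-bit IEEE 754 single.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 if x is a NaN, 0 otherwise.
   Assumption: float is a 32-bit IEEE 754 single-precision value. */
static int float_is_nan_bits(float x)
{
    uint32_t bits;

    /* Copy the raw representation; memcpy avoids aliasing problems. */
    memcpy(&bits, &x, sizeof bits);

    /* A NaN has an all-ones exponent and a nonzero mantissa.
       Masking off the sign bit, that is anything above +infinity. */
    return (bits & 0x7fffffffu) > 0x7f800000u;
}
```

Because the test is done on integers, it should not be affected by the
math optimization flags, though it may force the value out of a
floating-point register.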

Concerning the use of an unconditional add followed by an FCMOVcc
instead of a Jcc, I have had no success. I have tried code such as:

double temp = sum + x;
if (!testfornan) sum = temp;

et cetera.
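
Written out as a complete function, the select form looks roughly like
this. The names sum_skip_nan_select, xs and n are placeholders for
illustration, and x != x stands in for whatever NaN test is used (it
may be folded away under the fast-math family of flags). Whether the
compiler turns the select into a conditional move depends on the
compiler version, the flags and the target.

```c
#include <stddef.h>

/* Hypothetical helper: sums xs[0..n-1] into a double, skipping NaNs. */
static double sum_skip_nan_select(const float *xs, size_t n)
{
    double sum = 0.0;
    size_t i;

    for (i = 0; i < n; ++i) {
        float x = xs[i];
        int is_nan = (x != x);      /* true only for NaN under IEEE rules */
        double temp = sum + x;      /* unconditional add */

        sum = is_nan ? sum : temp;  /* select between old and new total */
    }
    return sum;
}
```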

The only way I know of so far to avoid the Jcc is to change my total
to an array of two elements, and add to it like this:

sum[testfornan] += x;

For which gcc happily generates SETcc instructions to generate the
array index. However, this result is less than satisfactory because it
precludes the total from being stored in a register.
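
In full, the two-element version is something like the sketch below.
The names sum_skip_nan_indexed, xs and n are placeholders, and x != x
stands in for the actual NaN test. Slot 1 only absorbs the NaN
contributions and is thrown away at the end.

```c
#include <stddef.h>

/* Hypothetical helper: sums xs[0..n-1] into a double, skipping NaNs,
   by using the NaN test result as an array index instead of a branch. */
static double sum_skip_nan_indexed(const float *xs, size_t n)
{
    double sum[2] = { 0.0, 0.0 };   /* sum[0]: real total, sum[1]: discard */
    size_t i;

    for (i = 0; i < n; ++i) {
        float x = xs[i];

        /* (x != x) is 1 for NaN and 0 otherwise, so NaNs land in sum[1]. */
        sum[x != x] += x;
    }
    return sum[0];
}
```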

I have been testing exclusively with gcc 4.0.3, but could upgrade if
that will help.

Any suggestions you can provide concerning compiler flags or source
constructs to try will be appreciated.

Regards,
Michael James
