Hello,

I am trying to get gcc to optimize an inner math loop. The first part of the loop computes a single-precision float expression (which may or may not be NaN), and the second part sums all of these results into a double-precision total.
Conceptually, the code is:

    double sum = 0;
    for (i = 0; i < n; ++i) {
        float x = ..computed..;
        sum += isnan(x) ? 0 : x;
    }

I have tried half a dozen variants at the source level in an attempt to get gcc to do this without branching (and without calling the helper function isnan). I have not been able to succeed at either.

Concerning the inline evaluation of isnan, I tried using __builtin_isunordered(x, x). It either gets optimized out of existence when I specify -funsafe-math-optimizations, or, when I do not specify -funsafe-math-optimizations, it causes other gcc math inlines (specifically log) to not use their inline definitions. For my particular problem I have a workaround for this (instead of a test for NaN, I use a test for 0 in the domain of the log), but it nonetheless causes the result of the test to end up as a condition flag in the EFLAGS register.

Concerning the use of an unconditional add followed by an FCMOVcc instead of a Jcc, I have had no success. I have tried code such as:

    double temp = sum + x;
    if (!testfornan) sum = temp;

et cetera. The only way I know of so far to avoid the Jcc is to change my total to an array of two elements and add to it like this:

    sum[testfornan] += x;

For this, gcc happily generates SETcc instructions to compute the array index. However, the result is less than satisfactory because it prevents the total from being kept in a register.

I have been testing exclusively with gcc 4.0.3, but could upgrade if that will help. Any suggestions you can provide concerning compiler flags or source constructs to try will be appreciated.

Regards,
Michael James