------- Comment #60 from whaley at cs dot utsa dot edu 2006-08-10 14:08 -------

Paolo,

Thanks for the explanation of what -funsafe is presently doing.

>You are also confusing -funsafe-math-optimizations with -ffast-math.

No, what I'm doing is reading the man page (the closest thing to a contract between gcc and me on what it is doing with my code):
|   -funsafe-math-optimizations
|       Allow optimizations for floating-point arithmetic that (a) assume
|       that arguments and results are valid and (b) may violate IEEE or
|       ANSI standards.

The (b) in this statement prevents me from using this flag, since as a library provider I *must* be able to reassure my users that I have done nothing to violate the IEEE fp standard. (Don't get me wrong: plenty of violations of the standard occur in hardware, but typically in ways that are well understood by the scientists using those platforms, and in the less important parts of the standard.) I can't even use it after verifying that no optimization has hurt the present code, because an optimization that violates IEEE could be added at a later date, or be used on a system that I'm not testing on (e.g., on some systems it could cause 3DNow! vectorization).

>Rules are determined by the language standards. I believe that C
>mandates no reassociation; Fortran allows reassociation unless explicit
>parentheses are present in the source, but this is not (yet) implemented
>by GCC.

My precise point. There are *lots* of C rules that a fp guy could give a crap about (for certain types of fp kernels), but IEEE is pretty much inviolate. Since this flag conflates language violations (don't care) with IEEE violations (catastrophic), I can't use it. I cannot stress enough just how important IEEE is: it is the only contract that tells us what it means to do a flop, and gives us any way of understanding what our answer will be. Making vectorization depend on a flag that says it is allowed to violate IEEE is therefore a killer for me (and most knowledgeable fp guys).

This is ironic, since vectorization of sums (as in GEMM) is usually implemented as scalar expansion on the accumulators, and this not only produces an IEEE-compliant answer, but it is *more* accurate for almost all data.

Thanks,
Clint

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27827
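
P.S. For anyone following the thread who hasn't seen scalar expansion of an accumulator written out, here is a minimal sketch in plain C. The names dot_scalar and dot_expanded are made up for this illustration and this is not code from any real library. Every individual add and multiply is still an ordinary IEEE operation; the only thing that changes between the two routines is the order in which the partial sums are combined, which is exactly the reassociation that C forbids the compiler from doing on its own.

```c
#include <assert.h>
#include <stddef.h>

/* Reference version: one accumulator, summed strictly left to right. */
double dot_scalar(const double *x, const double *y, size_t n)
{
    assert(x != NULL && y != NULL);
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += x[i] * y[i];
    return acc;
}

/* Scalar-expanded version (hypothetical illustration): the single
 * accumulator becomes four independent partial sums, which is the
 * shape a 4-wide vectorized loop would have. */
double dot_expanded(const double *x, const double *y, size_t n)
{
    assert(x != NULL && y != NULL);
    double acc0 = 0.0, acc1 = 0.0, acc2 = 0.0, acc3 = 0.0;
    size_t i = 0;

    /* Main loop: element i+k always goes into accumulator k. */
    for (; i + 4 <= n; i += 4) {
        acc0 += x[i]     * y[i];
        acc1 += x[i + 1] * y[i + 1];
        acc2 += x[i + 2] * y[i + 2];
        acc3 += x[i + 3] * y[i + 3];
    }

    /* Combine the partial sums, then handle the 0..3 leftover elements. */
    double acc = (acc0 + acc1) + (acc2 + acc3);
    for (; i < n; i++)
        acc += x[i] * y[i];
    return acc;
}
```

The two routines can differ in the last bits on general data because the summation order differs, but neither result is "non-IEEE". Each partial sum in the expanded version accumulates only a quarter of the terms, which is the usual informal argument for why its rounding error tends to be no worse than that of the single-accumulator loop.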