[Bug target/27827] [4.0/4.1 Regression] gcc 4 produces worse x87 code on all platforms than gcc 3

whaley at cs dot utsa dot edu Wed, 09 Aug 2006 16:02:11 -0700


------- Comment #58 from whaley at cs dot utsa dot edu  2006-08-09 23:01 -------
Andrew,


>Except for the fact IEEE compliant fp does not allow for reordering at all
>except in some small cases.  For an example is (a + b) + (-a) is not the same
>as (a + (-a)) + b, so reordering will invalid IEEE fp for larger a and small
>b.  Yes maybe we should split out the option for unsafe math fp op for
>reordering but that is different issue.

Thanks for the response, but I believe you are conflating two issues (as does
this flag, which is why this is bad news).  Different answers to the question
"what is this sum" do not ruin IEEE compliance.  I am referring to IEEE 754,
which is a standard set of rules for storage and arithmetic for floating point
(fp) on modern hardware.  I am unaware of there being any rules on compilation;
i.e., whether reorderings are allowed is beyond the standard.  Rather, it is a
set of rules that says, for floating point operations (flops), how rounding
must be done, how overflow/underflow must be handled, etc.  Perhaps there is
another IEEE standard concerning compilation that you are referring to?

Now of course, floating point arithmetic in general (and IEEE-compliant fp in
particular) is not associative, so indeed (a+b+c) != (c+b+a) in general.
However, both orderings give valid answers to "what are these 3 things summed
up", and both are IEEE compliant if each addition is compliant.
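
A minimal sketch of that point in C, using the (a + b) + (-a) example from the
quoted text.  The function names and the values 1e20 and 1.0 are illustrative
assumptions, not anything from the bug report.  Each addition below is an
ordinary IEEE 754 double operation (assuming the compiler is not asked for
unsafe math optimizations); only the order of the additions differs.

```c
/* Hypothetical helpers: the same three terms, summed in two orders. */

/* Evaluates (a + b) + (-a), the order as written in the source. */
double sum_as_written(double a, double b)
{
    double t = a + b;      /* for huge a and small b, b is rounded away here */
    return t + (-a);
}

/* Evaluates (a + (-a)) + b, a reordering of the same sum. */
double sum_reordered(double a, double b)
{
    double t = a + (-a);   /* exactly 0.0 for any finite a */
    return t + b;
}
```

With a = 1e20 and b = 1.0, sum_as_written returns 0.0 and sum_reordered
returns 1.0.  The two results differ, yet each one is built only from
correctly rounded additions.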

What non-IEEE means is that the individual flops are no longer IEEE compliant. 
This means that overflow may not be handled, or exceptional conditions may
cause unknown results (e.g., divide by zero), and indeed we have no way at all
of knowing what an fp add even means.  An example of a non-IEEE optimization is
using 3DNow! vectorization, because 3DNow! does not follow the IEEE standard
(for instance, it handles overflow only by saturation, which violates the
standard).  SSE (unless you turn IEEE compliance off manually) is IEEE
compliant, and this is why you see computational guys like myself using it, and
not using 3DNow!.

To a computational scientist, non-IEEE is catastrophic, and "may change the
answer" is not.  "May change the answer" in this case simply means that I've
got a different ordering, which is also a valid IEEE fp answer, and indeed may
be a "better" answer than the original ordering (depending on the data; no way
to know this w/o looking at the data).  Non-IEEE means that I have no way of
knowing what kind of rounding was done, how each flop was done, if overflow (or
gradual underflow!) occurred, etc.  It is for this reason that optimizations
which are non-IEEE are a killer for computational scientists, and reorders are
no big deal.  In the first you have no idea what has happened with the data,
and in the second you have an IEEE-compliant answer, which has known
properties.
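
To make the reordering case concrete, here is a rough sketch of the kind of
reassociation a vectorized sum performs: four partial sums ("lanes") that are
combined at the end.  The function names, the lane count of 4, and the test
data are assumptions chosen for illustration; this is not a description of
what gcc actually emits.  Every addition in both functions is a plain IEEE 754
double add.

```c
#include <stddef.h>

/* Hypothetical reference: sums x[0..n-1] strictly left to right. */
double sum_sequential(const double *x, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s;
}

/* Hypothetical vector-style sum: four independent partial sums, combined
 * pairwise at the end, followed by a scalar loop for leftover elements. */
double sum_four_lanes(const double *x, size_t n)
{
    double lane[4] = {0.0, 0.0, 0.0, 0.0};
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        lane[0] += x[i];
        lane[1] += x[i + 1];
        lane[2] += x[i + 2];
        lane[3] += x[i + 3];
    }

    double s = (lane[0] + lane[1]) + (lane[2] + lane[3]);
    for (; i < n; i++)      /* remainder when n is not a multiple of 4 */
        s += x[i];
    return s;
}
```

For the input {1e20, 1, 1, 1, -1e20, 1, 1, 1}, sum_sequential returns 3.0 and
sum_four_lanes returns 6.0.  Here the reordered sum happens to be the exact
answer, but with other data the sequential order could just as easily be the
closer one.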

It has been my experience that most compiler people (and I have some experience
there, as I got my PhD in compilation) are more concerned with integer work,
and thus not experts on fp computation.  I've done fp computational work for
the majority of my research for the last decade, so I thought I might be able
to provide useful input to bridge the camps, so to speak.  In this case, I
think that by lumping "cause different IEEE-compliant answers" in with "use
non-IEEE arithmetic" you are preventing all serious fp users from utilizing the
optimizations.  Since vectorization is of great importance on modern machines,
this is bad news.  Obviously, I may be wrong in what I say, but if reordering
makes something non-IEEE I'm going to have some students mad at me for teaching
them the wrong stuff :)

Has this made my point any clearer, or do you still think I am wrong?  If I'm
wrong, maybe you can point to the part of the IEEE standard that discusses
orderings violating the standard (as opposed to the well-known fact that all
implemented fp arithmetic is non-associative)?  After you do this, I'll have
to dig up my copy of the thing, which I don't think I've seen in the last 2
years (but I did scope out some of the books that cover it, and didn't find
anything about compilation).

Thanks,
Clint


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27827

[Bug target/27827] [4.0/4.1 Regression] gcc 4 produces worse x87 code on all platforms than gcc 3
