[Bug target/27827] [4.0/4.1 Regression] gcc 4 produces worse x87 code on all platforms than gcc 3

whaley at cs dot utsa dot edu Wed, 09 Aug 2006 14:33:53 -0700


------- Comment #56 from whaley at cs dot utsa dot edu  2006-08-09 21:33 -------
Dorit,


>This flag is needed in order to allow vectorization of reduction (summation
>in your case) of floating-point data.

OK, but this is a baaaad flag to require.  From the computational scientist's
point of view, there is a *vast* difference between reordering (which many
aggressive optimizations imply) and failing to have IEEE compliance.  Almost no
computational scientist will use non-IEEE code (because you have essentially no
idea if your answer is correct), but almost all will allow reordering.  So, it
is really important to separate the non-IEEE optimizations from the IEEE
compliant ones.

If vectorization requires me to throw a flag that says it causes non-IEEE
arithmetic, I can't use it, and neither can anyone other than, AFAIK, some
graphics guys.  IEEE is the "contract" between the user and the computer, that
bounds how much error there can be, and allows the programmer to know if a
given algorithm will produce a usable result.  Non-IEEE is therefore the
death-knell for having any theoretical or a priori understanding of accuracy. 
So, while reordering and non-IEEE may both seem unsafe, a reordering just gives
different results, which are still known to be within normal fp error, while
non-IEEE means there is no contract between the programmer and the computer at
all, and indeed
the answer may be arbitrarily bad.  Further, behavior under exceptional
conditions is not maintained, and so the answer may actually be undetectably
nonsensical, not merely inaccurate.  Having an oddly colored pixel doesn't hurt
the graphics guy, but sending a satellite into the atmosphere, or registering
cancer in a clean MRI are rather more serious . . .  So, mixing the two
transformation types on one flag means that vectorization is unusable to what
must be the majority of its audience.  Maybe I should open this as another bug
report "flag mixes normal and catastrophic optimizations"?
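
To make the distinction concrete, here is a minimal C sketch (an illustration
only, not code from this PR). It sums the same array in source order and with
two partial sums, roughly what a 2-wide vectorized reduction does, and computes
an a priori bound on how far the two answers can differ. The function names
are hypothetical, and the bound n * DBL_EPSILON * sum|x_i| is the usual
first-order worst case for two summation orders of the same n values, assumed
here rather than derived.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* Strict left-to-right summation, as written in the source loop. */
double sum_in_order(const double *x, size_t n)
{
    double s = 0.0;
    size_t i;
    assert(x != NULL || n == 0);
    for (i = 0; i < n; i++)
        s += x[i];
    return s;
}

/* Reordered summation: two independent partial sums, combined at the end.
 * This is roughly the shape of a 2-wide SIMD reduction. */
double sum_two_lanes(const double *x, size_t n)
{
    double s0 = 0.0, s1 = 0.0;
    size_t i;
    assert(x != NULL || n == 0);
    for (i = 0; i + 1 < n; i += 2) {
        s0 += x[i];
        s1 += x[i + 1];
    }
    if (i < n)          /* odd n: one element left over */
        s0 += x[i];
    return s0 + s1;
}

/* Assumed first-order bound on |sum_in_order - sum_two_lanes|:
 * n * DBL_EPSILON * sum of |x_i|. */
double reorder_bound(const double *x, size_t n)
{
    double abs_sum = 0.0;
    size_t i;
    assert(x != NULL || n == 0);
    for (i = 0; i < n; i++)
        abs_sum += (x[i] < 0.0) ? -x[i] : x[i];
    return (double)n * DBL_EPSILON * abs_sum;
}

/* Absolute difference without needing libm. */
double abs_diff(double a, double b)
{
    return (a > b) ? a - b : b - a;
}
```

With IEEE arithmetic, abs_diff(sum_in_order(x, n), sum_two_lanes(x, n)) stays
at or below reorder_bound(x, n) for finite inputs whose sums do not overflow,
and that is the kind of contract I mean. Once the arithmetic itself is allowed
to be non-IEEE, no such bound can be written down.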

>Not really, I'm afraid - there is something that's not entirely supported
>in gcc yet - see details in PR20794

Hmm.  I'd tried the __attribute__, but I must have mistyped it, because it
didn't work before on pointers.  It just did in the MMBENCHV tarfile, but
the code still didn't use aligned loads to access the vectors (using
multiple movlpd/movhpd instead) . . .  Even more scary, having the attribute
calls does not change the genned assembly at all.  Does the vectorization phase
get this alignment info passed to it?
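
For reference, a minimal sketch of the kind of alignment information I am
talking about (illustrative only; this is not the MMBENCHV code, and all names
here are hypothetical). The arrays use the GCC aligned attribute, and the loop
uses __builtin_assume_aligned, which as far as I know only exists in later GCC
releases (4.7 and up) and not in the 4.1 compiler under discussion. Whether a
given gcc actually emits aligned loads for this loop is not something the
sketch guarantees.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define N 8

/* GCC extension: request 16-byte alignment for the vector operands. */
double xs[N] __attribute__((aligned(16))) = {1, 2, 3, 4, 5, 6, 7, 8};
double ys[N] __attribute__((aligned(16))) = {1, 1, 1, 1, 1, 1, 1, 1};

/* Returns nonzero if p sits on a 16-byte boundary. */
int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Dot product over pointers the caller promises are 16-byte aligned. */
double dot_aligned(const double *x, const double *y, size_t n)
{
    const double *xa, *ya;
    double sum = 0.0;
    size_t i;

    /* Lying to the compiler about alignment is undefined behavior,
     * so check the promise before making it. */
    assert(is_aligned16(x) && is_aligned16(y));

    /* Assumption: GCC >= 4.7 builtin; passes the alignment to the optimizer. */
    xa = __builtin_assume_aligned(x, 16);
    ya = __builtin_assume_aligned(y, 16);

    for (i = 0; i < n; i++)
        sum += xa[i] * ya[i];
    return sum;
}
```

Calling dot_aligned(xs, ys, N) gives the compiler everything it needs to know
that both operands start on a 16-byte boundary; the question above is whether
the vectorizer ever sees that information.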

Aligned loads can be as much as twice as fast as unaligned, and if you have to
choose amongst loops in the midst of a deep loop nest, these factors can
actually make vectorization a loser . . .

Thanks,
Clint


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27827

[Bug target/27827] [4.0/4.1 Regression] gcc 4 produces worse x87 code on all platforms than gcc 3
