------- Comment #56 from whaley at cs dot utsa dot edu 2006-08-09 21:33 -------
Dorit,

>This flag is needed in order to allow vectorization of reduction (summation
>in your case) of floating-point data.

OK, but this is a baaaad flag to require. From the computational scientist's point of view, there is a *vast* difference between reordering (which many aggressive optimizations imply) and failing to have IEEE compliance. Almost no computational scientist will use non-IEEE code (because you have essentially no idea if your answer is correct), but almost all will allow reordering. So, it is really important to separate the non-IEEE optimizations from the IEEE-compliant ones. If vectorization requires me to throw a flag that says it causes non-IEEE arithmetic, I can't use it, and neither can anyone other than, AFAIK, some graphics guys.

IEEE is the "contract" between the user and the computer that bounds how much error there can be, and allows the programmer to know if a given algorithm will produce a usable result. Non-IEEE is therefore the death-knell for having any theoretical or a priori understanding of accuracy. So, while reordering and non-IEEE may both seem unsafe, a reordering just gives different results, which are still known to be within normal fp error, while non-IEEE means there is no contract between the programmer and the machine at all, and indeed the answer may be arbitrarily bad. Further, behavior under exceptional conditions is not maintained, and so the answer may actually be undetectably nonsensical, not merely inaccurate. Having an oddly colored pixel doesn't hurt the graphics guy, but sending a satellite into the atmosphere, or registering cancer in a clean MRI, is rather more serious . . .

So, mixing the two transformation types on one flag means that vectorization is unusable to what must be the majority of its audience. Maybe I should open this as another bug report, "flag mixes normal and catastrophic optimizations"?

>Not really, I'm afraid - there is something that's not entirely supported
>in gcc yet - see details in PR20794

Hmm.
I'd tried the __attribute__, but I must have mistyped it, because it didn't work on pointers before. It did just work in the MMBENCHV tarfile, though. However, the code still doesn't use aligned loads to access the vectors (it uses multiple movlpd/movhpd instead) . . . Even more scary, adding the attribute does not change the generated assembly at all. Does the vectorization phase get this alignment info passed to it? Aligned loads can be as much as twice as fast as unaligned, and if you have to choose amongst loops in the midst of a deep loop nest, these factors can actually make vectorization a loser . . .

Thanks,
Clint

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27827
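
To make the reordering point from earlier in this comment concrete, here is a minimal sketch in plain C. It is not taken from the MMBENCHV tarfile: the function names and the two-accumulator layout are assumptions, chosen only to mimic what a 2-wide vectorized summation would do. Every individual addition is still an ordinary IEEE double addition; only the order in which the terms are combined differs.

```c
/* Strict left-to-right summation, in the order the source spells out. */
static double sum_in_order(const double *x, int n)
{
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += x[i];
    return s;
}

/* Same data, accumulated in two independent partial sums (even and odd
 * indices) that are combined at the end. This is a hand-written stand-in
 * for a 2-wide vectorized reduction; n is assumed to be even. */
static double sum_two_lanes(const double *x, int n)
{
    double lane0 = 0.0, lane1 = 0.0;
    for (int i = 0; i < n; i += 2) {
        lane0 += x[i];      /* even-indexed terms */
        lane1 += x[i + 1];  /* odd-indexed terms */
    }
    return lane0 + lane1;
}
```

On the (hypothetical) input { 1e20, 1.0, -1e20, 1.0 }, the in-order sum gives 1 and the two-lane sum gives 2, because in the first ordering a 1.0 is absorbed when it is added to 1e20. The two answers differ, yet both are legitimate IEEE evaluations of the same sum under a different association, and both sit inside the usual rounding-error bound for a sum with terms of this magnitude. That is the sense in which reordering changes the result without giving up the contract.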