------- Comment #54 from whaley at cs dot utsa dot edu  2006-08-09 16:08 -------
Dorit,

OK, I've posted a new tarfile with a safe kernel code where the loop is not
unrolled, so that the vectorizer has a chance.  With this kernel, I can make it
vectorize code, but only if I throw the -funsafe-math-optimizations flag.  This
kernel doesn't use a lot of registers, so it should work for both x86-32 and
x86-64 archs.

I would expect the vectorized code to beat the x87 code in both precisions on
the P4E (vector SSE has two and four times the peak of x87 in double and
single, respectively), and to beat the x87 code in single on the Ath64 (twice
the peak).  So far, vectorization is never a win on the P4E, but I can make
single win on the Ath64.  On both platforms, inspecting the assembly confirms
that there are loops in there that use the vector instructions.  Once I
understand better what's going on, maybe I can improve this . . .

Here are some questions I need to figure out:
(1) Why do I have to throw the -funsafe-math-optimizations flag to enable this?
   -- I see where the .vect file warns of it, but it refers to an SSA line,
      so I'm not sure what's going on.
   -- ATLAS cannot throw this flag, because it enables non-IEEE fp arithmetic,
      and ATLAS must maintain IEEE compliance.  SSE itself does *not* require
      ruining IEEE compliance.
   -- Let me know if there is some way in the code that I can avoid this problem.
   -- If it cannot be avoided, is there a way to make this optimization
      controlled by a flag that does not mean a loss of IEEE compliance?
(2) Is there any pragma or assertion, etc, that I can put in the code to
    notify the compiler that certain pointers point to 16-byte aligned data?
    -- Only the output array (C) is possibly misaligned in ATLAS
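
On question (1), one plausible explanation, offered as an assumption because
the kernel itself is not shown in this comment, is that the loop accumulates a
floating-point sum.  Vectorizing such a reduction means keeping several partial
sums and combining them at the end, which reorders the additions.  Floating-point
addition is not associative, so the compiler may only do this when told that
reassociation is acceptable.  The sketch below uses hypothetical function names
and emulates a 4-wide reduction in plain C; the volatile accumulators are there
only to force rounding to float at every step, so that x87 excess precision does
not hide the effect.

```c
#include <stddef.h>

/* Strict left-to-right sum: the order that unvectorized scalar code uses. */
float sum_in_order(const float *x, size_t n)
{
    volatile float s = 0.0f;   /* volatile: round to float after each add */
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s;
}

/* Four independent partial sums combined at the end: the order a 4-wide
 * vectorized reduction would use (hypothetical emulation, no SSE needed). */
float sum_four_lanes(const float *x, size_t n)
{
    volatile float lane[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    size_t i = 0;

    /* Main loop: element i+l always goes to lane l. */
    for (; i + 3 < n; i += 4)
        for (size_t l = 0; l < 4; l++)
            lane[l] += x[i + l];

    /* Horizontal combine of the four lanes. */
    volatile float lo = lane[0] + lane[1];
    volatile float hi = lane[2] + lane[3];
    volatile float s = lo + hi;

    /* Scalar cleanup for the leftover elements. */
    for (; i < n; i++)
        s += x[i];
    return s;
}
```

For the input {1.0e8f, 1.0f, -1.0e8f, 1.0f}, the in-order sum is 1 and the
four-lane sum is 0, because 1.0e8f + 1.0f rounds back to 1.0e8f in single
precision.  Both results are computed with ordinary IEEE operations; only the
order of the additions differs.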

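On question (2), GCC 4.7 and later (newer than the compiler discussed in this
thread) and Clang provide __builtin_assume_aligned, which returns its pointer
argument and lets the compiler assume the stated alignment.  The sketch below
is only an illustration under that assumption: the function name axpy_aligned
and its parameters are hypothetical and are not taken from ATLAS.  The hint is
applied to the input only, since the output array may be misaligned.

```c
#include <stddef.h>

/* Hypothetical kernel: y[i] += alpha * x[i].
 * The caller promises that x is 16-byte aligned; y carries no such promise.
 * Passing a misaligned x is undefined behavior. */
void axpy_aligned(float alpha, const float *x, float *y, size_t n)
{
    /* Tell the compiler that x points to 16-byte aligned data. */
    const float *xa = __builtin_assume_aligned(x, 16);

    for (size_t i = 0; i < n; i++)
        y[i] += alpha * xa[i];
}
```

This loop has no reduction, so vectorizing it does not require reordering any
floating-point operations.
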
Thanks,
Clint


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27827
