--- Comment #6 from jv244 at cam dot ac dot uk 2008-08-19 13:50 ---
Created an attachment (id=16099)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=16099&action=view)
non-reduced testcase
unfortunately, on the non-reduced testcase (attached as collocate_fast_2.f90)
the vectorization [...]
--- Comment #5 from jv244 at cam dot ac dot uk 2008-08-19 11:36 ---
Created an attachment (id=16098)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=16098&action=view)
ifort asm
added the ifort asm. The remaining difference seems to be related to how data
is being loaded into the registers [...]
--- Comment #4 from jv244 at cam dot ac dot uk 2008-08-19 10:53 ---
(In reply to comment #2)
> Note that the first complete unrolling pass unrolls loops that result in
> smaller code. This interferes with vectorization in your case, so can
> you try
unfortunately, the patch below does [...]
--- Comment #3 from burnus at gcc dot gnu dot org 2008-08-18 16:09 ---
Same trend with "ifort -O3" (ifort 11beta) and "gfortran -O3 --fast-math
-march=native" on AMD Athlon64 X2 4800+ / openSUSE 11. [same mulsd/mulpd
numbers]
ifort 2.452s, gfortran 3.848s -> 57% slower.
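For reference, the slowdown figure follows directly from the two timings:

$$\frac{3.848\ \mathrm{s}}{2.452\ \mathrm{s}} \approx 1.569 \quad\Rightarrow\quad \text{gfortran needs about } 57\% \text{ more time than ifort.}$$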
With Richard's patch [...]
--- Comment #2 from rguenth at gcc dot gnu dot org 2008-08-18 15:55 ---
Note that there is no loop left on the trunk for the testcase, but after
the vectorizer it is all unrolled completely (unvectorized, of course).
Again this looks like missing vectorization of scalar code.
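A minimal C sketch of the situation described here, assuming a small fixed-trip-count kernel. The functions `axpy4_loop` and `axpy4_unrolled` are hypothetical and are not taken from the attached Fortran testcase. C is used only because GCC's loop passes are shared across front ends. Once complete unrolling has turned the first form into the second, no loop remains for the loop vectorizer. Pairing the four scalar multiplies (`mulsd`) into packed `mulpd` instructions would then require vectorizing straight-line code. Whether a given GCC version does that depends on the version and flags used.

```c
#include <stddef.h>

/* Hypothetical kernel (not from the testcase): a loop with a small,
   compile-time-constant trip count, the kind that complete unrolling
   removes when the unrolled body is no larger than the loop. */
void axpy4_loop(double *restrict y, const double *restrict x, double a)
{
    for (size_t i = 0; i < 4; i++)
        y[i] += a * x[i];
}

/* Roughly what the loop above looks like after complete unrolling:
   straight-line scalar code with no loop left for the loop vectorizer.
   Only a vectorizer for straight-line code could combine these four
   scalar multiplies into two packed (two-double) multiplies. */
void axpy4_unrolled(double *restrict y, const double *restrict x, double a)
{
    y[0] += a * x[0];
    y[1] += a * x[1];
    y[2] += a * x[2];
    y[3] += a * x[3];
}
```

Both functions compute the same result. The difference that matters for this report is only which optimizer pass still gets a chance to emit packed instructions.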
Note that [...]
--- Comment #1 from jv244 at cam dot ac dot uk 2008-08-18 15:33 ---
Created an attachment (id=16082)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=16082&action=view)
testcase
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37150