[Bug rtl-optimization/32084] gfortran 4.3 13%-18% slower for induct.f90 than gcc 4.0-based competitor

harsha dot jagasia at amd dot com Wed, 27 Jun 2007 17:41:43 -0700


------- Comment #7 from harsha dot jagasia at amd dot com  2007-06-28 00:41 -------
These are the timings I get on an AMD x86-64 without -ftree-vectorize, with
-ftree-vectorize (cost model off, which is the default) and with
-ftree-vectorize -fvect-cost-model, respectively. The compiler is trunk plus
the patch posted by Dorit at
http://gcc.gnu.org/ml/gcc-patches/2007-06/txt00156.txt


Case 1: (no vectorization)
gfortran -static -march=opteron -msse3 -O3 -ffast-math -funroll-loops \
    pr32084.f90 -o 4.3.novect.out
time ./4.3.novect.out
real    0m4.414s
user    0m4.312s
sys     0m0.000s

Case 2: (vectorization without cost model)
gfortran -static -ftree-vectorize -march=opteron -msse3 -O3 -ffast-math \
    -funroll-loops -fdump-tree-vect-details -fno-show-column pr32084.f90 \
    -o 4.3.nocost.out
time ./4.3.nocost.out
real    0m4.776s
user    0m4.668s
sys     0m0.004s

Case 3: (vectorization with cost model)
gfortran -static -ftree-vectorize -fvect-cost-model -march=opteron -msse3 -O3 \
    -ffast-math -funroll-loops -fdump-tree-vect-details -fno-show-column \
    pr32084.f90 -o 4.3.cost.out
time ./4.3.cost.out
real    0m4.403s
user    0m4.300s
sys     0m0.000s

In short, without the cost model the vectorized version is about 8% slower
than the scalar version (4.776s vs. 4.414s real). With the cost model enabled
that gap disappears (4.403s real).
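
The percentages can be checked from the "real" timings of the three runs
above. A minimal sketch (the helper name slowdown_pct is only for
illustration):

```python
# Wall-clock ("real") times in seconds, copied from the three runs above.
real_times = {
    "novect": 4.414,  # Case 1: no vectorization
    "nocost": 4.776,  # Case 2: vectorization, cost model off
    "cost": 4.403,    # Case 3: vectorization, cost model on
}


def slowdown_pct(t, baseline):
    """Percentage by which t is slower than baseline (negative means faster)."""
    return (t - baseline) / baseline * 100.0


base = real_times["novect"]
for name in ("nocost", "cost"):
    pct = slowdown_pct(real_times[name], base)
    print(f"{name}: {pct:+.1f}% relative to novect")
```

Case 2 comes out roughly 8% slower than Case 1, while Case 3 is within a
fraction of a percent of it.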

Unless I am missing something, the inner loops at lines 207 and 319 (do k = 1,
9) don't get vectorized, regardless of the cost model.

Looking at the dumps, the lines being vectorized without the cost model are the
calls to TRANSPOSE and DOT_PRODUCT (lines 335, 333, 288, 223, 221 and 176).
The cost model determines that it is not profitable to vectorize these and
falls back to the scalar version instead.
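
As a rough illustration of why a cost model can reject vectorization for
short operations, here is a toy profitability check. This is not GCC's actual
cost model: every cost number, the vectorization factor, and all function
names below are hypothetical, and the trip counts are assumptions rather than
values taken from the dumps.

```python
def scalar_cost(n_iters, scalar_per_iter):
    """Cost of running all iterations in the scalar loop."""
    return n_iters * scalar_per_iter


def vector_cost(n_iters, vf, vec_per_iter, overhead, scalar_per_iter):
    """Cost of the vector loop plus a scalar epilogue for leftover iterations.

    vf is the vectorization factor (elements handled per vector iteration);
    overhead stands in for setup work such as alignment handling.
    """
    vec_iters, leftover = divmod(n_iters, vf)
    return overhead + vec_iters * vec_per_iter + leftover * scalar_per_iter


def vector_is_profitable(n_iters, vf=2, scalar_per_iter=1.0,
                         vec_per_iter=1.2, overhead=6.0):
    """True if the (made-up) vector cost beats the (made-up) scalar cost."""
    v = vector_cost(n_iters, vf, vec_per_iter, overhead, scalar_per_iter)
    return v < scalar_cost(n_iters, scalar_per_iter)


# With these hypothetical costs, a short loop does not amortize the overhead,
# while a long one does.
print(vector_is_profitable(9))
print(vector_is_profitable(1000))
```

The point of the sketch is only that a fixed setup overhead has to be
amortized over enough iterations before the vector version wins.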

The dumps are attached.

Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: /home/hjagasia/autovect/src-trunk/gcc/configure
--prefix=/local/hjagasia/autovect/obj-trunk-nobootstrap
--enable-languages=c,c++,fortran --enable-multilib --disable-bootstrap
Thread model: posix
gcc version 4.3.0 20070627 (experimental)

Thanks,
Harsha


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32084
