------- Comment #7 from harsha dot jagasia at amd dot com 2007-06-28 00:41 -------

This is what I get without -ftree-vectorize, with -ftree-vectorize (default cost model off) and with -ftree-vectorize -fvect-cost-model, respectively, on an AMD x86-64 (with trunk plus the patch posted by Dorit at http://gcc.gnu.org/ml/gcc-patches/2007-06/txt00156.txt):

Case 1: (no vectorization)
gfortran -static -march=opteron -msse3 -O3 -ffast-math -funroll-loops pr32084.f90 -o 4.3.novect.out
time ./4.3.novect.out
real 0m4.414s
user 0m4.312s
sys 0m0.000s

Case 2: (vectorization without cost model)
gfortran -static -ftree-vectorize -march=opteron -msse3 -O3 -ffast-math -funroll-loops -fdump-tree-vect-details -fno-show-column pr32084.f90 -o 4.3.nocost.out
time ./4.3.nocost.out
real 0m4.776s
user 0m4.668s
sys 0m0.004s

Case 3: (vectorization with cost model)
gfortran -static -ftree-vectorize -fvect-cost-model -march=opteron -msse3 -O3 -ffast-math -funroll-loops -fdump-tree-vect-details -fno-show-column pr32084.f90 -o 4.3.cost.out
time ./4.3.cost.out
real 0m4.403s
user 0m4.300s
sys 0m0.000s

In short, the 8% advantage that the scalar version has over the vector version disappears with the cost model.

Unless I am missing something, the inner loops at lines 207 and 319 (do k = 1, 9) don't get vectorized, irrespective of the cost model. Looking at the dumps, the lines being vectorized without the cost model are the calls to TRANSPOSE and DOT_PRODUCT (lines 335, 333, 288, 223, 221 and 176). The cost model determines that it is not profitable to vectorize these and falls back to the scalar version instead.

The dumps are attached.

Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: /home/hjagasia/autovect/src-trunk/gcc/configure --prefix=/local/hjagasia/autovect/obj-trunk-nobootstrap --enable-languages=c,c++,fortran --enable-multilib --disable-bootstrap
Thread model: posix
gcc version 4.3.0 20070627 (experimental)

Thanks,
Harsha

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32084
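
For reference, the 8% figure can be re-derived from the "real" times reported for the three cases. The snippet below is only an illustrative check in Python: the dictionary keys and the helper name slowdown_percent are made up here and are not part of the test case or of GCC.

```python
# Wall-clock ("real") times in seconds, copied from the three runs in this comment.
real_times = {
    "novect": 4.414,  # Case 1: no vectorization
    "nocost": 4.776,  # Case 2: -ftree-vectorize, cost model off
    "cost": 4.403,    # Case 3: -ftree-vectorize -fvect-cost-model
}


def slowdown_percent(t, baseline):
    """Percent by which t exceeds baseline (negative means faster)."""
    return (t - baseline) / baseline * 100.0


baseline = real_times["novect"]
for name in ("nocost", "cost"):
    delta = slowdown_percent(real_times[name], baseline)
    print(f"{name}: {delta:+.1f}% vs novect")
```

With these numbers the vectorized build without the cost model comes out about 8.2% slower than the scalar build, while the build with the cost model is within a fraction of a percent of it. These are single runs, so the small difference in Case 3 should not be over-read.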