http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53346
Bug #: 53346
Summary: [4.6/4.7/4.8 Regression] Bad vectorization in the proc
cptrf2 of rnflow.f90
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
AssignedTo: [email protected]
ReportedBy: [email protected]
CC: [email protected], [email protected]
At revision 187457 (i.e., with pr53340 fixed) on x86_64-apple-darwin10, after
[macbook] test/dbg_rnflow% gfc -c -O3 -ffast-math -funroll-loops timctr.f90
cmpcpt.f90 cptrf2.f90 dger.f90 dgetri.f90 dswap.f90 dtrsm.f90 evlrnf.f90
idamax.f90 main.f90 mattrs.f90 cmpmat.f90 dgemm.f90 dgetf2.f90 dlaswp.f90
dtrmm.f90 dtrti2.f90 extpic.f90 ilaenv.f90 matcnt.f90 reaseq.f90 xerbla.f90
cptrf1.f90 dgemv.f90 dgetrf.f90 dscal.f90 dtrmv.f90 dtrtri.f90 gentrs.f90
lsame.f90 matsim.f90
[macbook] test/dbg_rnflow% makeo ; time a.out > /dev/null
23.872u 0.349s 0:24.22 99.9% 0+0k 0+0io 0pf+0w
[macbook] test/dbg_rnflow% /opt/gcc/gcc4.8p-187339/bin/gfortran -c -O3
-ffast-math -funroll-loops evlrnf.f90
[macbook] test/dbg_rnflow% makeo ; time a.out > /dev/null
22.259u 0.346s 0:22.61 99.9% 0+0k 0+0io 0pf+0w
[macbook] test/dbg_rnflow% /opt/gcc/gcc4.8p-187291/bin/gfortran -c -O3
-ffast-math -funroll-loops idamax.f90
[macbook] test/dbg_rnflow% makeo ; time a.out > /dev/null
22.252u 0.345s 0:22.60 99.9% 0+0k 0+0io 0pf+0w
[macbook] test/dbg_rnflow% /opt/gcc/gcc4.8p-187102/bin/gfortran -c -O3
-ffast-math -funroll-loops idamax.f90
[macbook] test/dbg_rnflow% makeo ; time a.out > /dev/null
22.121u 0.346s 0:22.47 99.9% 0+0k 0+0io 0pf+0w
(i.e., working around pr53342 and a regression for idamax.f90, see
below), the compilation of cptrf2.f90 (source attached to pr53340) with the
following flags yields:
optimization level         4.4.6   4.5.3   4.6.3   4.7.0   r187457
-O2                         27.8    28.2    28.2    21.8      21.8
-O2 -ftree-vectorize        27.8    28.2    28.2    27.9      27.9
-O3                         22.0    21.3    25.1    25.3      25.3
-O3 -fno-tree-vectorize     22.1    21.3    21.4    21.4      21.4
Note that 4.5/4.6/4.7 vectorize two loops (lines 21 and 29), while 4.8
vectorizes only the loop at line 21 (29: not vectorized: iteration count too
small.).
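
To put the vectorization cost in relative terms, the r187457 column of the table above can be reduced to percentages. This is only a sketch: the numbers are copied from the table, the unit is assumed to be seconds, and the names timings and slowdown_pct are made up for the illustration.

```python
# Timings copied from the r187457 column of the table above.
# Assumption: the unit is seconds of run time.
timings = {
    "-O2": 21.8,
    "-O2 -ftree-vectorize": 27.9,
    "-O3": 25.3,
    "-O3 -fno-tree-vectorize": 21.4,
}


def slowdown_pct(slow, fast):
    """Relative slowdown of `slow` with respect to `fast`, in percent."""
    return 100.0 * (slow - fast) / fast


# Cost of vectorization at each optimization level.
o3_cost = slowdown_pct(timings["-O3"], timings["-O3 -fno-tree-vectorize"])
o2_cost = slowdown_pct(timings["-O2 -ftree-vectorize"], timings["-O2"])

print(f"-O3 vs -O3 -fno-tree-vectorize: {o3_cost:.1f}% slower")
print(f"-O2 -ftree-vectorize vs -O2:    {o2_cost:.1f}% slower")
```

With these figures, enabling the vectorizer costs roughly 18% at -O3 and roughly 28% at -O2.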
Looking through my archives, I found that a first regression appeared
between revisions 162456 and 164728:
optimization level                 4.6-162456   4.6p-164728
-O2                                   28.2          28.3
-O2 -ftree-vectorize                  28.1          28.3
-O3                                   21.4          29.4
-O3 -fno-tree-vectorize               21.3          21.4
-O3 -ffast-math                       21.4          22.3
-O3 -ffast-math -funroll-loops        21.9          22.4
For the record, as said above, the compilation of idamax regressed between
revisions 187102 and 187291:
[macbook] test/dbg_rnflow% /opt/gcc/gcc4.8p-187291/bin/gfortran -c -O3
-ffast-math -funroll-loops idamax.f90
[macbook] test/dbg_rnflow% makeo ; time a.out > /dev/null
22.252u 0.345s 0:22.60 99.9% 0+0k 0+0io 0pf+0w
[macbook] test/dbg_rnflow% /opt/gcc/gcc4.8p-187102/bin/gfortran -c -O3
-ffast-math -funroll-loops idamax.f90
[macbook] test/dbg_rnflow% makeo ; time a.out > /dev/null
22.121u 0.346s 0:22.47 99.9% 0+0k 0+0io 0pf+0w
Although the regression is only slightly above the noise margin at the level
of rnflow.f90, it could be worth investigating because:
(1) it is a LAPACK routine (possibly slightly modified),
(2) there are equivalent intrinsics in F90,
(3) the slowdown may be quite significant at the level of the proc itself.