http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53957
Bug #: 53957
Summary: Polyhedron 11 benchmark: MP_PROP_DESIGN twice as long
as other compiler
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: middle-end
AssignedTo: [email protected]
ReportedBy: [email protected]
[Note that MP_PROP_DESIGN is also discussed on the gcc-graphite mailing list,
albeit more with regard to automatic parallelization.]
The polyhedron benchmark (2011 version) is available at:
http://www.polyhedron.com/polyhedron_benchmark_suite0html, namely:
http://www.polyhedron.com/web_images/documents/pb11.zip
(The original program, which also contains a ready-to-go benchmark, is at
http://propdesign.weebly.com/. Note that you may have to rename some input
*.txt files to *TXT.)
The program takes twice as long with GCC as with ifort. The program is just 502
lines long (w/o comments) and contains no subroutines or functions. It mainly
consists of loops and some math functions (sin, cos, pow, tan, atan, acos,
exp).
[Result on CentOS 5.7, x86-64-gnu-linux, Intel Xeon X3430 @2.40GHz]
Using GCC 4.8.0 20120622 (experimental) [trunk revision 188871], I get:
$ gfortran -Ofast -funroll-loops -fwhole-program -march=native mp_prop_design.f90
$ time ./a.out > /dev/null
real 2m47.138s
user 2m46.808s
sys 0m0.236s
Using Intel's ifort on Intel(R) 64, Version 12.1 Build 20120212:
$ ifort -fast mp_prop_design.f90
$ time ./a.out > /dev/null
real 1m25.906s
user 1m25.598s
sys 0m0.244s
With Intel's libimf preloaded (LD_PRELOAD=.../libimf.so), the GCC-compiled
binary takes:
real 2m0.524s
user 1m59.809s
sys 0m0.689s
The code features expressions like a**2.0D0, but those are converted in GCC to
a*a.
Using -mveclibabi=svml (and no preloading) gives the same timings as without
it (or slightly worse); the only SVML routine it ends up calling is vmldAtan2.
Vectorizer: I haven't profiled this part, but I want to note that ifort
vectorizes more loops than GCC does.
GCC vectorizes the loops at these lines:
662: LOOP VECTORIZED.
1032: LOOP VECTORIZED.
1060: LOOP VECTORIZED.
While ifort has:
mp_prop_design.f90(271): (col. 10) remark: LOOP WAS VECTORIZED.
(Loop "m1 =2, 45" with conditional jump out of the loop)
mp_prop_design.f90(552): (col. 16) remark: LOOP WAS VECTORIZED.
(Loop with condition)
mp_prop_design.f90(576): (col. 16) remark: PARTIAL LOOP WAS VECTORIZED.
(Loop with two IF blocks)
mp_prop_design.f90(639): (col. 16) remark: LOOP WAS VECTORIZED.
(Rather simple loop)
mp_prop_design.f90(662): (col. 2) remark: LOOP WAS VECTORIZED.
(Vectorized by GCC)
mp_prop_design.f90(677): (col. 16) remark: PARTIAL LOOP WAS VECTORIZED.
(Line number points to the outermost of the three loops; there are also
conditional jumps)
mp_prop_design.f90(818): (col. 16) remark: LOOP WAS VECTORIZED.
(Nested "if" blocks)
mp_prop_design.f90(1032): (col. 2) remark: LOOP WAS VECTORIZED.
mp_prop_design.f90(1060): (col. 2) remark: LOOP WAS VECTORIZED.
(The last two are handled by GCC)
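To make the "loop with condition" category concrete, here is a small C sketch
(hypothetical, not taken from mp_prop_design.f90) of a conditional reduction,
together with a hand if-converted variant in which the branch becomes a
select. I am assuming this is representative of the shape of the loops ifort
reports above; the actual benchmark loops are more involved.

```c
#include <stddef.h>

/* Conditional reduction: the branch inside the body has to be
 * if-converted before the loop can be vectorized. */
double sum_positive_branchy(const double *x, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; ++i) {
        if (x[i] > 0.0)
            s += x[i];
    }
    return s;
}

/* Same computation with the condition written as a select, which
 * maps directly onto a masked or blended vector operation. */
double sum_positive_select(const double *x, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; ++i)
        s += (x[i] > 0.0) ? x[i] : 0.0;
    return s;
}
```

Note that vectorizing a floating-point reduction like this also requires the
compiler to be allowed to reassociate the additions, which -Ofast permits.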