http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53957

Bug #: 53957
Summary: Polyhedron 11 benchmark: MP_PROP_DESIGN twice as long as other compiler
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: middle-end
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: bur...@gcc.gnu.org

[Note that MP_PROP_DESIGN is also discussed on the gcc-graphite mailing list, albeit more with regard to automatic parallelization.]

The Polyhedron benchmark (2011 version) is available at:
http://www.polyhedron.com/polyhedron_benchmark_suite0html, namely:
http://www.polyhedron.com/web_images/documents/pb11.zip

(The original program, which also contains a ready-to-go benchmark, is at http://propdesign.weebly.com/; note that you may have to rename some input *.txt files to *TXT.)

The program takes twice as long with GCC as with ifort.

The program is just 502 lines long (w/o comments) and contains no subroutines or functions. It mainly consists of loops and some math functions (sin, cos, pow, tan, atan, acos, exp).

[Result on CentOS 5.7, x86-64-gnu-linux, Intel Xeon X3430 @2.40GHz]

Using GCC 4.8.0 20120622 (experimental) [trunk revision 188871], I get:

$ gfortran -Ofast -funroll-loops -fwhole-program -march=native mp_prop_design.f90
$ time ./a.out > /dev/null
real 2m47.138s
user 2m46.808s
sys 0m0.236s

Using Intel's ifort on Intel(R) 64, Version 12.1 Build 20120212:

$ ifort -fast mp_prop_design.f90
$ time ./a.out > /dev/null
real 1m25.906s
user 1m25.598s
sys 0m0.244s

With Intel's libimf preloaded (LD_PRELOAD=.../libimf.so), the GCC-compiled binary gives:

real 2m0.524s
user 1m59.809s
sys 0m0.689s

The code features expressions like a**2.0D0, but those are converted in GCC to a*a.

Using -mveclibabi=svml (and no preloading) gives the same timings as without it (or slightly worse); it just calls vmldAtan2.

Vectorizer: I haven't profiled this part, but I want to note that ifort vectorizes more loops.

GCC vectorizes:

662: LOOP VECTORIZED.
1032: LOOP VECTORIZED.
1060: LOOP VECTORIZED.

While ifort has:

mp_prop_design.f90(271): (col. 10) remark: LOOP WAS VECTORIZED.
  (Loop "m1 =2, 45" with conditional jump out of the loop)
mp_prop_design.f90(552): (col. 16) remark: LOOP WAS VECTORIZED.
  (Loop with condition)
mp_prop_design.f90(576): (col. 16) remark: PARTIAL LOOP WAS VECTORIZED.
  (Loop with two IF blocks)
mp_prop_design.f90(639): (col. 16) remark: LOOP WAS VECTORIZED.
  (Rather simple loop)
mp_prop_design.f90(662): (col. 2) remark: LOOP WAS VECTORIZED.
  (Vectorized by GCC)
mp_prop_design.f90(677): (col. 16) remark: PARTIAL LOOP WAS VECTORIZED.
  (Line number points to the outermost of the three loops; there are also conditional jumps)
mp_prop_design.f90(818): (col. 16) remark: LOOP WAS VECTORIZED.
  (Nested "if" blocks)
mp_prop_design.f90(1032): (col. 2) remark: LOOP WAS VECTORIZED.
mp_prop_design.f90(1060): (col. 2) remark: LOOP WAS VECTORIZED.
  (The last two are handled by GCC)
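
For anyone who wants to experiment without the full benchmark, here is a minimal sketch of the kind of loop described in the remarks above: a loop with an IF block in its body, a math function call, and an a**2.0D0 expression. It is a made-up example, not an excerpt from mp_prop_design.f90; the program name, array names, sizes and threshold are all assumptions chosen for illustration. Whether a given compiler version vectorizes it has not been checked here.

```fortran
! Hypothetical reduced example; NOT taken from mp_prop_design.f90.
program cond_loop
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 1000
  real(dp) :: a(n), b(n)
  integer :: i

  ! Fill the input with values in (0, 1].
  do i = 1, n
     a(i) = real(i, dp) / real(n, dp)
  end do

  ! Loop with an IF block in its body, similar in shape to the loops
  ! that ifort reports as vectorized in the remarks above.
  do i = 1, n
     if (a(i) > 0.5d0) then
        ! Power with a real exponent of 2; per the report, GCC turns
        ! expressions of this form into a*a.
        b(i) = a(i)**2.0d0 + sin(a(i))
     else
        b(i) = cos(a(i))
     end if
  end do

  ! Use the result so the loop cannot be optimized away.
  print '(a, f12.6)', 'sum = ', sum(b)
end program cond_loop
```

It can be built with the same command line as above, e.g. gfortran -Ofast -funroll-loops -march=native cond_loop.f90. To my knowledge the GCC vectorizer report is requested with -ftree-vectorizer-verbose=<n> on releases of that era and with -fopt-info-vec on later ones; please check the manual for the version in use.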