http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53957

             Bug #: 53957
           Summary: Polyhedron 11 benchmark: MP_PROP_DESIGN twice as long
                    as other compiler
    Classification: Unclassified
           Product: gcc
           Version: 4.8.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: middle-end
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: bur...@gcc.gnu.org


[Note that MP_PROP_DESIGN is also discussed on the gcc-graphite mailing list,
albeit more with regard to automatic parallelization.]

The polyhedron benchmark (2011 version) is available at:
http://www.polyhedron.com/polyhedron_benchmark_suite0html, namely:
http://www.polyhedron.com/web_images/documents/pb11.zip

(The original program, which also contains a ready-to-go benchmark, is at
http://propdesign.weebly.com/; note that you may have to rename some input
*.txt files to *TXT.)


The program takes twice as long with GCC as with ifort. The program is just 502
lines long (w/o comments) and contains no subroutines or functions. It mainly
consists of loops and some math functions (sin, cos, pow, tan, atan, acos,
exp).


[Result on CentOS 5.7, x86-64-gnu-linux, Intel Xeon X3430 @2.40GHz]


Using GCC 4.8.0 20120622 (experimental) [trunk revision 188871], I get:

$ gfortran -Ofast -funroll-loops -fwhole-program -march=native \
    mp_prop_design.f90
$ time ./a.out > /dev/null 

real    2m47.138s
user    2m46.808s
sys     0m0.236s


Using Intel's ifort on Intel(R) 64, Version 12.1 Build 20120212:

$ ifort -fast mp_prop_design.f90
$ time ./a.out > /dev/null 
real    1m25.906s
user    1m25.598s
sys     0m0.244s


With Intel's libimf preloaded (LD_PRELOAD=.../libimf.so), GCC has:

real    2m0.524s
user    1m59.809s
sys     0m0.689s
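
For orientation, the three wall-clock ("real") times above can be turned into
ratios. The snippet below is only a convenience sketch: the helper name to_sec
and the variable names are made up here, and it uses nothing beyond the numbers
already quoted.

```shell
# Convert a time(1)-style value such as "2m47.138s" into seconds.
# (to_sec is a hypothetical helper, not part of any tool mentioned here.)
to_sec() { echo "$1" | awk -F'[ms]' '{ printf "%.3f", $1 * 60 + $2 }'; }

gcc=$(to_sec 2m47.138s)     # gfortran, default libm
ifort=$(to_sec 1m25.906s)   # ifort -fast
imf=$(to_sec 2m0.524s)      # gfortran with libimf preloaded

# Print each GCC timing relative to the ifort timing.
awk -v g="$gcc" -v i="$ifort" -v m="$imf" 'BEGIN {
    printf "gfortran / ifort:            %.2f\n", g / i
    printf "gfortran + libimf / ifort:   %.2f\n", m / i
}'
```

This gives roughly 1.95 for plain gfortran and 1.40 with libimf preloaded, so
the math library accounts for part of the gap but not all of it.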



The code features expressions like a**2.0D0, but those are converted in GCC to
a*a.
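
One way to check this on a given compiler is to compile a tiny test function to
assembly and look for a remaining call to pow. The following is a sketch under
assumptions: the file name sq.f90 and the function sq are invented for
illustration and are not taken from MP_PROP_DESIGN, and the check is skipped if
gfortran is not installed.

```shell
# Sketch: does gfortran expand a**2.0D0 inline instead of calling pow?
# sq.f90 and sq are hypothetical names used only for this check.
tmp=$(mktemp -d)
cat > "$tmp/sq.f90" <<'EOF'
function sq(a) result(r)
  double precision, intent(in) :: a
  double precision :: r
  r = a**2.0D0
end function sq
EOF

if command -v gfortran >/dev/null 2>&1; then
    # -S stops after generating assembly, which is then searched for "pow".
    gfortran -Ofast -S -o "$tmp/sq.s" "$tmp/sq.f90"
    if grep -qw pow "$tmp/sq.s"; then
        echo "a call to pow is still present"
    else
        echo "no pow call found: the power was expanded inline"
    fi
else
    echo "gfortran not found; skipping the check"
fi
# Remove "$tmp" by hand when done.
```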

Using -mveclibabi=svml (and no preloading) gives the same timings as without
it (or slightly worse); it just calls vmldAtan2.


Vectorizer: I haven't profiled this part, but I want to note that ifort
vectorizes more loops than GCC does.

GCC vectorizes:

662: LOOP VECTORIZED.
1032: LOOP VECTORIZED.
1060: LOOP VECTORIZED.


While ifort has:

mp_prop_design.f90(271): (col. 10) remark: LOOP WAS VECTORIZED.
  (Loop "m1 =2, 45" with conditional jump out of the loop)
mp_prop_design.f90(552): (col. 16) remark: LOOP WAS VECTORIZED.
  (Loop with condition)
mp_prop_design.f90(576): (col. 16) remark: PARTIAL LOOP WAS VECTORIZED.
  (Loop with two IF blocks)
mp_prop_design.f90(639): (col. 16) remark: LOOP WAS VECTORIZED.
  (Rather simple loop)
mp_prop_design.f90(662): (col.  2) remark: LOOP WAS VECTORIZED.
  (Vectorized by GCC)
mp_prop_design.f90(677): (col. 16) remark: PARTIAL LOOP WAS VECTORIZED.
   (Line number points to the outermost of the three loops; there are also
    conditional jumps)
mp_prop_design.f90(818): (col. 16) remark: LOOP WAS VECTORIZED.
   (Nested "if" blocks)
mp_prop_design.f90(1032): (col. 2) remark: LOOP WAS VECTORIZED.
mp_prop_design.f90(1060): (col. 2) remark: LOOP WAS VECTORIZED.
   (The last two are handled by GCC)
