Re: LTO: Speedup -- some preliminary SPEC2000 results

Vladimir Makarov Wed, 07 Oct 2009 07:25:34 -0700

Jan Hubicka wrote:

So things seem to work now more or less as expected.  I.e. LTO builds
seem similar to combined builds, and whole-program improves code size
quite noticeably.
Runtime results for gzip are pretty much unchanged, but that is
expected.  I am quite curious about a full SPEC run.

Before the fix (Jan's two latest patches), the LTO results were
disappointing.  In brief, the results when I checked SPEC2000 a week
ago on the lto branch on a Core i7 (-O3 vs -O3 -flto with optional
-fwhole-program) were:
 o Usage of LTO made the compiler 1.9 times slower (in CPU time) for
   SPECInt2000 and 2.2 times slower for SPECFP2000 on x86 and x86_64.
 o LTO generated 16-17% bigger code for Int2000 and 13-14% bigger for
   FP2000.
 o There was a 0.6% improvement for SPECFP2000 on x86 and 1% for
   SPECInt2000 on x86_64 (only because of a 20% improvement on vortex;
   all other tests were actually worse than without LTO).
 o No improvement for Int2000 on x86 and FP2000 on x86_64.
 o 252.eon and 176.gcc crashed the compiler when LTO was used.

With Jan's latest fixes, the results (for -O3 vs -O3 -flto
-fwhole-program) are:

x86:
 o Int2000:
   - LTO crashes the compiler on vortex.  LTO generates
     wrong code for vpr, gcc, perlbmk, and gap.
   - Compiler is 1.85 times slower with LTO
   - Average code size is almost 6% smaller:

        4.615%          44287          46331 164.gzip
       -3.145%         144101         139569 175.vpr
        0.261%        1566926        1571009 176.gcc
      -12.118%          12279          10791 181.mcf
       11.130%         209956         233324 186.crafty
      -29.735%         155358         109162 197.parser
      -23.075%         497347         382585 252.eon
        8.904%         552163         601327 253.perlbmk
        1.516%         503006         510630 254.gap
      -20.891%          47465          37549 256.bzip2
       -3.047%         198365         192321 300.twolf
       Average = -5.96236%

   - Performance is improved by almost 4%:

      164.gzip    1668   1629  -2.33813%
      181.mcf     5011   5020   0.17960%
      186.crafty  2268   2277   0.39682%
      197.parser  1928   1925  -0.15560%
      252.eon     2477   2950  19.0957%
      256.bzip2   1894   1956   3.2735%
      300.twolf   2806   3026   7.84034%
      GeoMean     2416   2509   3.84934%

 o FP2000
   - LTO generates wrong code for mgrid, applu, galgel, facerec,
     fma3d, sixtrack, and apsi.
   - Compiler is 2.1 times slower with LTO
   - Average code size is almost 1.7% smaller:

      -8.771%          27544          25128 168.wupwise
       2.328%           9108           9320 171.swim
       2.127%          18193          18580 172.mgrid
       0.004%          76584          76587 173.applu
      -5.938%         576270         542049 177.mesa
      -2.046%         183667         179910 178.galgel
     -10.635%          15881          14192 179.art
     -16.292%          28812          24118 183.equake
      -3.177%          67239          65103 187.facerec
      10.989%         125273         139039 188.ammp
      -0.735%          49137          48776 189.lucas
      -0.856%        1144550        1134756 191.fma3d
      11.457%         935941        1043168 200.sixtrack
      Average = -1.65735%

   - Performance is improved by almost 6%:

      168.wupwise    2349    3266  39.0379%
      171.swim       3511    3529   0.51267%
      177.mesa       1970    2008   1.92893%
      179.art        7097    7293   2.76173%
      183.equake     3844    4138   7.64828%
      188.ammp       2423    2401  -0.90796%
      189.lucas      2825    2718  -3.78761%
      GeoMean        3144    3332   5.97964%
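
For reference, the "Average" lines under the code-size tables appear
to be the plain arithmetic mean of the per-benchmark percentage
changes.  That reading assumes the first size column is the -O3
baseline and the second is the LTO build; the columns are not labelled,
so this is inferred from the numbers.  A minimal Python sketch using
the x86 Int2000 sizes (the function names are illustrative only):

```python
# Code sizes copied from the x86 Int2000 table: (baseline -O3, with LTO).
# The column roles are an assumption inferred from the percentages.
SIZES = {
    "164.gzip": (44287, 46331),
    "175.vpr": (144101, 139569),
    "176.gcc": (1566926, 1571009),
    "181.mcf": (12279, 10791),
    "186.crafty": (209956, 233324),
    "197.parser": (155358, 109162),
    "252.eon": (497347, 382585),
    "253.perlbmk": (552163, 601327),
    "254.gap": (503006, 510630),
    "256.bzip2": (47465, 37549),
    "300.twolf": (198365, 192321),
}


def percent_change(base, new):
    """Relative change of `new` versus `base`, in percent."""
    return (new - base) / base * 100.0


def average_change(sizes):
    """Arithmetic mean of the per-benchmark percentage changes."""
    changes = [percent_change(base, new) for base, new in sizes.values()]
    return sum(changes) / len(changes)


if __name__ == "__main__":
    for name, (base, lto) in SIZES.items():
        print(f"{percent_change(base, lto):9.3f}%  {base:9d}  {lto:9d}  {name}")
    print(f"Average = {average_change(SIZES):.5f}%")
```

Note that this is a mean of ratios, so a small benchmark such as mcf
weighs as much as gcc; it is not the change in total code size.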

x86_64:

 o Int2000:
   - LTO crashes the compiler on gcc.  LTO generates
     wrong code for vpr, perlbmk, gap, and vortex.
   - Compiler is 1.8 times slower with LTO
   - Average code size is more than 8% smaller:

        1.376%          49119          49795 164.gzip
       -4.348%         158389         151503 175.vpr
      -16.964%          14949          12413 181.mcf
       12.875%         195234         220370 186.crafty
      -29.519%         180780         127416 197.parser
      -22.894%         521614         402197 252.eon
        9.507%         645749         707141 253.perlbmk
        6.550%         585164         623492 254.gap
      -22.493%         660414         511866 255.vortex
      -18.343%          55825          45585 256.bzip2
       -5.295%         212727         201463 300.twolf
      Average = -8.14068%

   - Performance is improved by 2.1%:

      164.gzip     1804    1773  -1.7184%
      181.mcf      3480    3460  -0.5747%
      186.crafty   3397    3406   0.2649%
      197.parser   1847    1803  -2.3822%
      252.eon      4071    4537  11.4468%
      256.bzip2    2197    2249   2.3668%
      300.twolf    2878    3048   5.9068%
      GeoMean      2688    2744   2.0833%

 o FP2000
   - LTO crashes the compiler on apsi.  LTO generates wrong code for
     mgrid, applu, galgel, facerec, fma3d, and sixtrack.
   - Compiler is 2.1 times slower with LTO
   - Average code size is 2.7% smaller:

      27.674%          33902          43284 168.wupwise
      -3.107%          15704          15216 171.swim
      -0.685%          22929          22772 172.mgrid
      -1.167%         103280         102075 173.applu
      -8.346%         678724         622079 177.mesa
      -4.304%         249773         239024 178.galgel
     -25.801%          20375          15118 179.art
     -28.805%          37514          26708 183.equake
      -1.577%          76837          75625 187.facerec
       1.570%         168235         170877 188.ammp
      -1.168%          57271          56602 189.lucas
      -0.940%        1276316        1264314 191.fma3d
      10.949%        1106507        1227658 200.sixtrack
     Average = -2.74672%

   - Performance is improved by almost 6%:

      168.wupwise     2532   3708  46.4455%
      171.swim        3740   3729  -0.2941%
      177.mesa        2969   2946  -0.7746%
      179.art         7278   7092  -2.5556%
      183.equake      3978   4227   6.2594%
      188.ammp        2490   2515   1.0040%
      189.lucas       3886   3806  -2.0586%
      GeoMean         3603   3812   5.8007%
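
The GeoMean rows in the performance tables can be reproduced as the
geometric mean of the per-benchmark scores.  The sketch below assumes
the two numeric columns are SPEC ratios (higher is better) for -O3 and
for -O3 -flto -fwhole-program respectively; the mail does not label
them, so that is an inference.  It uses the x86_64 FP2000 numbers, and
the helper name is illustrative only:

```python
import math

# Scores copied from the x86_64 FP2000 performance table, in table order:
# wupwise, swim, mesa, art, equake, ammp, lucas.
BASE = [2532, 3740, 2969, 7278, 3978, 2490, 3886]  # assumed: -O3
LTO = [3708, 3729, 2946, 7092, 4227, 2515, 3806]   # assumed: -O3 -flto -fwhole-program


def geomean(values):
    """Geometric mean, computed via logarithms to avoid a huge product."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def improvement_percent(base, new):
    """Relative change of the geometric means, in percent."""
    return (geomean(new) / geomean(base) - 1.0) * 100.0


if __name__ == "__main__":
    print(f"GeoMean  {geomean(BASE):.0f}  {geomean(LTO):.0f}  "
          f"{improvement_percent(BASE, LTO):.2f}%")
```

With only seven benchmarks, a single outlier such as wupwise accounts
for most of the overall gain, so the per-benchmark rows are worth
reading alongside the mean.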

LTO is quite promising.  Actually it is in line with, or even better
than, the improvement obtained from other compilers (PathScale is the
most convenient compiler for checking LTO separately: there LTO gave
up to 5% improvement on SPECFP2000 and 3.5% on SPECInt2000, while
making the compiler about 50% slower and the generated code up to 30%
bigger).  LTO in GCC actually results in a significant code size
reduction, which is quite different from PathScale.  That is one of
the rare cases, to my mind, where a specific optimization actually
works better in GCC than in other optimizing compilers.  So
congratulations to everyone who worked on LTO!

I think the biggest winners from LTO will be big C++ programs (eon
shows that).  Additional optimizations (like devirtualization) could
improve those results even more.  I think the next big thing would be
using subtarget-specialized functions.
