Re: GCC-4.5.0 comparison with previous releases and LLVM-2.7 on SPEC2000 for x86/x86_64

Jan Hubicka Mon, 15 Nov 2010 14:38:28 -0800

> I did some measurements (64-bit).
> 
> Experiment 1:
> 
> -O2 -funroll-loops vs -O2
> 
> It improves performance (geomean) by 0.56%, not too much:
>                                          O2                 O2 unroll-loops
>             164.gzip                1324                1331      0.56%
>              175.vpr                1694                1605     -5.24%
>              176.gcc                2293                2350      2.47%
>              181.mcf                1772                1788      0.90%
>           186.crafty                2320                2326      0.26%
>           197.parser                1166                1162     -0.32%
>              252.eon                2443                2529      3.50%
>          253.perlbmk                2410                2460      2.07%
>              254.gap                1987                2019      1.58%
>           255.vortex                2392                2406      0.58%
>            256.bzip2                1719                1715     -0.25%
>            300.twolf                2288                2308      0.88%


Can you also try -funroll-all-loops? For fairly small programs like SPEC2k,
-funroll-all-loops is often a win: -funroll-loops only unrolls loops whose
number of iterations we can work out, and that is the case for just a few loops.
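To make the difference concrete: per the GCC manual, -funroll-loops unrolls only loops whose number of iterations can be determined at compile time or on entry to the loop, while -funroll-all-loops also unrolls loops whose iteration count is uncertain. The Python sketch below is a hand-written illustration of the two loop shapes, not what GCC literally emits (GCC transforms its intermediate representation, not source), and the function names are hypothetical.

```python
def sum_counted(xs):
    # Trip count len(xs) is known on loop entry: the shape -funroll-loops handles.
    n = len(xs)
    total = 0
    i = 0
    # Unrolled by 4: one bounds test guards four copies of the body.
    while i + 4 <= n:
        total += xs[i] + xs[i + 1] + xs[i + 2] + xs[i + 3]
        i += 4
    # Epilogue runs the n % 4 leftover iterations.
    while i < n:
        total += xs[i]
        i += 1
    return total


def sum_until_sentinel(xs, sentinel=0):
    # Exit depends on the data (stop at the first `sentinel`), so the trip
    # count is unknown on entry. Assumes xs contains `sentinel`.
    # The loop can still be unrolled, but every copy of the body keeps its
    # own exit test; only -funroll-all-loops would unroll a loop like this.
    total = 0
    i = 0
    while True:
        if xs[i] == sentinel:
            return total
        total += xs[i]
        if xs[i + 1] == sentinel:
            return total
        total += xs[i + 1]
        if xs[i + 2] == sentinel:
            return total
        total += xs[i + 2]
        if xs[i + 3] == sentinel:
            return total
        total += xs[i + 3]
        i += 4
```

In the counted case the unrolled body saves three of every four exit tests; in the data-dependent case it only saves loop-back branches, which is why the payoff of -funroll-all-loops varies more from program to program.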

> 
> Experiment 3:    O2 lto vs O2:    geomean 0.72%
>                                         O2                   O2 LTO
>            164.gzip                1324                1317     -0.53%
>              175.vpr                1694                1697      0.18%
>              176.gcc                2293                2291     -0.08%
>              181.mcf                1772                1760     -0.65%
>           186.crafty                2320                2245     -3.26%
>           197.parser                1166                1163     -0.29%
>              252.eon                2443                2576      5.44%
>          253.perlbmk                2410                2433      0.93%
>              254.gap                1987                1995      0.36%
>           255.vortex                2392                2588      8.19%
>            256.bzip2                1719                1729      0.56%
>            300.twolf                2288                2248     -1.77%
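The geomean figures quoted above (0.56% for -funroll-loops, 0.72% for LTO) can be recomputed from the per-benchmark scores. SPEC CPU2000 combines per-benchmark ratios with a geometric mean, so the overall change equals the geometric mean of the per-benchmark new/old ratios. A minimal Python sketch using the numbers from the two tables; the names UNROLL, LTO and geomean_change are illustrative only, and the results should land close to the quoted figures.

```python
from math import exp, log

# SPEC CINT2000 scores copied from the two quoted tables,
# as benchmark -> (-O2 baseline, variant).
UNROLL = {
    "164.gzip": (1324, 1331), "175.vpr": (1694, 1605),
    "176.gcc": (2293, 2350), "181.mcf": (1772, 1788),
    "186.crafty": (2320, 2326), "197.parser": (1166, 1162),
    "252.eon": (2443, 2529), "253.perlbmk": (2410, 2460),
    "254.gap": (1987, 2019), "255.vortex": (2392, 2406),
    "256.bzip2": (1719, 1715), "300.twolf": (2288, 2308),
}
LTO = {
    "164.gzip": (1324, 1317), "175.vpr": (1694, 1697),
    "176.gcc": (2293, 2291), "181.mcf": (1772, 1760),
    "186.crafty": (2320, 2245), "197.parser": (1166, 1163),
    "252.eon": (2443, 2576), "253.perlbmk": (2410, 2433),
    "254.gap": (1987, 1995), "255.vortex": (2392, 2588),
    "256.bzip2": (1719, 1729), "300.twolf": (2288, 2248),
}


def geomean_change(pairs):
    """Percent change in the geometric mean of the scores.

    geomean(new) / geomean(old) equals the geometric mean of the
    per-benchmark ratios new/old, computed here as exp(mean(log ratio)).
    """
    logs = [log(new / old) for old, new in pairs.values()]
    return (exp(sum(logs) / len(logs)) - 1.0) * 100.0


if __name__ == "__main__":
    print(f"-O2 -funroll-loops vs -O2: {geomean_change(UNROLL):+.2f}%")
    print(f"-O2 LTO vs -O2:            {geomean_change(LTO):+.2f}%")
```

Working in log space keeps a single large win (255.vortex under LTO) from dominating the average the way an arithmetic mean of percentages would.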

You need -O3 -fwhole-program -flto for reasonable cross-module inlining to
happen. -fwhole-program is quite essential to get a reasonable win from LTO
(without profile feedback).

At least our nightly tester then gets quite nice improvements on a few SPEC2k
benchmarks; see also my GCC Summit slides.

Honza
