Just measured: LTO + O3 improves over O2 by a decent 4.8% geomean. More data to come later.
                 O2   LTO+O3
164.gzip       1324     1322    -0.10%
175.vpr        1694     1703     0.51%
176.gcc        2293     2347     2.34%
181.mcf        1772     1797     1.43%
186.crafty     2320     2486     7.12%
197.parser     1166     1236     6.02%
252.eon        2443     2810    14.98%
253.perlbmk    2410     2407    -0.16%
254.gap        1987     2024     1.82%
255.vortex     2392     2826    18.13%
256.bzip2      1719     1760     2.38%
300.twolf      2288     2394     4.63%

David

On Mon, Nov 15, 2010 at 2:38 PM, Jan Hubicka <hubi...@ucw.cz> wrote:
>> I did some measurement (64bit).
>>
>> Experiment 1:
>>
>> -O2 -funroll-loops vs -O2
>>
>> It improves performance (geomean) by 0.56%, not too much:
>>                  O2   O2 unroll-loops
>> 164.gzip       1324     1331     0.56%
>> 175.vpr        1694     1605    -5.24%
>> 176.gcc        2293     2350     2.47%
>> 181.mcf        1772     1788     0.90%
>> 186.crafty     2320     2326     0.26%
>> 197.parser     1166     1162    -0.32%
>> 252.eon        2443     2529     3.50%
>> 253.perlbmk    2410     2460     2.07%
>> 254.gap        1987     2019     1.58%
>> 255.vortex     2392     2406     0.58%
>> 256.bzip2      1719     1715    -0.25%
>> 300.twolf      2288     2308     0.88%
>
> Can you also try -funroll-all-loops?  As for pretty small programs, like
> spec2k, -funroll-all-loops is often win.  In just few loops we can work out
> number of iterations.
>
>>
>> Experiment 3: O2 lto vs O2: geomean 0.72%
>>                  O2   O2 LTO
>> 164.gzip       1324     1317    -0.53%
>> 175.vpr        1694     1697     0.18%
>> 176.gcc        2293     2291    -0.08%
>> 181.mcf        1772     1760    -0.65%
>> 186.crafty     2320     2245    -3.26%
>> 197.parser     1166     1163    -0.29%
>> 252.eon        2443     2576     5.44%
>> 253.perlbmk    2410     2433     0.93%
>> 254.gap        1987     1995     0.36%
>> 255.vortex     2392     2588     8.19%
>> 256.bzip2      1719     1729     0.56%
>> 300.twolf      2288     2248    -1.77%
>
> You need -O3 -fwhole-program -flto for reasonable cross module inlining to
> happen.
> -fwhole-program is quite essential to get reasonable win from LTO (w/o profile
> feedback).
>
> At least our nightly tester then gets quite nice improvements on few
> benchmark at spec2k,
> see also my gccsummit slides.
>
> Honza
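
For anyone who wants to re-derive the geomean figures, here is a minimal sketch in Python. It assumes the geomean is the geometric mean of the per-benchmark score ratios (new / base) minus one, which this thread does not state explicitly. The names `SCORES` and `geomean_improvement` are made up for illustration; the numbers are the O2 and LTO + O3 scores from the table at the top of this mail.

```python
import math

# (O2 score, LTO + O3 score) per benchmark, copied from the table above.
SCORES = {
    "164.gzip": (1324, 1322),
    "175.vpr": (1694, 1703),
    "176.gcc": (2293, 2347),
    "181.mcf": (1772, 1797),
    "186.crafty": (2320, 2486),
    "197.parser": (1166, 1236),
    "252.eon": (2443, 2810),
    "253.perlbmk": (2410, 2407),
    "254.gap": (1987, 2024),
    "255.vortex": (2392, 2826),
    "256.bzip2": (1719, 1760),
    "300.twolf": (2288, 2394),
}


def geomean_improvement(scores):
    """Return the geometric-mean gain of new over base scores, in percent.

    Assumption: higher score is better and the gain is geomean(new/base) - 1.
    """
    # Averaging the logs of the ratios and exponentiating gives the geomean.
    log_sum = sum(math.log(new / base) for base, new in scores.values())
    return (math.exp(log_sum / len(scores)) - 1.0) * 100.0


if __name__ == "__main__":
    print(f"geomean improvement: {geomean_improvement(SCORES):.2f}%")
```

Under that assumption the result should land close to the 4.8% quoted above. Swapping in the scores from the quoted experiments gives a quick cross-check of the other geomean figures.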