This suggests that -O3-level inlining should also be turned on by default for LTO builds, since -O2 LTO performance on its own is unimpressive.

David

On Mon, Nov 15, 2010 at 3:36 PM, Xinliang David Li <davi...@google.com> wrote:
> Just measured: lto +O3 improves over O2 by a decent 4.8% geomean. More
> data to come later.
>
> 164.gzip      1324   1322   -0.10%
> 175.vpr       1694   1703    0.51%
> 176.gcc       2293   2347    2.34%
> 181.mcf       1772   1797    1.43%
> 186.crafty    2320   2486    7.12%
> 197.parser    1166   1236    6.02%
> 252.eon       2443   2810   14.98%
> 253.perlbmk   2410   2407   -0.16%
> 254.gap       1987   2024    1.82%
> 255.vortex    2392   2826   18.13%
> 256.bzip2     1719   1760    2.38%
> 300.twolf     2288   2394    4.63%
>
> David
>
> On Mon, Nov 15, 2010 at 2:38 PM, Jan Hubicka <hubi...@ucw.cz> wrote:
>>> I did some measurements (64bit).
>>>
>>> Experiment 1:
>>>
>>> -O2 -funroll-loops vs -O2
>>>
>>> It improves performance (geomean) by 0.56%, not too much:
>>>
>>>               O2     O2 unroll-loops
>>> 164.gzip      1324   1331    0.56%
>>> 175.vpr       1694   1605   -5.24%
>>> 176.gcc       2293   2350    2.47%
>>> 181.mcf       1772   1788    0.90%
>>> 186.crafty    2320   2326    0.26%
>>> 197.parser    1166   1162   -0.32%
>>> 252.eon       2443   2529    3.50%
>>> 253.perlbmk   2410   2460    2.07%
>>> 254.gap       1987   2019    1.58%
>>> 255.vortex    2392   2406    0.58%
>>> 256.bzip2     1719   1715   -0.25%
>>> 300.twolf     2288   2308    0.88%
>>
>> Can you also try -funroll-all-loops? For pretty small programs, like
>> spec2k, -funroll-all-loops is often a win. Only in a few loops can we
>> work out the number of iterations.
>>
>>>
>>> Experiment 3: O2 lto vs O2: geomean 0.72%
>>>
>>>               O2     O2 LTO
>>> 164.gzip      1324   1317   -0.53%
>>> 175.vpr       1694   1697    0.18%
>>> 176.gcc       2293   2291   -0.08%
>>> 181.mcf       1772   1760   -0.65%
>>> 186.crafty    2320   2245   -3.26%
>>> 197.parser    1166   1163   -0.29%
>>> 252.eon       2443   2576    5.44%
>>> 253.perlbmk   2410   2433    0.93%
>>> 254.gap       1987   1995    0.36%
>>> 255.vortex    2392   2588    8.19%
>>> 256.bzip2     1719   1729    0.56%
>>> 300.twolf     2288   2248   -1.77%
>>
>> You need -O3 -fwhole-program -flto for reasonable cross-module inlining
>> to happen.
>> -fwhole-program is quite essential to get a reasonable win from LTO
>> (w/o profile feedback).
>>
>> At least our nightly tester then gets quite nice improvements on a few
>> benchmarks in spec2k,
>> see also my gccsummit slides.
>>
>> Honza
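
For anyone who wants to double-check the geomean figures quoted in this thread, here is a minimal Python sketch. It assumes the two numeric columns in the LTO +O3 table are the per-benchmark scores for -O2 and for LTO +O3 respectively (the first column matches the O2 column of the other tables), and that "geomean" means the geometric mean of the per-benchmark score ratios. The helper name geomean_gain is made up for illustration and is not part of any SPEC tooling.

```python
import math

def geomean_gain(base, new):
    """Geometric-mean improvement of `new` over `base`, in percent.

    Hypothetical helper: averages the log of each per-benchmark ratio,
    which is equivalent to taking the n-th root of the product of ratios.
    """
    if len(base) != len(new) or not base:
        raise ValueError("need two non-empty score lists of equal length")
    mean_log = sum(math.log(n / b) for b, n in zip(base, new)) / len(base)
    return (math.exp(mean_log) - 1.0) * 100.0

# Scores copied from the thread, in benchmark order 164.gzip .. 300.twolf.
O2 = [1324, 1694, 2293, 1772, 2320, 1166, 2443, 2410, 1987, 2392, 1719, 2288]
# Assumed to be the LTO +O3 column of the first quoted table.
O3_LTO = [1322, 1703, 2347, 1797, 2486, 1236, 2810, 2407, 2024, 2826, 1760, 2394]

if __name__ == "__main__":
    print(f"LTO +O3 vs O2 geomean gain: {geomean_gain(O2, O3_LTO):.2f}%")
```

The same function can be fed the O2 LTO or unroll-loops columns to compare against the other geomeans reported above. Working in log space keeps the computation stable even for long benchmark lists.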