Hi, thanks for the report! It is actually more promising than I expected. A while ago I did similar tests with whole-program and --combine, and we didn't get very consistent performance results (I also saw code size reductions). I guess the SPECint geometric mean will go down once vpr, gcc, perlbmk and gap work, since pretty much all of the improvement comes from EON's intermodule inlining. I've just committed the patch to fix the ipa-sra problem, which will hopefully allow clean SPEC runs.
The ipa-sra bug changed the calling convention of externally visible functions. It should not affect code size much.

> > With latest Jan's fixes, The results (for -O3 vs -O3 -flto
> > -fwhole-program) are
> >
> > x86:
> >   o Int2000:
> >     - LTO crashes the compiler on vortex. LTO generates
> >       wrong code for vpr, gcc, perlbmk, and gap.
> >     - Compiler is 1.85 times slower with LTO
> >     - Average code size is almost 6% smaller:
> >
> >          4.615%    44287    46331  164.gzip
> >         -3.145%   144101   139569  175.vpr
> >          0.261%  1566926  1571009  176.gcc
> >        -12.118%    12279    10791  181.mcf
> >         11.130%   209956   233324  186.crafty
> >        -29.735%   155358   109162  197.parser
> >        -23.075%   497347   382585  252.eon
> >          8.904%   552163   601327  253.perlbmk
> >          1.516%   503006   510630  254.gap
> >        -20.891%    47465    37549  256.bzip2
> >         -3.047%   198365   192321  300.twolf
> >        Average = -5.96236%
> >
> >     - Performance is improved almost by 4%
> >
> >        164.gzip    1668  1629  -2.33813%
> >        181.mcf     5011  5020   0.17960%
> >        186.crafty  2268  2277   0.39682%
> >        197.parser  1928  1925  -0.15560%

There is a simple opportunity for whole-program optimization in parser. The hash table size is held in a static variable and is a constant prime (once it has been initialized at benchmark startup). Being able to constant-propagate it would noticeably help here.

> >        252.eon     2477  2950  19.0957%
> >        256.bzip2   1894  1956   3.2735%
> >        300.twolf   2806  3026   7.84034%
> >        GeoMean     2416  2509   3.84934%
> >
> >
> > LTO is quite promising. Actually it is in line or even better with
> > improvement got from other compilers (pathscale is the most convenient
> > compiler to check lto separately: lto gave there upto 5% improvement
> > on SPECFP2000 and 3.5% for SPECInt2000 making compiler about 50%
> > slower and generated code size upto 30% bigger). LTO in GCC actually

I must say that I expect the geometric mean to go down once we fix the broken benchmarks, but I would be happy to be wrong.
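To double-check the summary rows of the quoted results, here is a small Python sketch that recomputes them from the per-benchmark numbers. It assumes the two size columns are the -O3 size followed by the -O3 -flto -fwhole-program size (this matches the quoted percentages) and that higher SPEC scores are better. It also recomputes the performance geometric mean with 252.eon left out, to see how much of the gain rests on that one benchmark. The variable and function names are only illustrative.

```python
import math

# Code sizes from the quoted x86 SPECint2000 table:
# (benchmark, size with -O3, size with -O3 -flto -fwhole-program)
sizes = [
    ("164.gzip", 44287, 46331),
    ("175.vpr", 144101, 139569),
    ("176.gcc", 1566926, 1571009),
    ("181.mcf", 12279, 10791),
    ("186.crafty", 209956, 233324),
    ("197.parser", 155358, 109162),
    ("252.eon", 497347, 382585),
    ("253.perlbmk", 552163, 601327),
    ("254.gap", 503006, 510630),
    ("256.bzip2", 47465, 37549),
    ("300.twolf", 198365, 192321),
]

# SPEC scores (higher is better) for the benchmarks that ran correctly:
# (benchmark, -O3 score, -O3 -flto -fwhole-program score)
scores = [
    ("164.gzip", 1668, 1629),
    ("181.mcf", 5011, 5020),
    ("186.crafty", 2268, 2277),
    ("197.parser", 1928, 1925),
    ("252.eon", 2477, 2950),
    ("256.bzip2", 1894, 1956),
    ("300.twolf", 2806, 3026),
]


def pct_change(before, after):
    """Relative change in percent; negative means the LTO value is smaller."""
    return (after - before) / before * 100.0


def geomean(values):
    """Geometric mean, computed in log space to avoid overflow."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# The quoted "Average" row is a plain arithmetic mean of the percentages.
size_avg = sum(pct_change(b, a) for _, b, a in sizes) / len(sizes)

base_gm = geomean([b for _, b, _ in scores])
lto_gm = geomean([a for _, _, a in scores])
speedup = pct_change(base_gm, lto_gm)

# Same comparison with 252.eon left out.
no_eon = [s for s in scores if s[0] != "252.eon"]
speedup_no_eon = pct_change(geomean([b for _, b, _ in no_eon]),
                            geomean([a for _, _, a in no_eon]))

print(f"average size change: {size_avg:.2f}%")
print(f"geomean: {base_gm:.0f} -> {lto_gm:.0f} ({speedup:+.2f}%)")
print(f"geomean without eon: {speedup_no_eon:+.2f}%")
```

Dropping 252.eon cuts the geometric-mean gain from about 3.8% to roughly 1.5%. The full-set figure may differ from the quoted 3.84934% in the second decimal, since that row appears to have been computed from the rounded geomeans (2509/2416).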
I wonder how PathScale manages to make code size so much bigger with whole-program assumptions. Isn't this a comparison of single-file compilation against PathScale's equivalent of -flto alone (i.e. not -flto -fwhole-program)? The results also imply that we probably still do quite badly on large units. Doing more cloning and less inlining should help here, I would guess. Do you happen to have a comparison of -flto against -flto -fwhole-program?

Honza