> I agree with you. We should not forget the embedded market, where
> code size is more important, but we should also provide an option for
> the market at which the PathScale/Intel/Sun compilers are aimed.
>
> From this point of view, we have a lot of resources, because GCC
> generates the smallest code. E.g., on average the Intel compiler
> generates 80% bigger code for SPECint2000 with -O2, and 7 times
> bigger with -fast. It generates bigger code even with -Os (code size
> optimizations) than GCC does at -O3. The same goes for compilation
> speed: GCC with -O3 is much faster (30%-40%) than all the mentioned
> compilers in their peak performance modes (-fast, -Ofast). So I think
> we have some resources which we could spend on improving code
> performance for this market.
Actually, in my experiments with the --combine and -fwhole-program combination, I found it easier to get noticeable code size savings from "LTO" at -O2 than noticeable performance improvements at -O3. (I didn't even get the 4% you mention on PathScale; it was about 1-2% if I remember correctly, with about a 20% code size regression. Only a few of the SPEC2000 benchmarks have important cross-module inlining opportunities in them.) I will try to dig out the numbers.

So the initial LTO will probably be good for people who care about code size: not just the embedded market, but I hope that as soon as we get --lto working reasonably for C, a good portion of a Linux distro can be built this way, with noticeable benefits in the load times of all the small tools we use daily. With code/data segment reordering we ought to get less memory dirtied during initialization of the compiled application, too.

It is hoped that as we implement better IPA alias analysis and similar optimizations, the need for inlining of large function bodies will actually be reduced. IP RA fits this plan pretty well, IMO.

Honza