Some SPEC2k performance numbers (3 runs on Core 2): push wins over move on 3 benchmarks. The differences on the others are within noise.
perlbmk: ~ +1.9%
gap:     ~ +1.4%
vortex:  ~ +0.7%

David

On Tue, Dec 11, 2012 at 2:53 PM, Xinliang David Li <davi...@google.com> wrote:
> The following is the O2 size data from SPEC2k. Note that with push/pop,
> it is always a net win (negative delta) in terms of total binary or
> total loadable section size.
>
> thanks,
>
> David
>
>                 .text  .eh_frame  Total_binary
> vortex-move    440252      40796        584066
> vortex-push    415436      57452        575906
> delta           -5.6%      40.8%       -1.397%
>
> twolf-move     169324      10748        223521
> twolf-push     168876      11124        223449
> delta           -0.3%       3.5%       -0.032%
>
> gzip-move       30668       3652        374399
> gzip-push       30524       3740        374343
> delta           -0.5%       2.4%       -0.015%
>
> bzip2-move      22748       3196        111616
> bzip2-push      22636       3284        111592
> delta           -0.5%       2.8%       -0.022%
>
> vpr-move       104684       9380        147378
> vpr-push       104236       9788        147338
> delta           -0.4%       4.3%       -0.027%
>
> mcf-move         8444       1244         26760
> mcf-push         8444       1244         26760
> delta            0.0%       0.0%        0.000%
>
> cc1-move      1093964      90772       1576994
> cc1-push      1078988     104068       1575314
> delta           -1.4%      14.6%       -0.107%
>
> crafty-move    130556       5508       1256037
> crafty-push    130236       5772       1255981
> delta           -0.2%       4.8%       -0.004%
>
> eon-move       333660      33220        516491
> eon-push       330140      35812        515555
> delta           -1.1%       7.8%       -0.181%
>
> gap-move       404092      46732       1457735
> gap-push       396012      53180       1456103
> delta           -2.0%      13.8%       -0.112%
>
> perlbmk-move   456572      45324        618585
> perlbmk-push   449516      52340        618545
> delta           -1.5%      15.5%       -0.006%
>
> parser-move     81244      15788        334003
> parser-push     80684      16332        333987
> delta           -0.7%       3.4%       -0.005%
>
>
> On Tue, Dec 11, 2012 at 9:14 AM, Xinliang David Li <davi...@google.com> wrote:
>> On Tue, Dec 11, 2012 at 1:49 AM, Richard Biener
>> <richard.guent...@gmail.com> wrote:
>>> On Mon, Dec 10, 2012 at 10:07 PM, Mike Stump <mikest...@comcast.net> wrote:
>>>> On Dec 10, 2012, at 12:42 PM, Xinliang David Li <davi...@google.com> wrote:
>>>>> I have not measured the CFI size impact -- but conceivably it should
>>>>> be larger -- which is unfortunate.
>>>>
>>>> Code speed and size are preferable to optimizing dwarf size… :-) I'd let
>>>> dwarf 5 fix it!
>>>
>>> Well, different to debug info, CFI data has to be in memory to make
>>> unwinding work.
>>> These days most Linux distributions enable asynchronous unwind tables, so any
>>> size savings due to shorter push/pop prologue/epilogue sequences have to be
>>> offset by the increase in CFI data. I'm not sure there is really a
>>> speed difference between both variants (well, maybe due to better icache
>>> footprint of the push/pop variant).
>>
>> Yes, for large applications, this can be crucial to performance.
>>
>>>
>>> That said - I'd prefer to have more data on this before making the switch
>>> for the generic model. What was your original motivation? Just "theory" or
>>> was it a real case?
>>
>> 1) some of the very large internal apps I measured benefit from this
>> change (in terms of performance)
>> 2) both ICC and LLVM do the same.
>>
>> I have already committed the patch. I will find some time to collect
>> more size data and post it later.
>>
>> thanks,
>>
>> David
>>
>>
>>>
>>> Thanks,
>>> Richard.
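
For anyone rechecking the size table: the "delta" rows appear to be the relative change from the move build to the push build. The sketch below recomputes them for vortex under that assumption. The function name `pct_delta` and the dictionary layout are only illustrative and are not part of any tool used in the thread.

```python
# Sketch: recompute the "delta" rows of the size table.
# Assumption: delta = (push - move) / move * 100, which is consistent
# with the posted figures but is not stated explicitly in the thread.

def pct_delta(move: int, push: int) -> float:
    """Relative change in percent going from the move build to the push build."""
    return (push - move) / move * 100.0

# (move, push) byte counts for vortex, copied from the size table.
vortex = {
    ".text": (440252, 415436),
    ".eh_frame": (40796, 57452),
    "Total_binary": (584066, 575906),
}

for section, (move, push) in vortex.items():
    print(f"{section:>12}: {pct_delta(move, push):+.3f}%")
```

For vortex, .text shrinks by 24816 bytes while .eh_frame grows by 16656 bytes, a net saving of 8160 bytes, which equals the Total_binary difference in the table.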