Re: [PATCH i386]: Enable push/pop in pro/epilogue for modern CPUs

Xinliang David Li Tue, 11 Dec 2012 15:39:21 -0800

Some SPEC2k performance numbers (3 runs on Core 2):

Push wins over move on 3 benchmarks; the others are within noise.


perlbmk:   ~+1.9%
gap:       ~+1.4%
vortex:    ~+0.7%

David

On Tue, Dec 11, 2012 at 2:53 PM, Xinliang David Li <[email protected]> wrote:
> The following is the O2 size data from SPEC2k.  Note that with push/pop,
> it is always a net win (negative delta) in terms of total binary or
> total loadable section size.
>
> thanks,
>
> David
>
>                   .text  .eh_frame  Total_binary
> vortex-move      440252      40796        584066
> vortex-push      415436      57452        575906
> delta             -5.6%      40.8%       -1.397%
>
> twolf-move       169324      10748        223521
> twolf-push       168876      11124        223449
> delta             -0.3%       3.5%       -0.032%
>
> gzip-move         30668       3652        374399
> gzip-push         30524       3740        374343
> delta             -0.5%       2.4%       -0.015%
>
> bzip2-move        22748       3196        111616
> bzip2-push        22636       3284        111592
> delta             -0.5%       2.8%       -0.022%
>
> vpr-move         104684       9380        147378
> vpr-push         104236       9788        147338
> delta             -0.4%       4.3%       -0.027%
>
> mcf-move           8444       1244         26760
> mcf-push           8444       1244         26760
> delta              0.0%       0.0%        0.000%
>
> cc1-move        1093964      90772       1576994
> cc1-push        1078988     104068       1575314
> delta             -1.4%      14.6%       -0.107%
>
> crafty-move      130556       5508       1256037
> crafty-push      130236       5772       1255981
> delta             -0.2%       4.8%       -0.004%
>
> eon-move         333660      33220        516491
> eon-push         330140      35812        515555
> delta             -1.1%       7.8%       -0.181%
>
> gap-move         404092      46732       1457735
> gap-push         396012      53180       1456103
> delta             -2.0%      13.8%       -0.112%
>
> perlbmk-move     456572      45324        618585
> perlbmk-push     449516      52340        618545
> delta             -1.5%      15.5%       -0.006%
>
> parser-move       81244      15788        334003
> parser-push       80684      16332        333987
> delta             -0.7%       3.4%       -0.005%
>
>
> On Tue, Dec 11, 2012 at 9:14 AM, Xinliang David Li <[email protected]> wrote:
>> On Tue, Dec 11, 2012 at 1:49 AM, Richard Biener
>> <[email protected]> wrote:
>>> On Mon, Dec 10, 2012 at 10:07 PM, Mike Stump <[email protected]> wrote:
>>>> On Dec 10, 2012, at 12:42 PM, Xinliang David Li <[email protected]> wrote:
>>>>> I have not measured the CFI size impact -- but conceivably it should
>>>>> be larger -- which is unfortunate.
>>>>
>>>> Code speed and size are preferable to optimizing dwarf size…  :-)  I'd let
>>>> dwarf 5 fix it!
>>>
>>> Well, unlike debug info, CFI data has to be in memory to make
>>> unwinding work.
>>> These days most Linux distributions enable asynchronous unwind tables, so any
>>> size savings due to shorter push/pop epilogue/prologue sequences have to be
>>> offset against the increase in CFI data.  I'm not sure there is really a
>>> speed difference between both variants (well, maybe due to the better icache
>>> footprint of the push/pop variant).
>>
>> Yes, for large applications, this can be crucial to performance.
>>
>>>
>>> That said - I'd prefer to have more data on this before making the switch
>>> for the generic model.  What was your original motivation?  Just "theory" or
>>> was it a real case?
>>
>> 1) some of the very large internal apps I measured benefit from this
>> change (in terms of performance)
>> 2) both ICC and LLVM do the same.
>>
>> I have already committed the patch. I will find some time to collect
>> more size data and post it later.
>>
>> thanks,
>>
>> David
>>
>>
>>>
>>> Thanks,
>>> Richard.
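
The delta rows in the size table quoted above can be checked with a short script. This is only a sketch: it assumes each delta is (push - move) / move in percent, and the names SIZES, pct_delta and summarize are made up for illustration. Only four of the benchmarks are copied in. It also adds up the byte change in .text and .eh_frame, which is the trade-off Richard describes (smaller code against larger CFI data).

```python
# Size data copied from the SPEC2k O2 table in this thread.
# name -> ((.text, .eh_frame, total) for move, the same three for push)
SIZES = {
    "vortex": ((440252, 40796, 584066), (415436, 57452, 575906)),
    "cc1": ((1093964, 90772, 1576994), (1078988, 104068, 1575314)),
    "gap": ((404092, 46732, 1457735), (396012, 53180, 1456103)),
    "mcf": ((8444, 1244, 26760), (8444, 1244, 26760)),
}


def pct_delta(move, push):
    """Relative change from move to push, in percent (assumed formula)."""
    return 100.0 * (push - move) / move


def summarize(name):
    """Return (.text %, .eh_frame %, total %, net bytes) for one benchmark."""
    move, push = SIZES[name]
    text_d, eh_d, total_d = (pct_delta(m, p) for m, p in zip(move, push))
    # Net byte change across the two loadable sections listed in the table.
    net_bytes = (push[0] - move[0]) + (push[1] - move[1])
    return text_d, eh_d, total_d, net_bytes


if __name__ == "__main__":
    for name in SIZES:
        text_d, eh_d, total_d, net = summarize(name)
        print(f"{name:8s} .text {text_d:+.1f}%  .eh_frame {eh_d:+.1f}%  "
              f"total {total_d:+.3f}%  net {net:+d} bytes")
```

For vortex the two sections together shrink by 8160 bytes (24816 saved in .text, 16656 added in .eh_frame), which is the same as the difference between the two Total_binary values (584066 - 575906).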

Re: [PATCH i386]: Enable push/pop in pro/epilogue for modern CPUs
