Code size for spec2000 is almost unchanged (many benchmarks produce identical binaries). For those that changed, we have the following numbers (200 vs. 100, both dynamically linked builds with -Ofast -funroll-loops -flto):

183.equake              +10%
164.gzip, 173.applu     +3.5%
187.facerec, 191.fma3d  +2.5%
200.sixtrack            +2%
177.mesa, 178.galgel    +1%
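For anyone who wants to reproduce the comparison, a minimal sketch of the build-and-measure step. The parameter name is my assumption (the limit under discussion is presumably GCC's --param max-completely-peeled-insns), and equake.c stands in for the actual SPEC sources:

# sketch: param name assumed, equake.c is a placeholder for the benchmark
gcc -Ofast -funroll-loops -flto --param max-completely-peeled-insns=100 \
    equake.c -lm -o equake.100
gcc -Ofast -funroll-loops -flto --param max-completely-peeled-insns=200 \
    equake.c -lm -o equake.200
size equake.100 equake.200   # compare the text columns of both binaries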
On Wed, Nov 12, 2014 at 2:51 AM, Jan Hubicka <hubi...@ucw.cz> wrote:
>> > 150 and 200 make Silvermont performance better on 173.applu (+8%) and
>> > 183.equake (+3%); Haswell spec2006 performance stays almost unchanged.
>> > A higher value of 300 leaves the performance of the mentioned tests
>> > unchanged but adds some regressions on other benchmarks.
>> >
>> > So I like 200 as well as 120 and 150, but can confirm performance
>> > gains only for x86.
>>
>> IMO it's either 150 or 200. We chose 200 for our 4.9-based compiler
>> because it gave the performance boost without affecting code size (on
>> x86-64) and because the previous value was 400, but it's your call.
>
> Both 150 and 200 work for me globally if there is not too much code-size
> bloat (I did not see code size mentioned here).
>
> What I did before decreasing the bounds was strengthen the loop iteration
> count bounds and add logic that predicts the constant propagation enabled
> by unrolling. For this reason 400 became too large, as we do a lot more
> complete unrolling than before. Also, 400 in older compilers is not
> really 400 in newer ones.
>
> Because I saw performance drop only with values below 50, I went for 100.
> It would be very interesting to actually analyze what happens for those
> two benchmarks (that should not be too hard with perf).
>
> Honza
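To make the constant propagation Honza describes concrete, here is a small self-contained illustration (my own example, not from the thread). Once the four-iteration loop is completely peeled, every array access has a constant index, so the propagator folds the whole function to a literal:

cat > cprop.c <<'EOF'
/* With the loop completely unrolled, each w[i] is a known constant,
   and constant propagation folds the sum 1+4+9+16 to 30. */
static const int w[4] = {1, 2, 3, 4};

int weighted_sum(void)
{
    int s = 0;
    for (int i = 0; i < 4; i++)
        s += w[i] * w[i];
    return s;
}
EOF
gcc -O2 -S cprop.c
grep '\$30' cprop.s   # on x86-64 the body reduces to "movl $30, %eax"

A larger limit lets this happen for bigger loop bodies, which is where the code-size deltas above come from. As for Honza's perf suggestion, running perf record on the regressing benchmark and then perf report should show which hot loops changed between the two parameter values.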