On 11/20/18 10:53 AM, Kyrill Tkachov wrote:
> On 20/11/18 16:48, Pat Haugen wrote:
>> On 11/19/18 2:30 PM, Pat Haugen wrote:
>>>> This is a follow-up from 
>>>> https://gcc.gnu.org/ml/gcc-patches/2018-11/msg01525.html
>>>> This version introduces an "artificial" property of the dependencies 
>>>> produced in
>>>> sched-deps.c that is recorded when they are created due to 
>>>> MAX_PENDING_LIST_LENGTH
>>>> and they are thus ignored in the model_analyze_insns ALAP calculation.
>>>>
>>>> This approach gives most of the benefits of the original patch [1] on 
>>>> aarch64.
>>>> I tried it on the cactusADM hot function (bench_staggeredleapfrog2_) on 
>>>> powerpc64le-unknown-linux-gnu
>>>> with -O3 and found that the initial version proposed did indeed increase 
>>>> the instruction count
>>>> and stack space. This version gives a small improvement on powerpc in 
>>>> terms of instruction count
>>>> (number of st* instructions stays the same), so I'm hoping this version 
>>>> addresses Pat's concerns.
>>>> Pat, could you please try this version out if you've got the chance?
>>>>
>>> I tried the new version on cactusADM; it's showing a 2% degradation. I've 
>>> kicked off a full CPU2006 run just to see if any others are affected.
>> The other benchmarks were neutral. So the only benchmark showing a change is 
>> the 2% degradation on cactusADM. Comparing the generated .s files for 
>> bench_staggeredleapfrog2_(), there is about a 0.7% increase in load insns 
>> and still the 1% increase in store insns.
> 
> Sigh :(
> What options are you compiling with? I tried a powerpc64le compiler with 
> plain -O3 and saw a slight improvement (by manual inspection).

I was using the following: -O3 -mcpu=power8 -fpeel-loops -funroll-loops 
-ffast-math -mpopcntd -mrecip=all. When I run with just -O3 -mcpu=power8 I see 
just under a 1% degradation.

-Pat
