On 11/20/18 10:53 AM, Kyrill Tkachov wrote:
> On 20/11/18 16:48, Pat Haugen wrote:
>> On 11/19/18 2:30 PM, Pat Haugen wrote:
>>>> This is a follow-up from
>>>> https://gcc.gnu.org/ml/gcc-patches/2018-11/msg01525.html
>>>> This version introduces an "artificial" property of the dependencies
>>>> produced in sched-deps.c that is recorded when they are created due to
>>>> MAX_PENDING_LIST_LENGTH, and they are thus ignored in the
>>>> model_analyze_insns ALAP calculation.
>>>>
>>>> This approach gives most of the benefits of the original patch [1] on
>>>> aarch64.
>>>> I tried it on the cactusADM hot function (bench_staggeredleapfrog2_) on
>>>> powerpc64le-unknown-linux-gnu with -O3 and found that the initial
>>>> version proposed did indeed increase the instruction count and stack
>>>> space. This version gives a small improvement on powerpc in terms of
>>>> instruction count (number of st* instructions stays the same), so I'm
>>>> hoping this version addresses Pat's concerns.
>>>> Pat, could you please try this version out if you've got the chance?
>>>>
>>> I tried the new version on cactusADM, it's showing a 2% degradation. I've
>>> kicked off a full CPU2006 run just to see if any others are affected.
>> The other benchmarks were neutral. So the only benchmark showing a change is
>> the 2% degradation on cactusADM. Comparing the generated .s files for
>> bench_staggeredleapfrog2_(), there is about a 0.7% increase in load insns
>> and still the 1% increase in store insns.
>
> Sigh :(
> What options are you compiling with? I tried a powerpc64le compiler with
> plain -O3 and saw a slight improvement (by manual inspection)

I was using the following:
-O3 -mcpu=power8 -fpeel-loops -funroll-loops -ffast-math -mpopcntd -mrecip=all

When I run with just -O3 -mcpu=power8 I see just under a 1% degradation.

-Pat