Hi Andrew,

Thanks for pinging this.  I've re-started the submission.

On 28 May 2017 at 08:01, Andrew Pinski <apin...@cavium.com> wrote:
> On Tue, Feb 28, 2017 at 1:53 AM, Maxim Kuvyrkov
> <maxim.kuvyr...@linaro.org> wrote:
>>> On Feb 20, 2017, at 5:38 PM, Kyrill Tkachov <kyrylo.tkac...@foss.arm.com>
>>> wrote:
>>>
>>> Hi Maxim,
>>>
>>> On 30/01/17 11:24, Maxim Kuvyrkov wrote:
>>>> This patch series improves -fprefetch-loop-arrays pass through small fixes
>>>> and tweaks, and then enables it for several AArch64 cores.
>>>>
>>>> My tunings were done on and for Qualcomm hardware, with results varying
>>>> between +0.5-1.9% for SPEC2006 INT and +0.25%-1.0% for SPEC2006 FP at -O3,
>>>> depending on hardware revision.
>>>>
>>>> This patch series enables restricted -fprefetch-loop-arrays at -O2, which
>>>> also improves SPEC2006 numbers.
>>>>
>>>> Biggest progressions are on 419.mcf and 437.leslie3d, with no serious
>>>> regressions on other benchmarks.
>>>>
>>>> I'm now investigating making -fprefetch-loop-arrays more aggressive for
>>>> Qualcomm hardware, which improves performance on most benchmarks, but also
>>>> causes big regressions on 454.calculix and 462.libquantum.  If I can fix
>>>> these two regressions, prefetching will give another boost to AArch64.
>>>>
>>>> Andrew just posted similar prefetching tunings for Cavium's cores, and the
>>>> two patches have trivial conflicts.  I'll post mine as-is, since it
>>>> addresses one of the comments on Andrew's review (adding a stand-alone
>>>> struct for tuning parameters).
>>>>
>>>> Andrew, feel free to just copy-paste it to your patch, since it is just a
>>>> mechanical change.
>>>>
>>>> All patches were bootstrapped and regtested on x86_64-linux-gnu and
>>>> aarch64-linux-gnu.
>>>>
>>>
>>> I've tried these patches out on Cortex-A72 and Cortex-A53, with the tuning
>>> struct entries appropriately
>>> modified to enable the changes on those cores.
>>> I'm seeing the mcf and leslie3d improvements as well on Cortex-A72 and
>>> Cortex-A53 and no noticeable regressions.
>>> I've also verified that the improvements are due to the prefetch
>>> instructions rather than just the unrolling that
>>> the pass does.
>>> So I'm in favor of enabling this for the cores that benefit from it.
>>>
>>> Do you plan to get this in for GCC 8?
>>
>> Hi Kyrill,
>>
>> My hope was to push them in time for GCC 7, but it seems too late now.  I'll
>> return to these patches at the beginning of Stage 1.
>
> Ping on this patch set as I really want to get in the prefetching side
> for ThunderX 1 and 2.  Or should I resubmit my patch set?
>
> Thanks,
> Andrew
>
>>
>> --
>> Maxim Kuvyrkov
>> www.linaro.org
>>

--
Maxim Kuvyrkov
www.linaro.org