On Mon, Jan 30, 2017 at 3:24 AM, Maxim Kuvyrkov <maxim.kuvyr...@linaro.org> wrote:
> This patch series improves -fprefetch-loop-arrays pass through small fixes
> and tweaks, and then enables it for several AArch64 cores.
>
> My tunings were done on and for Qualcomm hardware, with results varying
> between +0.5-1.9% for SPEC2006 INT and +0.25%-1.0% for SPEC2006 FP at -O3,
> depending on hardware revision.
>
> This patch series enables restricted -fprefetch-loop-arrays at -O2, which
> also improves SPEC2006 numbers
>
> Biggest progressions are on 419.mcf and 437.leslie3d, with no serious
> regressions on other benchmarks.
>
> I'm now investigating making -fprefetch-loop-arrays more aggressive for
> Qualcomm hardware, which improves performance on most benchmarks, but also
> causes big regressions on 454.calculix and 462.libquantum. If I can fix
> these two regressions, prefetching will give another boost to AArch64.

I already have a patch that makes prefetching more aggressive, and it
improves libquantum on CN88xx. I have not submitted it yet because I have
only just restarted upstreaming my patch sets.

Thanks,
Andrew

>
> Andrew just posted similar prefetching tunings for Cavium's cores, and the
> two patches have trivial conflicts. I'll post mine as-is, since it addresses
> one of the comments on Andrew's review (adding a stand-alone struct for
> tuning parameters).
>
> Andrew, feel free to just copy-paste it to your patch, since it is just a
> mechanical change.
>
> All patches were bootstrapped and regtested on x86_64-linux-gnu and
> aarch64-linux-gnu.
>
> --
> Maxim Kuvyrkov
> www.linaro.org