Hi Andrew,

Thanks for pinging this.  I've re-started the submission.

On 28 May 2017 at 08:01, Andrew Pinski <apin...@cavium.com> wrote:
> On Tue, Feb 28, 2017 at 1:53 AM, Maxim Kuvyrkov
> <maxim.kuvyr...@linaro.org> wrote:
>>> On Feb 20, 2017, at 5:38 PM, Kyrill Tkachov <kyrylo.tkac...@foss.arm.com> 
>>> wrote:
>>>
>>> Hi Maxim,
>>>
>>> On 30/01/17 11:24, Maxim Kuvyrkov wrote:
>>>> This patch series improves the -fprefetch-loop-arrays pass through small 
>>>> fixes and tweaks, and then enables it for several AArch64 cores.
>>>>
>>>> My tunings were done on and for Qualcomm hardware, with gains of 
>>>> +0.5% to +1.9% on SPEC2006 INT and +0.25% to +1.0% on SPEC2006 FP at -O3, 
>>>> depending on hardware revision.
>>>>
>>>> This patch series enables restricted -fprefetch-loop-arrays at -O2, which 
>>>> also improves SPEC2006 numbers.
>>>>
>>>> Biggest progressions are on 419.mcf and 437.leslie3d, with no serious 
>>>> regressions on other benchmarks.
>>>>
>>>> I'm now investigating making -fprefetch-loop-arrays more aggressive for 
>>>> Qualcomm hardware, which improves performance on most benchmarks, but also 
>>>> causes big regressions on 454.calculix and 462.libquantum.  If I can fix 
>>>> these two regressions, prefetching will give another boost to AArch64.
>>>>
>>>> Andrew just posted similar prefetching tunings for Cavium's cores, and the 
>>>> two patches have trivial conflicts.  I'll post mine as-is, since it 
>>>> addresses one of the comments on Andrew's review (adding a stand-alone 
>>>> struct for tuning parameters).
>>>>
>>>> Andrew, feel free to just copy-paste it to your patch, since it is just a 
>>>> mechanical change.
>>>>
>>>> All patches were bootstrapped and regtested on x86_64-linux-gnu and 
>>>> aarch64-linux-gnu.
>>>>
>>>
>>> I've tried these patches out on Cortex-A72 and Cortex-A53, with the tuning 
>>> struct entries appropriately
>>> modified to enable the changes on those cores.
>>> I'm seeing the mcf and leslie3d improvements as well on Cortex-A72 and 
>>> Cortex-A53 and no noticeable regressions.
>>> I've also verified that the improvements are due to the prefetch 
>>> instructions rather than just the unrolling that
>>> the pass does.
>>> So I'm in favor of enabling this for the cores that benefit from it.
>>>
>>> Do you plan to get this in for GCC 8?
>>
>> Hi Kyrill,
>>
>> My hope was to push them in time for GCC 7, but it seems too late now.  I'll 
>> return to these patches at the beginning of Stage 1.
>
> Ping on this patch set as I really want to get in the prefetching side
> for ThunderX 1 and 2.  Or should I resubmit my patch set?
>
> Thanks,
> Andrew
>
>>
>> --
>> Maxim Kuvyrkov
>> www.linaro.org
>>



-- 
Maxim Kuvyrkov
www.linaro.org
