On Mon, Mar 7, 2016 at 8:12 PM, Yangfei (Felix) <felix.y...@huawei.com> wrote:
>> On Mon, Mar 7, 2016 at 7:27 PM, Yangfei (Felix) <felix.y...@huawei.com> wrote:
>> > Hi,
>> >
>> >     As discussed in LKML:
>> >     http://lists.infradead.org/pipermail/linux-arm-kernel/2015-July/355996.html,
>> >     the cost of changing a cache line from shared to exclusive state can be
>> >     significant on aarch64 cores, especially when this is triggered by an
>> >     exclusive store, since it may result in having to retry the transaction.
>> >     This patch makes use of the "prfm PSTL1STRM" instruction to prefetch
>> >     cache lines for write prior to ldxr/stxr loops generated by the ll/sc
>> >     atomic routines.
>> >     Bootstrapped on AArch64 server, is it OK?
>>
>>
>> I don't think this is a good thing in general.  For example, on ThunderX the
>> prefetch just adds a cycle for no benefit.  This really depends on the
>> micro-architecture of the core and how LDXR/STXR are implemented.
>> So after this patch, it will slow down ThunderX.
>>
>> Thanks,
>> Andrew Pinski
>>
>
> Hi Andrew,
>
>    I am not quite clear about the ThunderX micro-arch.  But yes, I agree it
> depends on the micro-architecture of the core.
>    As the mentioned kernel patch is merged upstream, I think the added
> prefetch instruction in atomic routines is good for most of the AArch64
> cores on the market.
>    If it does nothing good for ThunderX, then how about adding a check
> here?  I mean disabling the generation of the prfm if we are tuning for
> ThunderX.

No, it is not just that it does no good; it actually causes worse
performance on ThunderX.  How about only doing it for the
micro-architectures where it helps, and not doing it for generic, since
it hurts ThunderX so much?
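
A minimal sketch of that opt-in idea, written as plain C rather than as
GCC back-end code.  Everything here is an assumption made for
illustration: the struct, its field, the three tuning records and the
fetch_add helper are hypothetical names, not anything from the patch or
from GCC.  The prefetch is expressed with __builtin_prefetch (write
hint, lowest temporal locality), which I would expect an AArch64
compiler to lower to a store-prefetch prfm, though that mapping is also
an assumption here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-core tuning record (not a real GCC structure). */
struct core_tuning {
    const char *name;
    /* Emit a write prefetch ahead of the ll/sc loop? Off unless a
       core is known to benefit. */
    bool prefetch_before_llsc;
};

/* Illustrative values only. Generic and ThunderX stay off, as the
   message above suggests; "other-core" stands for any core where the
   prefetch was measured to help. */
static const struct core_tuning generic_tuning  = { "generic",    false };
static const struct core_tuning thunderx_tuning = { "thunderx",   false };
static const struct core_tuning other_tuning    = { "other-core", true  };

/* Atomic fetch-and-add that issues the prefetch only when the selected
   tuning asks for it. Returns the value held before the addition. */
static int fetch_add(const struct core_tuning *t, atomic_int *p, int v)
{
    assert(t != NULL && p != NULL);
    if (t->prefetch_before_llsc) {
        /* rw = 1: prefetch for write; locality = 0: streaming hint. */
        __builtin_prefetch((const void *)p, 1, 0);
    }
    return atomic_fetch_add_explicit(p, v, memory_order_seq_cst);
}
```

The point of the sketch is only the default: the prefetch is opt-in per
core, so neither generic tuning nor ThunderX pays for it.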

Thanks,
Andrew

>
> Thanks,
> Felix
