On Mon, Mar 7, 2016 at 8:12 PM, Yangfei (Felix) <felix.y...@huawei.com> wrote:
>> On Mon, Mar 7, 2016 at 7:27 PM, Yangfei (Felix) <felix.y...@huawei.com> wrote:
>> > Hi,
>> >
>> > As discussed in LKML:
>> > http://lists.infradead.org/pipermail/linux-arm-kernel/2015-July/355996.html,
>> > the cost of changing a cache line from shared to exclusive state can be
>> > significant on aarch64 cores, especially when this is triggered by an
>> > exclusive store, since it may result in having to retry the transaction.
>> > This patch makes use of the "prfm PSTL1STRM" instruction to prefetch
>> > cache lines for write prior to ldxr/stxr loops generated by the ll/sc
>> > atomic routines.
>> > Bootstrapped on AArch64 server, is it OK?
>>
>> I don't think this is a good thing in general. For an example on ThunderX,
>> the prefetch just adds a cycle for no benefit. This really depends on the
>> micro-architecture of the core and how LDXR/STXR are implemented. So after
>> this patch, it will slow down ThunderX.
>>
>> Thanks,
>> Andrew Pinski
>
> Hi Andrew,
>
> I am not quite clear about the ThunderX micro-arch. But, Yes, I agree it
> depends on the micro-architecture of the core.
> As the mentioned kernel patch is merged upstream, I think the added
> prefetch instruction in atomic routines is good for most of AArch64 cores in
> the market.
> If it does nothing good for ThunderX, then how about adding some checking
> here? I mean disabling the generation of the prfm if we are tuning for
> ThunderX.

No, it is not just that it does no good; it actually causes worse
performance on ThunderX. How about only doing it for the
micro-architectures where it helps, and also not doing it for generic,
since it hurts ThunderX so much?

Thanks,
Andrew

>
> Thanks,
> Felix
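
The opt-in gating discussed in this thread can be sketched in C as follows.
This is a minimal illustration under stated assumptions, not the patch under
review: the names tune_target, prefetch_before_llsc and fetch_add_tuned are
hypothetical, and a GCC-compatible compiler is assumed. The call
__builtin_prefetch(p, 1, 0) asks for a write prefetch with low temporal
locality; on AArch64 this is expected to be emitted as a store-streaming
prefetch such as "prfm PSTL1STRM", but that mapping is an assumption here.
Whether the atomic add itself becomes an ldxr/stxr loop depends on the target
flags in use.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tuning targets; these are illustrative names only and do
   not correspond to GCC's internal tuning structures. */
enum tune_target {
    TUNE_GENERIC,           /* no specific core selected */
    TUNE_THUNDERX,          /* prefetch reported in the thread to hurt */
    TUNE_PREFETCH_FRIENDLY  /* stands in for any core where it helps */
};

/* Opt-in policy: only cores known to benefit get the prefetch.
   Generic tuning and ThunderX are both left without it. */
static bool prefetch_before_llsc(enum tune_target t)
{
    return t == TUNE_PREFETCH_FRIENDLY;
}

/* Atomic fetch-and-add, optionally preceded by a prefetch-for-write of
   the target cache line. Returns the value held before the addition. */
static int64_t fetch_add_tuned(int64_t *p, int64_t v, enum tune_target t)
{
    if (prefetch_before_llsc(t))
        __builtin_prefetch(p, 1, 0);  /* rw = 1 (write), locality = 0 */
    return __atomic_fetch_add(p, v, __ATOMIC_SEQ_CST);
}
```

The prefetch is only a hint, so the result of fetch_add_tuned is the same for
every tuning target; only the instruction sequence, and therefore the timing,
can differ.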