> On Mon, Mar 7, 2016 at 7:27 PM, Yangfei (Felix) <felix.y...@huawei.com> wrote:
> > Hi,
> >
> > As discussed in LKML:
> > http://lists.infradead.org/pipermail/linux-arm-kernel/2015-July/355996.html,
> > the cost of changing a cache line from shared to exclusive state can be
> > significant on aarch64 cores, especially when this is triggered by an
> > exclusive store, since it may result in having to retry the transaction.
> >
> > This patch makes use of the "prfm PSTL1STRM" instruction to prefetch
> > cache lines for write prior to ldxr/stxr loops generated by the ll/sc
> > atomic routines.
> >
> > Bootstrapped on AArch64 server, is it OK?
>
> I don't think this is a good thing in general. For an example on ThunderX,
> the prefetch just adds a cycle for no benefit. This really depends on the
> micro-architecture of the core and how LDXR/STXR are implemented. So after
> this patch, it will slow down ThunderX.
>
> Thanks,
> Andrew Pinski
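A minimal user-level C sketch of the sequence described above, for illustration only. It assumes GCC's __builtin_prefetch, where rw=1 (write) and locality=0 (streaming) typically expand to "prfm pstl1strm" on AArch64. It also assumes that without LSE atomics a relaxed fetch-and-add compiles to an ldxr/stxr retry loop. The helper name fetch_add_with_prefetch is hypothetical. The patch itself adds the prefetch inside the compiler's ll/sc atomic expansion, not in user code. On other targets the prefetch is only a hint, so the sketch still runs.

```c
#include <stdatomic.h>

/* Hypothetical helper modelling "prefetch for write, then LL/SC loop". */
static inline int fetch_add_with_prefetch(atomic_int *p, int v)
{
    /* rw = 1 (prefetch for write), locality = 0 (streaming).  On AArch64,
       GCC typically emits "prfm pstl1strm, [p]" for this, requesting the
       line in a writable state before the exclusive load. */
    __builtin_prefetch((const void *)p, 1, 0);

    /* Without LSE, a relaxed fetch-and-add on AArch64 becomes an
       ldxr/add/stxr/cbnz retry loop; the prefetch is meant to make the
       stxr less likely to fail and retry. */
    return atomic_fetch_add_explicit(p, v, memory_order_relaxed);
}
```

The prefetch is purely a performance hint and never changes the result, which is why its value depends on the core, as discussed in this thread.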
Hi Andrew,

I am not very familiar with the ThunderX micro-architecture, but yes, I agree
that the benefit depends on the micro-architecture of the core. Since the
kernel patch mentioned above has been merged upstream, I think the added
prefetch instruction in the atomic routines is beneficial for most AArch64
cores on the market.

If it brings no benefit on ThunderX, how about adding a check here? That is,
we could disable generation of the prfm when tuning for ThunderX.

Thanks,
Felix
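A rough C sketch of the proposed check follows. It gates the prefetch on a per-core tuning flag, so a ThunderX tuning omits the prfm and other tunings keep it. Everything here is hypothetical: the flag TUNE_NO_LLSC_PREFETCH, the struct tune_params, both tuning tables and the helper functions are illustrative names rather than GCC's actual internals. The emitted text is a hand-written atomic-add sequence with fixed registers, not real compiler output.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical tuning flag: set for cores where the write prefetch before
   an LL/SC loop costs a cycle without benefit (e.g. ThunderX). */
#define TUNE_NO_LLSC_PREFETCH (1u << 0)

/* Hypothetical, much-simplified per-core tuning description. */
struct tune_params {
    const char *core;
    unsigned extra_flags;
};

static const struct tune_params generic_tuning = { "generic", 0 };
static const struct tune_params thunderx_tuning = { "thunderx",
                                                    TUNE_NO_LLSC_PREFETCH };

/* Emit the prefetch unless the selected tuning opts out. */
static bool emit_llsc_prefetch_p(const struct tune_params *tune)
{
    return (tune->extra_flags & TUNE_NO_LLSC_PREFETCH) == 0;
}

/* Write an illustrative atomic-add sequence into BUF: optional
   "prfm pstl1strm", then the ldxr/stxr retry loop. */
static int format_atomic_add(const struct tune_params *tune,
                             char *buf, size_t len)
{
    return snprintf(buf, len,
                    "%s"
                    "1:\tldxr\tw1, [x0]\n"
                    "\tadd\tw1, w1, w2\n"
                    "\tstxr\tw3, w1, [x0]\n"
                    "\tcbnz\tw3, 1b\n",
                    emit_llsc_prefetch_p(tune)
                        ? "\tprfm\tpstl1strm, [x0]\n" : "");
}
```

Keeping the decision in the tuning tables, rather than testing for a specific core by name at the expansion site, would let other cores opt out later without touching the atomic expansion code.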