On Mon, Mar 7, 2016 at 7:27 PM, Yangfei (Felix) <felix.y...@huawei.com> wrote:
> Hi,
>
> As discussed in LKML:
> http://lists.infradead.org/pipermail/linux-arm-kernel/2015-July/355996.html,
> the cost of changing a cache line
> from shared to exclusive state can be significant on aarch64 cores,
> especially when this is triggered by an exclusive store, since it may
> result in having to retry the transaction.
> This patch makes use of the "prfm PSTL1STRM" instruction to prefetch
> cache lines for write prior to ldxr/stxr loops generated by the ll/sc atomic
> routines.
> Bootstrapped on AArch64 server, is it OK?

I don't think this is a good idea in general. For example, on ThunderX the
prefetch just adds a cycle for no benefit. Whether it helps depends on the
micro-architecture of the core and on how LDXR/STXR are implemented. So this
patch would slow down ThunderX.

Thanks,
Andrew Pinski

>
> Thanks,
> Felix
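
To make the trade-off concrete, here is a minimal C sketch of a prefetch-for-write hint that is opt-in per target rather than unconditional. It is not the GCC patch under discussion. The macro PREFETCH_BEFORE_ATOMIC and the function fetch_add_u32 are hypothetical names used only for illustration. It assumes GCC or Clang builtins, and it assumes that on AArch64 the call __builtin_prefetch(addr, 1, 0) is emitted as "prfm PSTL1STRM" and that the atomic add expands to an ldxr/stxr loop when LSE atomics are not in use. Both assumptions should be checked against the generated assembly.

```c
#include <stdint.h>

/* Hypothetical per-target knob: set to 1 only for cores where a write
 * prefetch ahead of the exclusive load has been measured to help. */
#ifndef PREFETCH_BEFORE_ATOMIC
#define PREFETCH_BEFORE_ATOMIC 0
#endif

/* Atomically add inc to *addr and return the previous value. */
static inline uint32_t fetch_add_u32(uint32_t *addr, uint32_t inc)
{
#if PREFETCH_BEFORE_ATOMIC
    /* rw = 1 requests the line for writing; locality = 0 marks it as
     * streaming. This is only a hint and never changes the result. */
    __builtin_prefetch(addr, 1, 0);
#endif
    /* Without LSE atomics, AArch64 compilers are expected to expand this
     * into a load-exclusive/store-exclusive retry loop. */
    return __atomic_fetch_add(addr, inc, __ATOMIC_SEQ_CST);
}
```

Building with -DPREFETCH_BEFORE_ATOMIC=1 turns the hint on. Leaving it at the default keeps the extra instruction off cores such as ThunderX, where it is reported above to cost a cycle with no benefit.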