On Fri, Oct 19, 2018 at 12:02:43PM +0100, Will Deacon wrote:
> On Thu, Oct 18, 2018 at 08:53:42PM -0700, Alexei Starovoitov wrote:
> > On Thu, Oct 18, 2018 at 09:00:46PM +0200, Daniel Borkmann wrote:
> > > On 10/18/2018 05:33 PM, Alexei Starovoitov wrote:
> > > > On Thu, Oct 18, 2018 at 05:04:34PM +0200, Daniel Borkmann wrote:
> > > >>  #endif /* _TOOLS_LINUX_ASM_IA64_BARRIER_H */
> > > >> diff --git a/tools/arch/powerpc/include/asm/barrier.h
> > > >> b/tools/arch/powerpc/include/asm/barrier.h
> > > >> index a634da0..905a2c6 100644
> > > >> --- a/tools/arch/powerpc/include/asm/barrier.h
> > > >> +++ b/tools/arch/powerpc/include/asm/barrier.h
> > > >> @@ -27,4 +27,20 @@
> > > >>  #define rmb()  __asm__ __volatile__ ("sync" : : : "memory")
> > > >>  #define wmb()  __asm__ __volatile__ ("sync" : : : "memory")
> > > >>
> > > >> +#if defined(__powerpc64__)
> > > >> +#define smp_lwsync()   __asm__ __volatile__ ("lwsync" : : : "memory")
> > > >> +
> > > >> +#define smp_store_release(p, v)                \
> > > >> +do {                                           \
> > > >> +       smp_lwsync();                           \
> > > >> +       WRITE_ONCE(*p, v);                      \
> > > >> +} while (0)
> > > >> +
> > > >> +#define smp_load_acquire(p)                    \
> > > >> +({                                             \
> > > >> +       typeof(*p) ___p1 = READ_ONCE(*p);       \
> > > >> +       smp_lwsync();                           \
> > > >> +       ___p1;                                  \
> > > >
> > > > I don't like this proliferation of asm.
> > > > Why do we think that we can do better job than compiler?
> > > > can we please use gcc builtins instead?
> > > > https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html
> > > > __atomic_load_n(ptr, __ATOMIC_ACQUIRE);
> > > > __atomic_store_n(ptr, val, __ATOMIC_RELEASE);
> > > > are done specifically for this use case if I'm not mistaken.
> > > > I think it pays to learn what compiler provides.
> > >
> > > But are you sure the C11 memory model matches exact same model as kernel?
> > > Seems like last time Will looked into it [0] it wasn't the case ...
> >
> > I'm only suggesting equivalence of __atomic_load_n(ptr, __ATOMIC_ACQUIRE)
> > with kernel's smp_load_acquire().
> > I've seen a bunch of user space ring buffer implementations implemented
> > with __atomic_load_n() primitives.
> > But let's ask experts who live in both worlds.
>
> One thing to be wary of is if there is an implementation choice between
> how to implement load-acquire and store-release for a given architecture.
> In these situations, it's often important that concurrent software agrees
> on the "mapping", so we'd need to be sure that (a) All userspace compilers
> that we care about have compatible mappings and (b) These mappings are
> compatible with the kernel code.

Agreed!  Mixing and matching can be done, but it does require quite a
bit of care.

							Thanx, Paul
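
For concreteness, here is a minimal sketch of the kind of user-space ring
buffer Alexei mentions above, written with the GCC __atomic builtins he
cites. The names ring, ring_push, ring_pop and RING_SIZE are hypothetical
and do not come from the patch. The sketch assumes exactly one producer
and one consumer, both built with the same compiler. It does not settle
the question Will raises: whether the instructions a given compiler emits
for these builtins are compatible with the kernel's smp_load_acquire() and
smp_store_release() on a given architecture still has to be checked
per architecture.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 8u	/* hypothetical capacity; must be a power of two */

struct ring {
	uint32_t head;			/* written by the producer only */
	uint32_t tail;			/* written by the consumer only */
	uint64_t slots[RING_SIZE];
};

/* Producer side: fill the slot first, then publish it with a store-release. */
static bool ring_push(struct ring *r, uint64_t v)
{
	uint32_t head = __atomic_load_n(&r->head, __ATOMIC_RELAXED);
	/* Acquire pairs with the consumer's release of tail. */
	uint32_t tail = __atomic_load_n(&r->tail, __ATOMIC_ACQUIRE);

	if (head - tail == RING_SIZE)
		return false;		/* ring is full */

	r->slots[head & (RING_SIZE - 1)] = v;
	__atomic_store_n(&r->head, head + 1, __ATOMIC_RELEASE);
	return true;
}

/* Consumer side: load-acquire head, read the slot, then release the slot. */
static bool ring_pop(struct ring *r, uint64_t *out)
{
	uint32_t tail = __atomic_load_n(&r->tail, __ATOMIC_RELAXED);
	/* Acquire pairs with the producer's release of head. */
	uint32_t head = __atomic_load_n(&r->head, __ATOMIC_ACQUIRE);

	if (head == tail)
		return false;		/* ring is empty */

	*out = r->slots[tail & (RING_SIZE - 1)];
	__atomic_store_n(&r->tail, tail + 1, __ATOMIC_RELEASE);
	return true;
}
```

When both ends are ordinary user-space code, the compiler chooses the
mapping for both the acquire and the release side, so the two agree by
construction. The care mentioned above is needed when one end is the
kernel and the other is built by a user-space compiler.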