Thanks for having kernel/locking people on Cc...

On Wed, Jan 23, 2019 at 08:13:55PM -0800, Alexei Starovoitov wrote:
> Implementation details:
> - on !SMP bpf_spin_lock() becomes nop

Because no BPF program is preemptible? I don't see any assertions or
even a comment that says this code is non-preemptible.

AFAICT some of the BPF_RUN_PROG things are under rcu_read_lock() only,
which is not sufficient.

> - on architectures that don't support queued_spin_lock trivial lock is used.
> Note that arch_spin_lock cannot be used, since not all archs agree that
> zero == unlocked and sizeof(arch_spinlock_t) != sizeof(__u32).

I really don't much like direct usage of qspinlock; esp. not as a
surprise.

Why does it matter if 0 means unlocked? That's what
__ARCH_SPIN_LOCK_UNLOCKED is for.

I get the sizeof(__u32) thing, but why not key off of that?

> Next steps:
> - allow bpf_spin_lock in other map types (like cgroup local storage)
> - introduce BPF_F_LOCK flag for bpf_map_update() syscall and helper
> to request kernel to grab bpf_spin_lock before rewriting the value.
> That will serialize access to map elements.

So clearly this map stuff is shared between bpf proglets, otherwise
there would not be a need for locking. But what happens if one is from
task context and another from IRQ context?

I don't see a local_irq_save()/restore() anywhere. What avoids the
trivial lock inversion?
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index a74972b07e74..2e98e4caf5aa 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -221,6 +221,63 @@ const struct bpf_func_proto bpf_get_current_comm_proto = {
>  	.arg2_type = ARG_CONST_SIZE,
>  };
>  
> +#ifndef CONFIG_QUEUED_SPINLOCKS
> +struct dumb_spin_lock {
> +	atomic_t val;
> +};
> +#endif
> +
> +notrace BPF_CALL_1(bpf_spin_lock, struct bpf_spin_lock *, lock)
> +{
> +#if defined(CONFIG_SMP)
> +#ifdef CONFIG_QUEUED_SPINLOCKS
> +	struct qspinlock *qlock = (void *)lock;
> +
> +	BUILD_BUG_ON(sizeof(*qlock) != sizeof(*lock));
> +	queued_spin_lock(qlock);
> +#else
> +	struct dumb_spin_lock *qlock = (void *)lock;
> +
> +	BUILD_BUG_ON(sizeof(*qlock) != sizeof(*lock));
> +	do {
> +		while (atomic_read(&qlock->val) != 0)
> +			cpu_relax();
> +	} while (atomic_cmpxchg(&qlock->val, 0, 1) != 0);
> +#endif
> +#endif
> +	return 0;
> +}
> +
> +const struct bpf_func_proto bpf_spin_lock_proto = {
> +	.func = bpf_spin_lock,
> +	.gpl_only = false,
> +	.ret_type = RET_VOID,
> +	.arg1_type = ARG_PTR_TO_SPIN_LOCK,
> +};
> +
> +notrace BPF_CALL_1(bpf_spin_unlock, struct bpf_spin_lock *, lock)
> +{
> +#if defined(CONFIG_SMP)
> +#ifdef CONFIG_QUEUED_SPINLOCKS
> +	struct qspinlock *qlock = (void *)lock;
> +
> +	queued_spin_unlock(qlock);
> +#else
> +	struct dumb_spin_lock *qlock = (void *)lock;
> +
> +	atomic_set(&qlock->val, 0);

And this is broken... That should've been atomic_set_release() at the
very least.

And this would again be the moment where I go pester you about the BPF
memory model :-)

> +#endif
> +#endif
> +	return 0;
> +}
> +
> +const struct bpf_func_proto bpf_spin_unlock_proto = {
> +	.func = bpf_spin_unlock,
> +	.gpl_only = false,
> +	.ret_type = RET_VOID,
> +	.arg1_type = ARG_PTR_TO_SPIN_LOCK,
> +};