On Wed, Mar 09, 2016 at 06:52:45PM +0530, Vineet Gupta wrote:
> On Wednesday 09 March 2016 03:43 PM, Peter Zijlstra wrote:
> >> There is clearly a problem in slub code in that it is pairing a
> >> test_and_set_bit() with a __clear_bit(). The latter can obviously
> >> clobber the former if they are neither a single instruction each (as
> >> on x86) nor llock/scond kind of instructions where the interim store
> >> from the other core is detected and causes a retry of the whole
> >> llock/scond sequence.
> > 
> > Yes, test_and_set_bit() + __clear_bit() is broken.
> 
> But in SLUB: bit_spin_lock() + __bit_spin_unlock() is acceptable? How so
> (ignoring the performance thing for discussion's sake, which is a side
> effect of this implementation)?

The short answer is: per definition. They are defined to work together,
which is what makes __clear_bit_unlock() such a special function.
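For reference, this is roughly what that pairing looks like in
include/linux/bit_spinlock.h (a simplified sketch; the CONFIG_SMP/DEBUG
ifdefs and lockdep annotations are left out):

static inline void bit_spin_lock(int bitnum, unsigned long *addr)
{
	preempt_disable();
	/* acquire side: a full atomic RMW with acquire semantics */
	while (unlikely(test_and_set_bit_lock(bitnum, addr))) {
		preempt_enable();
		do {
			cpu_relax();
		} while (test_bit(bitnum, addr));
		preempt_disable();
	}
}

static inline void __bit_spin_unlock(int bitnum, unsigned long *addr)
{
	/*
	 * release side: a non-atomic clear; legal only because it is
	 * defined to pair with the acquire above, which is exactly what
	 * makes __clear_bit_unlock() special.
	 */
	__clear_bit_unlock(bitnum, addr);
	preempt_enable();
}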

> So despite the comment below in bit_spinlock.h I don't quite comprehend
> how this is allowable. And if, say by deduction, this is fine for LLSC or
> lock-prefixed cases, then isn't this true in general for a lot more cases
> in the kernel, i.e. pairing an atomic lock with a non-atomic unlock? I'm
> missing something!

x86 (and others) do in fact use non-atomic instructions for
spin_unlock(). But as this is all arch-specific, we can make these
assumptions. It's just that generic code cannot rely on them.
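As an illustration (just a sketch, not the actual x86 code): once you hold
the lock nobody else writes the lock word, so on such an architecture the
unlock can be a plain release store rather than a locked RMW:

typedef struct { unsigned int val; } xspinlock_t;	/* illustrative only */

static inline void xspin_lock(xspinlock_t *l)
{
	/* atomic RMW with acquire semantics (e.g. xchg on x86) */
	while (__atomic_exchange_n(&l->val, 1, __ATOMIC_ACQUIRE))
		;	/* spin until we observe it free */
}

static inline void xspin_unlock(xspinlock_t *l)
{
	/*
	 * A plain release store, no locked RMW needed: while the lock is
	 * held the owner is the only writer of this word, so the store
	 * cannot lose anybody else's update.
	 */
	__atomic_store_n(&l->val, 0, __ATOMIC_RELEASE);
}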

So let me try and explain.


The problem as identified is:

CPU0                                            CPU1

bit_spin_lock()                                 __bit_spin_unlock()
1:
        /* fetch_or, r1 holds the old value */
        spin_lock
        load    r1, addr
                                                load    r1, addr
                                                bclr    r2, r1, 1
                                                store   r2, addr
        or      r2, r1, 1
        store   r2, addr        /* lost the store from CPU1 */
        spin_unlock

        and     r1, 1
        bnz     2       /* it was set, go wait */
        ret

2:
        load    r1, addr
        and     r1, 1
        bnz     2       /* wait until it's not set */

        b       1       /* try again */
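In C terms the two sides of that race look roughly like this (a sketch
only; a pthread mutex stands in for the hashed spinlock that such an
architecture uses to implement its "atomic" bitops):

#include <pthread.h>

static pthread_mutex_t hashed_lock = PTHREAD_MUTEX_INITIALIZER;
static volatile unsigned long word;		/* bit 0 is the lock bit */

/*
 * test_and_set_bit() on an arch without LL/SC or a single-instruction
 * RMW: a plain load/or/store protected by a hashed spinlock.
 */
static int tas_bit0(void)
{
	unsigned long old;

	pthread_mutex_lock(&hashed_lock);	/* spin_lock           */
	old = word;				/* load    r1, addr    */
	word = old | 1UL;			/* or; store r2, addr  */
	pthread_mutex_unlock(&hashed_lock);	/* spin_unlock         */

	return old & 1UL;
}

/*
 * __clear_bit() as used by __bit_spin_unlock(): it does not take the
 * hashed lock, so its store can land between the load and the store
 * above and then be overwritten -- the "lost" store from CPU1.
 */
static void clear_bit0(void)
{
	word &= ~1UL;				/* load; bclr; store   */
}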



For LL/SC we replace:

        spin_lock
        load    r1, addr

        ...

        store   r2, addr
        spin_unlock

With the (obvious):

1:
        load-locked     r1, addr

        ...

        store-cond      r2, addr
        bnz             1       /* or whatever branch instruction is required to retry */


In this case the failure cannot happen, because the store from CPU1
would have invalidated CPU0's load-locked reservation and caused the
store-cond to fail and retry the loop, observing the new value.
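In compiler-atomics terms (again only a sketch), the lock side is simply
a real atomic fetch_or; on an LL/SC machine that is exactly the loop
above:

static int tas_bit0_llsc(unsigned long *addr)
{
	/*
	 * On an LL/SC architecture this compiles to the load-locked /
	 * store-conditional loop shown above: a store from another CPU
	 * between the LL and the SC makes the SC fail, and the loop
	 * retries with the freshly cleared value instead of overwriting
	 * it.
	 */
	unsigned long old = __atomic_fetch_or(addr, 1UL, __ATOMIC_ACQUIRE);

	return old & 1UL;
}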


