Re: [PATCH] mm: slub: Ensure that slab_unlock() is atomic
On Wed, Mar 09, 2016 at 12:13:16PM +0530, Vineet Gupta wrote:
> +CC linux-arch, parisc folks, PeterZ
>
> On Wednesday 09 March 2016 02:10 AM, Christoph Lameter wrote:
> > On Tue, 8 Mar 2016, Vineet Gupta wrote:
> >
> >> # set the bit
> >> 80543b8e: ld_s  r2,[r13,0]  <--- (A) Finds PG_locked is set
> >> 80543b90: or    r3,r2,1     <--- (B) other core unlocks right here
> >> 80543b94: st_s  r3,[r13,0]  <--- (C) sets PG_locked (overwrites unlock)
> >
> > Duh. Guess you need to take the spinlock also in the arch specific
> > implementation of __bit_spin_unlock(). This is certainly not the only case
> > in which we use the __ op to unlock.
>
> __bit_spin_lock() by definition is *not* required to be atomic, bit_spin_lock() is
> - so I don't think we need a spinlock there.

Agreed. The double underscore prefixed instructions are not required to be
atomic in any way shape or form.

> There is clearly a problem in slub code that it is pairing a test_and_set_bit()
> with a __clear_bit(). Latter can obviously clobber former if they are not a single
> instruction each unlike x86 or they use llock/scond kind of instructions where the
> interim store from other core is detected and causes a retry of whole llock/scond
> sequence.

Yes, test_and_set_bit() + __clear_bit() is broken.

> > If you take the lock in __bit_spin_unlock
> > then the race cannot happen.
>
> Of course it won't but that means we penalize all non atomic callers of the API
> with a superfluous spinlock which is not required in the first place given the
> definition of API.

Quite. _However_, your arch is still broken, but not by your fault. It's
the generic-asm code that is wrong.

The thing is that __bit_spin_unlock() uses __clear_bit_unlock(), which
defaults to __clear_bit(). Which is wrong.

---
Subject: bitops: Do not default to __clear_bit() for __clear_bit_unlock()

__clear_bit_unlock() is a special little snowflake. While it carries the
non-atomic '__' prefix, it is specifically documented to pair with
test_and_set_bit() and therefore should be 'somewhat' atomic.

Therefore the generic implementation of __clear_bit_unlock() cannot use
the fully non-atomic __clear_bit() as a default.

If an arch is able to do better, it must provide an implementation of
__clear_bit_unlock() itself.

Reported-by: Vineet Gupta
Signed-off-by: Peter Zijlstra (Intel)
---
 include/asm-generic/bitops/lock.h | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/include/asm-generic/bitops/lock.h b/include/asm-generic/bitops/lock.h
index c30266e94806..8ef0ccbf8167 100644
--- a/include/asm-generic/bitops/lock.h
+++ b/include/asm-generic/bitops/lock.h
@@ -29,16 +29,16 @@ do {\
  * @nr: the bit to set
  * @addr: the address to start counting from
  *
- * This operation is like clear_bit_unlock, however it is not atomic.
- * It does provide release barrier semantics so it can be used to unlock
- * a bit lock, however it would only be used if no other CPU can modify
- * any bits in the memory until the lock is released (a good example is
- * if the bit lock itself protects access to the other bits in the word).
+ * A weaker form of clear_bit_unlock() as used by __bit_lock_unlock(). If all
+ * the bits in the word are protected by this lock some archs can use weaker
+ * ops to safely unlock.
+ *
+ * See for example x86's implementation.
  */
 #define __clear_bit_unlock(nr, addr)	\
 do {					\
-	smp_mb();			\
-	__clear_bit(nr, addr);		\
+	smp_mb__before_atomic();	\
+	clear_bit(nr, addr);		\
 } while (0)

 #endif /* _ASM_GENERIC_BITOPS_LOCK_H_ */
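As an aside, the broken pairing is easy to see in plain C. Below is a minimal
user-space sketch (pthreads; not kernel code, all names invented for
illustration) of how test_and_set_bit() ends up being emulated on a !LLSC
architecture -- an internal lock around a plain read-modify-write -- and why
pairing it with a __clear_bit()-style store can lose the unlock, exactly as in
the (A)/(B)/(C) disassembly quoted above:

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t hash_lock = PTHREAD_MUTEX_INITIALIZER; /* arch-internal lock */
static volatile unsigned long word;                            /* the flags word */

/* emulated test_and_set_bit() on a !LLSC arch: lock + plain RMW */
static int emulated_test_and_set_bit(unsigned long mask)
{
	unsigned long old;

	pthread_mutex_lock(&hash_lock);
	old = word;		/* (A) load                          */
	word = old | mask;	/* (C) store based on the stale old  */
	pthread_mutex_unlock(&hash_lock);
	return !!(old & mask);
}

/* emulated __clear_bit(): plain RMW that does NOT take hash_lock, so its
 * store can land between (A) and (C) above and simply be overwritten -- (B) */
static void emulated_clear_bit(unsigned long mask)
{
	word &= ~mask;
}

int main(void)
{
	/* single-threaded smoke test; the real livelock needs two CPUs
	 * hitting (A)/(B)/(C) concurrently */
	emulated_test_and_set_bit(1);
	emulated_clear_bit(1);
	printf("word = %#lx\n", word);
	return 0;
}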
Re: [PATCH] mm: slub: Ensure that slab_unlock() is atomic
On Wed, Mar 09, 2016 at 11:13:49AM +0100, Peter Zijlstra wrote:
> ---
> Subject: bitops: Do not default to __clear_bit() for __clear_bit_unlock()
>
> __clear_bit_unlock() is a special little snowflake. While it carries the
> non-atomic '__' prefix, it is specifically documented to pair with
> test_and_set_bit() and therefore should be 'somewhat' atomic.
>
> Therefore the generic implementation of __clear_bit_unlock() cannot use
> the fully non-atomic __clear_bit() as a default.
>
> If an arch is able to do better, it must provide an implementation of
> __clear_bit_unlock() itself.
>

FWIW, we could probably undo this if we unified all the spinlock based
atomic ops implementations (there's a whole bunch doing essentially the
same), and special cased __clear_bit_unlock() for that.

Collapsing them is probably a good idea anyway, just a fair bit of
non-trivial work to figure out all the differences and if they matter
etc..
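The "spinlock based atomic ops implementations" referred to follow roughly the
pattern below (a user-space sketch; the hash function, table size and names are
illustrative, not the kernel's). A special-cased __clear_bit_unlock() on such an
arch would simply take the same hashed lock, serializing correctly against the
lock-based test_and_set_bit() without needing the heavier
smp_mb__before_atomic() + clear_bit() default:

#include <pthread.h>

#define HASH_SZ		16
#define BITS_PER_LONG	(8 * (int)sizeof(unsigned long))

/* one lock per hash bucket, shared by all "atomic" ops on nearby words */
static pthread_mutex_t hash_lock[HASH_SZ] = {
	[0 ... HASH_SZ - 1] = PTHREAD_MUTEX_INITIALIZER	/* GCC range initializer */
};

static pthread_mutex_t *lock_for(const volatile unsigned long *p)
{
	return &hash_lock[((unsigned long)p >> 4) % HASH_SZ];
}

static void hashed_clear_bit(int nr, volatile unsigned long *addr)
{
	volatile unsigned long *p = addr + nr / BITS_PER_LONG;
	unsigned long mask = 1UL << (nr % BITS_PER_LONG);
	pthread_mutex_t *l = lock_for(p);

	pthread_mutex_lock(l);
	*p &= ~mask;
	pthread_mutex_unlock(l);
}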
Re: [PATCH] mm: slub: Ensure that slab_unlock() is atomic
On Wednesday 09 March 2016 03:43 PM, Peter Zijlstra wrote:
>>> If you take the lock in __bit_spin_unlock
>>> then the race cannot happen.
>>
>> Of course it won't but that means we penalize all non atomic callers of the API
>> with a superfluous spinlock which is not required in the first place given the
>> definition of API.
>
> Quite. _However_, your arch is still broken, but not by your fault. It's
> the generic-asm code that is wrong.
>
> The thing is that __bit_spin_unlock() uses __clear_bit_unlock(), which
> defaults to __clear_bit(). Which is wrong.
>
> ---
> Subject: bitops: Do not default to __clear_bit() for __clear_bit_unlock()
>
> __clear_bit_unlock() is a special little snowflake. While it carries the
> non-atomic '__' prefix, it is specifically documented to pair with
> test_and_set_bit() and therefore should be 'somewhat' atomic.
>
> Therefore the generic implementation of __clear_bit_unlock() cannot use
> the fully non-atomic __clear_bit() as a default.
>
> If an arch is able to do better, it must provide an implementation of
> __clear_bit_unlock() itself.
>
> Reported-by: Vineet Gupta

This needs to be CCed stable as it fixes a real bug for ARC.

> Signed-off-by: Peter Zijlstra (Intel)

Tested-by: Vineet Gupta

FWIW, could we add some background to commit log, specifically what prompted
this. Something like below...

>8--

This came up as a result of hackbench livelock'ing in slab_lock() on ARC with
SMP + SLUB + !LLSC.

The issue was incorrect pairing of atomic ops.

slab_lock() -> bit_spin_lock() -> test_and_set_bit()
slab_unlock() -> __bit_spin_unlock() -> __clear_bit()

The non serializing __clear_bit() was getting "lost"

80543b8e: ld_s  r2,[r13,0]  <--- (A) Finds PG_locked is set
80543b90: or    r3,r2,1     <--- (B) other core unlocks right here
80543b94: st_s  r3,[r13,0]  <--- (C) sets PG_locked (overwrites unlock)

Fixes ARC STAR 9000817404

>8--

> ---
>  include/asm-generic/bitops/lock.h | 14 +++---
>  1 file changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/include/asm-generic/bitops/lock.h b/include/asm-generic/bitops/lock.h
> index c30266e94806..8ef0ccbf8167 100644
> --- a/include/asm-generic/bitops/lock.h
> +++ b/include/asm-generic/bitops/lock.h
> @@ -29,16 +29,16 @@ do { \
>  * @nr: the bit to set
>  * @addr: the address to start counting from
>  *
> - * This operation is like clear_bit_unlock, however it is not atomic.
> - * It does provide release barrier semantics so it can be used to unlock
> - * a bit lock, however it would only be used if no other CPU can modify
> - * any bits in the memory until the lock is released (a good example is
> - * if the bit lock itself protects access to the other bits in the word).
> + * A weaker form of clear_bit_unlock() as used by __bit_lock_unlock(). If all
> + * the bits in the word are protected by this lock some archs can use weaker
> + * ops to safely unlock.
> + *
> + * See for example x86's implementation.
>  */

To be able to override/use-generic don't we need #ifndef ?

> #define __clear_bit_unlock(nr, addr)	\
> do {					\
> -	smp_mb();			\
> -	__clear_bit(nr, addr);		\
> +	smp_mb__before_atomic();	\
> +	clear_bit(nr, addr);		\
> } while (0)
>
> #endif /* _ASM_GENERIC_BITOPS_LOCK_H_ */
>
Re: [PATCH] mm: slub: Ensure that slab_unlock() is atomic
On Wednesday 09 March 2016 04:01 PM, Peter Zijlstra wrote:
> On Wed, Mar 09, 2016 at 11:13:49AM +0100, Peter Zijlstra wrote:
>> ---
>> Subject: bitops: Do not default to __clear_bit() for __clear_bit_unlock()
>>
>> __clear_bit_unlock() is a special little snowflake. While it carries the
>> non-atomic '__' prefix, it is specifically documented to pair with
>> test_and_set_bit() and therefore should be 'somewhat' atomic.
>>
>> Therefore the generic implementation of __clear_bit_unlock() cannot use
>> the fully non-atomic __clear_bit() as a default.
>>
>> If an arch is able to do better, it must provide an implementation of
>> __clear_bit_unlock() itself.
>>
>
> FWIW, we could probably undo this if we unified all the spinlock based
> atomic ops implementations (there's a whole bunch doing essentially the
> same), and special cased __clear_bit_unlock() for that.
>
> Collapsing them is probably a good idea anyway, just a fair bit of
> non-trivial work to figure out all the differences and if they matter
> etc..

Indeed I thought about this when we first did the SMP port. The only issue was
somewhat more generated code with the hashed spinlocks (vs. my dumb 2 spin locks
- which as I see now will also cause false sharing - likely ending up in the
same cache line), but I was more of a micro-optimization freak then than I'm
now :-)

So yeah I agree !
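The false-sharing point can be illustrated with a small sketch (the 64-byte line
size and the names are assumptions, purely for illustration): two locks declared
back to back will normally land in the same cache line, so contention on either
one bounces that line between cores, whereas aligning each lock to its own line
avoids it:

#include <pthread.h>
#include <stdio.h>

#define CACHELINE 64	/* assumed line size */

/* back to back: almost certainly share one cache line */
static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;

/* one line each: the alignment pads the struct out to a full line */
struct padded_lock {
	pthread_mutex_t lock;
} __attribute__((aligned(CACHELINE)));

static struct padded_lock plock_a = { PTHREAD_MUTEX_INITIALIZER };
static struct padded_lock plock_b = { PTHREAD_MUTEX_INITIALIZER };

int main(void)
{
	printf("sizeof(pthread_mutex_t) = %zu, sizeof(struct padded_lock) = %zu\n",
	       sizeof(pthread_mutex_t), sizeof(struct padded_lock));
	printf("adjacent: %p %p\n", (void *)&lock_a, (void *)&lock_b);
	printf("padded:   %p %p\n", (void *)&plock_a, (void *)&plock_b);
	return 0;
}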
Re: [PATCH] mm: slub: Ensure that slab_unlock() is atomic
On Wed, Mar 09, 2016 at 04:30:31PM +0530, Vineet Gupta wrote:
> FWIW, could we add some background to commit log, specifically what prompted
> this. Something like below...

Sure.. find below.

> > +++ b/include/asm-generic/bitops/lock.h
> > @@ -29,16 +29,16 @@ do {\
> >  * @nr: the bit to set
> >  * @addr: the address to start counting from
> >  *
> > + * A weaker form of clear_bit_unlock() as used by __bit_lock_unlock(). If all
> > + * the bits in the word are protected by this lock some archs can use weaker
> > + * ops to safely unlock.
> > + *
> > + * See for example x86's implementation.
> >  */
>
> To be able to override/use-generic don't we need #ifndef

I did not follow through the maze, I think the few archs implementing
this simply do not include this file at all.

I'll let the first person that cares about this worry about that :-)

---
Subject: bitops: Do not default to __clear_bit() for __clear_bit_unlock()

__clear_bit_unlock() is a special little snowflake. While it carries the
non-atomic '__' prefix, it is specifically documented to pair with
test_and_set_bit() and therefore should be 'somewhat' atomic.

Therefore the generic implementation of __clear_bit_unlock() cannot use
the fully non-atomic __clear_bit() as a default.

If an arch is able to do better, it must provide an implementation of
__clear_bit_unlock() itself.

Specifically, this came up as a result of hackbench livelock'ing in
slab_lock() on ARC with SMP + SLUB + !LLSC.

The issue was incorrect pairing of atomic ops.

slab_lock() -> bit_spin_lock() -> test_and_set_bit()
slab_unlock() -> __bit_spin_unlock() -> __clear_bit()

The non serializing __clear_bit() was getting "lost"

80543b8e: ld_s  r2,[r13,0]  <--- (A) Finds PG_locked is set
80543b90: or    r3,r2,1     <--- (B) other core unlocks right here
80543b94: st_s  r3,[r13,0]  <--- (C) sets PG_locked (overwrites unlock)

Fixes ARC STAR 9000817404 (and probably more).

Cc: sta...@vger.kernel.org
Reported-by: Vineet Gupta
Tested-by: Vineet Gupta
Signed-off-by: Peter Zijlstra (Intel)
---
 include/asm-generic/bitops/lock.h | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/include/asm-generic/bitops/lock.h b/include/asm-generic/bitops/lock.h
index c30266e94806..8ef0ccbf8167 100644
--- a/include/asm-generic/bitops/lock.h
+++ b/include/asm-generic/bitops/lock.h
@@ -29,16 +29,16 @@ do {\
  * @nr: the bit to set
  * @addr: the address to start counting from
  *
- * This operation is like clear_bit_unlock, however it is not atomic.
- * It does provide release barrier semantics so it can be used to unlock
- * a bit lock, however it would only be used if no other CPU can modify
- * any bits in the memory until the lock is released (a good example is
- * if the bit lock itself protects access to the other bits in the word).
+ * A weaker form of clear_bit_unlock() as used by __bit_lock_unlock(). If all
+ * the bits in the word are protected by this lock some archs can use weaker
+ * ops to safely unlock.
+ *
+ * See for example x86's implementation.
  */
 #define __clear_bit_unlock(nr, addr)	\
 do {					\
-	smp_mb();			\
-	__clear_bit(nr, addr);		\
+	smp_mb__before_atomic();	\
+	clear_bit(nr, addr);		\
 } while (0)

 #endif /* _ASM_GENERIC_BITOPS_LOCK_H_ */
Re: [PATCH] mm: slub: Ensure that slab_unlock() is atomic
On Wednesday 09 March 2016 05:10 PM, Peter Zijlstra wrote:
> On Wed, Mar 09, 2016 at 04:30:31PM +0530, Vineet Gupta wrote:
>> FWIW, could we add some background to commit log, specifically what prompted
>> this. Something like below...
>
> Sure.. find below.
>
>>> +++ b/include/asm-generic/bitops/lock.h
>>> @@ -29,16 +29,16 @@ do {\
>>>  * @nr: the bit to set
>>>  * @addr: the address to start counting from
>>>  *
>>> + * A weaker form of clear_bit_unlock() as used by __bit_lock_unlock(). If all
>>> + * the bits in the word are protected by this lock some archs can use weaker
>>> + * ops to safely unlock.
>>> + *
>>> + * See for example x86's implementation.
>>>  */
>>
>> To be able to override/use-generic don't we need #ifndef
>
> I did not follow through the maze, I think the few archs implementing
> this simply do not include this file at all.
>
> I'll let the first person that cares about this worry about that :-)

Ok - that'd be me :-) although I really don't see much gains in case of ARC LLSC.

For us, LD + BCLR + ST is very similar to LLOCK + BCLR + SCOND at least in terms
of cache coherency transactions !

> ---
> Subject: bitops: Do not default to __clear_bit() for __clear_bit_unlock()
>
> __clear_bit_unlock() is a special little snowflake. While it carries the
> non-atomic '__' prefix, it is specifically documented to pair with
> test_and_set_bit() and therefore should be 'somewhat' atomic.
>
> Therefore the generic implementation of __clear_bit_unlock() cannot use
> the fully non-atomic __clear_bit() as a default.
>
> If an arch is able to do better, it must provide an implementation of
> __clear_bit_unlock() itself.
>
> Specifically, this came up as a result of hackbench livelock'ing in
> slab_lock() on ARC with SMP + SLUB + !LLSC.
>
> The issue was incorrect pairing of atomic ops.
>
> slab_lock() -> bit_spin_lock() -> test_and_set_bit()
> slab_unlock() -> __bit_spin_unlock() -> __clear_bit()
>
> The non serializing __clear_bit() was getting "lost"
>
> 80543b8e: ld_s  r2,[r13,0]  <--- (A) Finds PG_locked is set
> 80543b90: or    r3,r2,1     <--- (B) other core unlocks right here
> 80543b94: st_s  r3,[r13,0]  <--- (C) sets PG_locked (overwrites unlock)
>
> Fixes ARC STAR 9000817404 (and probably more).
>
> Cc: sta...@vger.kernel.org
> Reported-by: Vineet Gupta
> Tested-by: Vineet Gupta
> Signed-off-by: Peter Zijlstra (Intel)

LGTM. Thx a bunch Peter !

-Vineet

> ---
>  include/asm-generic/bitops/lock.h | 14 +++---
>  1 file changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/include/asm-generic/bitops/lock.h b/include/asm-generic/bitops/lock.h
> index c30266e94806..8ef0ccbf8167 100644
> --- a/include/asm-generic/bitops/lock.h
> +++ b/include/asm-generic/bitops/lock.h
> @@ -29,16 +29,16 @@ do { \
>  * @nr: the bit to set
>  * @addr: the address to start counting from
>  *
> - * This operation is like clear_bit_unlock, however it is not atomic.
> - * It does provide release barrier semantics so it can be used to unlock
> - * a bit lock, however it would only be used if no other CPU can modify
> - * any bits in the memory until the lock is released (a good example is
> - * if the bit lock itself protects access to the other bits in the word).
> + * A weaker form of clear_bit_unlock() as used by __bit_lock_unlock(). If all
> + * the bits in the word are protected by this lock some archs can use weaker
> + * ops to safely unlock.
> + *
> + * See for example x86's implementation.
>  */
> #define __clear_bit_unlock(nr, addr)	\
> do {					\
> -	smp_mb();			\
> -	__clear_bit(nr, addr);		\
> +	smp_mb__before_atomic();	\
> +	clear_bit(nr, addr);		\
> } while (0)
>
> #endif /* _ASM_GENERIC_BITOPS_LOCK_H_ */
>
Re: [PATCH] mm: slub: Ensure that slab_unlock() is atomic
On Wed, Mar 09, 2016 at 05:23:26PM +0530, Vineet Gupta wrote:
> > I did not follow through the maze, I think the few archs implementing
> > this simply do not include this file at all.
> >
> > I'll let the first person that cares about this worry about that :-)
>
> Ok - that'd be me :-) although I really don't see much gains in case of ARC LLSC.
>
> For us, LD + BCLR + ST is very similar to LLOCK + BCLR + SCOND at least in terms
> of cache coherency transactions !

The win would be in not having to ever retry the SCOND. Although in this
case, the contending CPU would be doing reads -- which I assume will not
cause a SCOND to fail, so it might indeed not make any difference.
Re: [PATCH] mm: slub: Ensure that slab_unlock() is atomic
On Wednesday 09 March 2016 03:43 PM, Peter Zijlstra wrote:
>> There is clearly a problem in slub code that it is pairing a test_and_set_bit()
>> with a __clear_bit(). Latter can obviously clobber former if they are not a single
>> instruction each unlike x86 or they use llock/scond kind of instructions where the
>> interim store from other core is detected and causes a retry of whole llock/scond
>> sequence.
>
> Yes, test_and_set_bit() + __clear_bit() is broken.

But in SLUB: bit_spin_lock() + __bit_spin_unlock() is acceptable ? How so
(ignoring the performance thing for discussion sake, which is a side effect of
this implementation).

So despite the comment below in bit_spinlock.h I don't quite comprehend how this
is allowable. And if say, by deduction, this is fine for LLSC or lock prefixed
cases, then isn't this true in general for lot more cases in kernel, i.e. pairing
atomic lock with non-atomic unlock ? I'm missing something !

| /*
|  * bit-based spin_unlock()
|  * non-atomic version, which can be used eg. if the bit lock itself is
|  * protecting the rest of the flags in the word.
|  */
| static inline void __bit_spin_unlock(int bitnum, unsigned long *addr)
Re: [PATCH] mm: slub: Ensure that slab_unlock() is atomic
On Wed, Mar 09, 2016 at 06:52:45PM +0530, Vineet Gupta wrote:
> On Wednesday 09 March 2016 03:43 PM, Peter Zijlstra wrote:
> >> There is clearly a problem in slub code that it is pairing a test_and_set_bit()
> >> with a __clear_bit(). Latter can obviously clobber former if they are not a single
> >> instruction each unlike x86 or they use llock/scond kind of instructions where the
> >> interim store from other core is detected and causes a retry of whole llock/scond
> >> sequence.
> >
> > Yes, test_and_set_bit() + __clear_bit() is broken.
>
> But in SLUB: bit_spin_lock() + __bit_spin_unlock() is acceptable ? How so
> (ignoring the performance thing for discussion sake, which is a side effect of
> this implementation).

The short answer is: Per definition. They are defined to work together,
which is what makes __clear_bit_unlock() such a special function.

> So despite the comment below in bit_spinlock.h I don't quite comprehend how this
> is allowable. And if say, by deduction, this is fine for LLSC or lock prefixed
> cases, then isn't this true in general for lot more cases in kernel, i.e. pairing
> atomic lock with non-atomic unlock ? I'm missing something !

x86 (and others) do in fact use non-atomic instructions for
spin_unlock(). But as this is all arch specific, we can make these
assumptions. It's just that generic code cannot rely on it.

So let me try and explain.

The problem as identified is:

CPU0						CPU1

bit_spin_lock()					__bit_spin_unlock()
1:
	/* fetch_or, r1 holds the old value */
	spin_lock
	load	r1, addr
						load	r1, addr
						bclr	r2, r1, 1
						store	r2, addr
	or	r2, r1, 1
	store	r2, addr	/* lost the store from CPU1 */
	spin_unlock

	and	r1, 1
	bnz	2		/* it was set, go wait */
	ret

2:
	load	r1, addr
	and	r1, 1
	bnz	2		/* wait until its not set */

	b	1		/* try again */


For LL/SC we replace:

	spin_lock
	load	r1, addr
	...
	store	r2, addr
	spin_unlock

With the (obvious):

1:
	load-locked	r1, addr
	...
	store-cond	r2, addr
	bnz	1	/* or whatever branch instruction is required to retry */

In this case the failure cannot happen, because the store from CPU1
would have invalidated the lock from CPU0 and caused the
store-cond to fail and retry the loop, observing the new value.
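The same retry behaviour can be mimicked in portable C, with a compare-and-swap
loop standing in for llock/scond (a sketch under that analogy, not the kernel
implementation; the function name is invented): the whole read-modify-write is
redone whenever anyone else wrote the word in between, so the unlocker's store
can never be overwritten with stale data.

#include <stdatomic.h>

static _Atomic unsigned long word;

/* fetch_or in the llock/scond style: retried until the store "commits"
 * without an intervening write from another CPU */
static unsigned long llsc_style_fetch_or(unsigned long mask)
{
	unsigned long old = atomic_load_explicit(&word, memory_order_relaxed);

	while (!atomic_compare_exchange_weak_explicit(&word, &old, old | mask,
						      memory_order_acquire,
						      memory_order_relaxed))
		;	/* someone (e.g. the unlocker) wrote the word: 'old'
			 * has been reloaded, redo the RMW with the new value */
	return old;
}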
[PATCH] arc: use little endian accesses
Memory access primitives should use cpu_to_le16, cpu_to_le32, le16_to_cpu
and le32_to_cpu because it is not really guaranteed that drivers handle
any ordering themselves. For example, the serial port driver doesn't work
when the kernel is built for the ARC big endian architecture.

Signed-off-by: Lada Trimasova
Cc: Alexey Brodkin
Cc: Vineet Gupta
---
 arch/arc/include/asm/io.h | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/arch/arc/include/asm/io.h b/arch/arc/include/asm/io.h
index 694ece8..0b3d5ea 100644
--- a/arch/arc/include/asm/io.h
+++ b/arch/arc/include/asm/io.h
@@ -129,15 +129,17 @@ static inline void __raw_writel(u32 w, volatile void __iomem *addr)
 #define writel(v,c)		({ __iowmb(); writel_relaxed(v,c); })

 /*
- * Relaxed API for drivers which can handle any ordering themselves
+ * This are defined to perform little endian accesses
  */
 #define readb_relaxed(c)	__raw_readb(c)
-#define readw_relaxed(c)	__raw_readw(c)
-#define readl_relaxed(c)	__raw_readl(c)
+#define readw_relaxed(c)	({ u16 __r = le16_to_cpu((__force __le16) \
+				__raw_readw(c)); __r; })
+#define readl_relaxed(c)	({ u32 __r = le32_to_cpu((__force __le32) \
+				__raw_readl(c)); __r; })

 #define writeb_relaxed(v,c)	__raw_writeb(v,c)
-#define writew_relaxed(v,c)	__raw_writew(v,c)
-#define writel_relaxed(v,c)	__raw_writel(v,c)
+#define writew_relaxed(v,c)	__raw_writew((__force u16) cpu_to_le16(v),c)
+#define writel_relaxed(v,c)	__raw_writel((__force u32) cpu_to_le32(v),c)

 #include <asm-generic/io.h>
--
2.5.0
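What the patched readl_relaxed() does can be shown with a small user-space
analogue (le32toh() standing in for le32_to_cpu(); names invented for
illustration): the device register keeps a fixed little-endian layout in
memory, and the raw 32-bit load is converted to CPU byte order, so the same
driver code reads the same value on little- and big-endian builds.

#include <endian.h>
#include <stdint.h>
#include <stdio.h>

/* analogue of the patched readl_relaxed(): raw load, then LE -> CPU order */
static uint32_t emulated_readl_relaxed(const volatile void *addr)
{
	uint32_t raw = *(const volatile uint32_t *)addr;	/* __raw_readl() */
	return le32toh(raw);					/* le32_to_cpu() */
}

int main(void)
{
	/* a little-endian device register holding 0x11223344 */
	_Alignas(uint32_t) uint8_t reg[4] = { 0x44, 0x33, 0x22, 0x11 };

	/* prints 11223344 on both little- and big-endian hosts */
	printf("%08x\n", emulated_readl_relaxed(reg));
	return 0;
}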
Re: [PATCH] arc: use little endian accesses
+CC Noam

On Wednesday 09 March 2016 10:51 PM, Lada Trimasova wrote:
> Memory access primitives should use cpu_to_le16, cpu_to_le32, le16_to_cpu
> and le32_to_cpu because it is not really guaranteed that drivers handle
> any ordering themselves.

That is the driver issue. readxx as API simply returns data in native
endianness. We've had EZChip running big endian so far and they didn't need
this change.

> For example, the serial port driver doesn't work when the kernel is built
> for the ARC big endian architecture.

Last I tested Big Endian on SDP with 8250 part + 8250 driver it was working
fine. I presume this is the systemC model for device and standard 8250 driver
and very likely the model is not fixed endian or something.

Alexey knows about this stuff - this was discussed on lkml back in 2013 when
he was fighting the Xilinx systemAce driver endian issues

> Signed-off-by: Lada Trimasova

Sorry NACK on this ! If you still think we need it I need more data / details
on what exactly is failing in 8250 and how !

-Vineet

> Cc: Alexey Brodkin
> Cc: Vineet Gupta
> ---
>  arch/arc/include/asm/io.h | 12 +++-
>  1 file changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/arch/arc/include/asm/io.h b/arch/arc/include/asm/io.h
> index 694ece8..0b3d5ea 100644
> --- a/arch/arc/include/asm/io.h
> +++ b/arch/arc/include/asm/io.h
> @@ -129,15 +129,17 @@ static inline void __raw_writel(u32 w, volatile void __iomem *addr)
>  #define writel(v,c)		({ __iowmb(); writel_relaxed(v,c); })
>
>  /*
> - * Relaxed API for drivers which can handle any ordering themselves
> + * This are defined to perform little endian accesses
>  */
>  #define readb_relaxed(c)	__raw_readb(c)
> -#define readw_relaxed(c)	__raw_readw(c)
> -#define readl_relaxed(c)	__raw_readl(c)
> +#define readw_relaxed(c)	({ u16 __r = le16_to_cpu((__force __le16) \
> +				__raw_readw(c)); __r; })
> +#define readl_relaxed(c)	({ u32 __r = le32_to_cpu((__force __le32) \
> +				__raw_readl(c)); __r; })
>
>  #define writeb_relaxed(v,c)	__raw_writeb(v,c)
> -#define writew_relaxed(v,c)	__raw_writew(v,c)
> -#define writel_relaxed(v,c)	__raw_writel(v,c)
> +#define writew_relaxed(v,c)	__raw_writew((__force u16) cpu_to_le16(v),c)
> +#define writel_relaxed(v,c)	__raw_writel((__force u32) cpu_to_le32(v),c)
>
>  #include <asm-generic/io.h>
>
Re: [PATCH] arc: use little endian accesses
On Thursday 10 March 2016 10:35 AM, Vineet Gupta wrote:
> +CC Noam
>
> On Wednesday 09 March 2016 10:51 PM, Lada Trimasova wrote:
>> Memory access primitives should use cpu_to_le16, cpu_to_le32, le16_to_cpu
>> and le32_to_cpu because it is not really guaranteed that drivers handle
>> any ordering themselves.
>
> That is the driver issue. readxx as API simply returns data in native
> endianness. We've had EZChip running big endian so far and they didn't need
> this change.
>
>> For example, the serial port driver doesn't work when the kernel is built
>> for the ARC big endian architecture.
>
> Last I tested Big Endian on SDP with 8250 part + 8250 driver it was working
> fine. I presume this is the systemC model for device and standard 8250 driver
> and very likely the model is not fixed endian or something.
>
> Alexey knows about this stuff - this was discussed on lkml back in 2013 when
> he was fighting the Xilinx systemAce driver endian issues

Do you need "native-endian" DT entry in nsimosci DT bindings for uart ?
Re: [PATCH] mm: slub: Ensure that slab_unlock() is atomic
On Wednesday 09 March 2016 08:21 PM, Peter Zijlstra wrote:
>> But in SLUB: bit_spin_lock() + __bit_spin_unlock() is acceptable ? How so
>> (ignoring the performance thing for discussion sake, which is a side effect of
>> this implementation).
>
> The short answer is: Per definition. They are defined to work together,
> which is what makes __clear_bit_unlock() such a special function.
>
>> So despite the comment below in bit_spinlock.h I don't quite comprehend how this
>> is allowable. And if say, by deduction, this is fine for LLSC or lock prefixed
>> cases, then isn't this true in general for lot more cases in kernel, i.e. pairing
>> atomic lock with non-atomic unlock ? I'm missing something !
>
> x86 (and others) do in fact use non-atomic instructions for
> spin_unlock(). But as this is all arch specific, we can make these
> assumptions. It's just that generic code cannot rely on it.

OK despite being obvious now, I was not seeing the similarity between
spin_*lock() and bit_spin*lock() :-(

ARC also uses standard ST for spin_unlock() so by analogy __bit_spin_unlock()
(for LLSC case) would be correctly paired with bit_spin_lock().

But then why would anyone need bit_spin_unlock() at all. Especially after this
patch from you which tightens __bit_spin_unlock() even more for the general case.

Thing is if the API exists majority of people would use the more conservative
version w/o understanding all these nuances. Can we pursue the path of moving
bit_spin_unlock() over to __bit_spin_unlock(): first changing the backend only
and if proven stable replacing the call-sites themselves.

> So let me try and explain.
>
> The problem as identified is:
>
> CPU0						CPU1
>
> bit_spin_lock()				__bit_spin_unlock()
> 1:
>	/* fetch_or, r1 holds the old value */
>	spin_lock
>	load	r1, addr
>						load	r1, addr
>						bclr	r2, r1, 1
>						store	r2, addr
>	or	r2, r1, 1
>	store	r2, addr	/* lost the store from CPU1 */
>	spin_unlock
>
>	and	r1, 1
>	bnz	2		/* it was set, go wait */
>	ret
>
> 2:
>	load	r1, addr
>	and	r1, 1
>	bnz	2		/* wait until its not set */
>
>	b	1		/* try again */
>
>
> For LL/SC we replace:
>
>	spin_lock
>	load	r1, addr
>	...
>	store	r2, addr
>	spin_unlock
>
> With the (obvious):
>
> 1:
>	load-locked	r1, addr
>	...
>	store-cond	r2, addr
>	bnz	1	/* or whatever branch instruction is required to retry */
>
> In this case the failure cannot happen, because the store from CPU1
> would have invalidated the lock from CPU0 and caused the
> store-cond to fail and retry the loop, observing the new value.

You did it again, A picture is worth thousand words !

Thx,
-Vineet
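The "standard ST for spin_unlock()" point works for the same reason: the lock
side is an LL/SC retry loop, so the unlocking plain store either lands before
the reservation or forces the loop to retry. A rough user-space sketch of that
shape (C11 atomics again standing in for llock/scond; not ARC's actual code,
names invented):

#include <stdatomic.h>

static _Atomic unsigned int slock;	/* 0 = unlocked, 1 = locked */

static void sketch_spin_lock(void)
{
	unsigned int expected;

	do {
		expected = 0;	/* only take it if currently unlocked */
	} while (!atomic_compare_exchange_weak_explicit(&slock, &expected, 1,
							memory_order_acquire,
							memory_order_relaxed));
}

static void sketch_spin_unlock(void)
{
	/* the "plain ST" unlock: a simple release store, no read-modify-write */
	atomic_store_explicit(&slock, 0, memory_order_release);
}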
Re: [PATCH] arc: use little endian accesses
Hi Vineet,

On Thu, 2016-03-10 at 05:05 +, Vineet Gupta wrote:
> +CC Noam
>
> On Wednesday 09 March 2016 10:51 PM, Lada Trimasova wrote:
>>
>> Memory access primitives should use cpu_to_le16, cpu_to_le32, le16_to_cpu
>> and le32_to_cpu because it is not really guaranteed that drivers handle
>> any ordering themselves.
>
> That is the driver issue. readxx as API simply returns data in native
> endianness. We've had EZChip running big endian so far and they didn't need
> this change.

Let me disagree with you here.

See what is said in "include/asm-generic/io.h":

-->8-
/*
 * __raw_{read,write}{b,w,l,q}() access memory in native endianness.
 *
 * On some architectures memory mapped IO needs to be accessed differently.
 * On the simple architectures, we just read/write the memory location
 * directly.
 */

...

/*
 * {read,write}{b,w,l,q}() access little endian memory and return result in
 * native endianness.
 */
-->8-

And that's an implementation we have for ARC:

-->8-
#define readb(c)		({ u8  __v = readb_relaxed(c); __iormb(); __v; })
#define readw(c)		({ u16 __v = readw_relaxed(c); __iormb(); __v; })
#define readl(c)		({ u32 __v = readl_relaxed(c); __iormb(); __v; })

#define writeb(v,c)		({ __iowmb(); writeb_relaxed(v,c); })
#define writew(v,c)		({ __iowmb(); writew_relaxed(v,c); })
#define writel(v,c)		({ __iowmb(); writel_relaxed(v,c); })

/*
 * Relaxed API for drivers which can handle any ordering themselves
 */
#define readb_relaxed(c)	__raw_readb(c)
#define readw_relaxed(c)	__raw_readw(c)
#define readl_relaxed(c)	__raw_readl(c)

#define writeb_relaxed(v,c)	__raw_writeb(v,c)
#define writew_relaxed(v,c)	__raw_writew(v,c)
#define writel_relaxed(v,c)	__raw_writel(v,c)
-->8-

Which is effectively (related to the endianness discussion):

-->8-
#define readX(c)	__raw_readX(c)
#define writeX(v,c)	__raw_writeX(v,c)
-->8-

That looks IMHO incorrect if we read the API description in
"include/asm-generic/io.h".

BTW the description of {read,write}{b,w,l,q}() is a bit misleading in the part
saying "... and return result in __native_endianness__". But the real
implementation of {read,write}{b,w,l,q}() in "include/asm-generic/io.h" really
shows what was meant - note __leXX_to_cpu() and cpu_to_leXX are used.

>> For example, the serial port driver doesn't work when the kernel is built
>> for the ARC big endian architecture.
>
> Last I tested Big Endian on SDP with 8250 part + 8250 driver it was working
> fine. I presume this is the systemC model for device and standard 8250 driver
> and very likely the model is not fixed endian or something.

Model is indeed little-endian. We build it only once and then changing only the
"nsim_isa_big_endian" property (which changes only CPU core endianness) may use
it equally well with little- and big-endian builds of Linux kernel.

-Alexey
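For comparison, the asm-generic accessors quoted above boil down to roughly the
following (paraphrased, barriers and the 64-bit variants omitted; the names are
changed to make clear this is a sketch rather than the exact header text) -
i.e. exactly the le32_to_cpu()/cpu_to_le32() step that ARC's relaxed accessors
currently skip:

static inline u32 generic_readl(const volatile void __iomem *addr)
{
	/* raw load of a little-endian register, converted to CPU order */
	return __le32_to_cpu(__raw_readl(addr));
}

static inline void generic_writel(u32 value, volatile void __iomem *addr)
{
	/* CPU-order value converted to little-endian before the raw store */
	__raw_writel(__cpu_to_le32(value), addr);
}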
Re: [PATCH] arc: use little endian accesses
On Thursday 10 March 2016, Vineet Gupta wrote:
> On Wednesday 09 March 2016 10:51 PM, Lada Trimasova wrote:
>> Memory access primitives should use cpu_to_le16, cpu_to_le32, le16_to_cpu
>> and le32_to_cpu because it is not really guaranteed that drivers handle
>> any ordering themselves.
>
> That is the driver issue. readxx as API simply returns data in native
> endianness. We've had EZChip running big endian so far and they didn't need
> this change.

Most drivers using readl() or readl_relaxed() expect those to perform byte
swaps on big-endian architectures, as the registers tend to be fixed endian,
so the change seems reasonable.

It depends a little bit on how endian mode is implemented in the CPU: do you
follow the normal model of swapping byte order in the load/store instructions
of the CPU when running big-endian, or is the CPU always running little-endian
but the bus addresses get mangled on byte accesses to give the illusion of a
big-endian system?

	Arnd