Re: [PATCH] mm: slub: Ensure that slab_unlock() is atomic

2016-03-09 Thread Peter Zijlstra
On Wed, Mar 09, 2016 at 12:13:16PM +0530, Vineet Gupta wrote:
> +CC linux-arch, parisc folks, PeterZ
> 
> On Wednesday 09 March 2016 02:10 AM, Christoph Lameter wrote:
> > On Tue, 8 Mar 2016, Vineet Gupta wrote:
> > 
> >> # set the bit
> >> 80543b8e:  ld_s   r2,[r13,0]   <--- (A) Finds PG_locked is set
> >> 80543b90:  or     r3,r2,1      <--- (B) other core unlocks right here
> >> 80543b94:  st_s   r3,[r13,0]   <--- (C) sets PG_locked (overwrites unlock)
> > 
> > Duh. Guess you need to take the spinlock also in the arch specific
> > implementation of __bit_spin_unlock(). This is certainly not the only case
> > in which we use the __ op to unlock.
> 
> __bit_spin_lock() by definition is *not* required to be atomic, 
> bit_spin_lock() is
> - so I don't think we need a spinlock there.

Agreed. The double-underscore prefixed operations are not required to
be atomic in any way, shape or form.
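
For reference, the generic non-atomic variant (include/asm-generic/bitops/non-atomic.h,
roughly as it stood then) is just a plain read-modify-write of the word, with no
locking or ordering at all:

	/* roughly what the generic __clear_bit() looks like -- a plain RMW */
	static inline void __clear_bit(int nr, volatile unsigned long *addr)
	{
		unsigned long mask = BIT_MASK(nr);
		unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);

		*p &= ~mask;	/* a concurrent store to *p by another CPU can be lost */
	}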

> There is clearly a problem in slub code that it is pairing a test_and_set_bit()
> with a __clear_bit(). Latter can obviously clobber former if they are not a
> single instruction each, unlike x86, or they use llock/scond kind of
> instructions where the interim store from other core is detected and causes a
> retry of whole llock/scond sequence.

Yes, test_and_set_bit() + __clear_bit() is broken.

> > If you take the lock in __bit_spin_unlock
> > then the race cannot happen.
> 
> Of course it won't, but that means we penalize all non-atomic callers of the
> API with a superfluous spinlock which is not required in the first place,
> given the definition of the API.

Quite. _However_, your arch is still broken, but not your fault. It's
the generic-asm code that is wrong.

The thing is that __bit_spin_unlock() uses __clear_bit_unlock(), which
defaults to __clear_bit(). Which is wrong.
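
For context, the pairing in question lives in mm/slub.c; roughly (abridged from
the kernels of that time):

	/* mm/slub.c, abridged -- the lock side is atomic, the unlock side is not */
	static __always_inline void slab_lock(struct page *page)
	{
		bit_spin_lock(PG_locked, &page->flags);	    /* test_and_set_bit() */
	}

	static __always_inline void slab_unlock(struct page *page)
	{
		__bit_spin_unlock(PG_locked, &page->flags); /* __clear_bit_unlock() */
	}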

---
Subject: bitops: Do not default to __clear_bit() for __clear_bit_unlock()

__clear_bit_unlock() is a special little snowflake. While it carries the
non-atomic '__' prefix, it is specifically documented to pair with
test_and_set_bit() and therefore should be 'somewhat' atomic.

Therefore the generic implementation of __clear_bit_unlock() cannot use
the fully non-atomic __clear_bit() as a default.

If an arch is able to do better, it must provide an implementation of
__clear_bit_unlock() itself.

Reported-by: Vineet Gupta 
Signed-off-by: Peter Zijlstra (Intel) 
---
 include/asm-generic/bitops/lock.h | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/include/asm-generic/bitops/lock.h b/include/asm-generic/bitops/lock.h
index c30266e94806..8ef0ccbf8167 100644
--- a/include/asm-generic/bitops/lock.h
+++ b/include/asm-generic/bitops/lock.h
@@ -29,16 +29,16 @@ do {\
  * @nr: the bit to set
  * @addr: the address to start counting from
  *
- * This operation is like clear_bit_unlock, however it is not atomic.
- * It does provide release barrier semantics so it can be used to unlock
- * a bit lock, however it would only be used if no other CPU can modify
- * any bits in the memory until the lock is released (a good example is
- * if the bit lock itself protects access to the other bits in the word).
+ * A weaker form of clear_bit_unlock() as used by __bit_lock_unlock(). If all
+ * the bits in the word are protected by this lock some archs can use weaker
+ * ops to safely unlock.
+ *
+ * See for example x86's implementation.
  */
 #define __clear_bit_unlock(nr, addr)   \
 do {   \
-   smp_mb();   \
-   __clear_bit(nr, addr);  \
+   smp_mb__before_atomic();\
+   clear_bit(nr, addr);\
 } while (0)
 
 #endif /* _ASM_GENERIC_BITOPS_LOCK_H_ */





Re: [PATCH] mm: slub: Ensure that slab_unlock() is atomic

2016-03-09 Thread Peter Zijlstra
On Wed, Mar 09, 2016 at 11:13:49AM +0100, Peter Zijlstra wrote:
> ---
> Subject: bitops: Do not default to __clear_bit() for __clear_bit_unlock()
> 
> __clear_bit_unlock() is a special little snowflake. While it carries the
> non-atomic '__' prefix, it is specifically documented to pair with
> test_and_set_bit() and therefore should be 'somewhat' atomic.
> 
> Therefore the generic implementation of __clear_bit_unlock() cannot use
> the fully non-atomic __clear_bit() as a default.
> 
> If an arch is able to do better, it must provide an implementation of
> __clear_bit_unlock() itself.
> 

FWIW, we could probably undo this if we unified all the spinlock based
atomic ops implementations (there's a whole bunch doing essentially the
same), and special cased __clear_bit_unlock() for that.

Collapsing them is probably a good idea anyway, just a fair bit of
non-trivial work to figure out all the differences and if they matter
etc..
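
To sketch what that could look like (names, hash size and layout invented here,
loosely modelled on the existing hashed-spinlock fallbacks such as parisc/sparc32):

	/*
	 * Sketch only: a unified spinlock-based atomics fallback could route
	 * __clear_bit_unlock() through the same hashed lock that
	 * test_and_set_bit() takes, making the plain RMW safe again.
	 */
	#include <linux/spinlock.h>
	#include <linux/bitops.h>
	#include <linux/cache.h>

	#define BITOP_HASH_SIZE	4	/* hypothetical */
	static arch_spinlock_t bitop_locks[BITOP_HASH_SIZE] __cacheline_aligned = {
		[0 ... BITOP_HASH_SIZE - 1] = __ARCH_SPIN_LOCK_UNLOCKED,
	};

	static inline arch_spinlock_t *bitop_lock_for(volatile unsigned long *addr)
	{
		return &bitop_locks[((unsigned long)addr >> L1_CACHE_SHIFT) &
				    (BITOP_HASH_SIZE - 1)];
	}

	static inline void sketch__clear_bit_unlock(unsigned int nr,
						    volatile unsigned long *addr)
	{
		unsigned long flags, mask = BIT_MASK(nr);
		unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);

		local_irq_save(flags);
		arch_spin_lock(bitop_lock_for(addr));	/* serialize against the locker */
		*p &= ~mask;
		arch_spin_unlock(bitop_lock_for(addr));	/* provides the release */
		local_irq_restore(flags);
	}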



Re: [PATCH] mm: slub: Ensure that slab_unlock() is atomic

2016-03-09 Thread Vineet Gupta
On Wednesday 09 March 2016 03:43 PM, Peter Zijlstra wrote:
>>> If you take the lock in __bit_spin_unlock
>>> then the race cannot happen.
>>
>> Of course it won't, but that means we penalize all non-atomic callers of the
>> API with a superfluous spinlock which is not required in the first place,
>> given the definition of the API.
> 
> Quite. _However_, your arch is still broken, but not your fault. It's
> the generic-asm code that is wrong.
> 
> The thing is that __bit_spin_unlock() uses __clear_bit_unlock(), which
> defaults to __clear_bit(). Which is wrong.
> 
> ---
> Subject: bitops: Do not default to __clear_bit() for __clear_bit_unlock()
> 
> __clear_bit_unlock() is a special little snowflake. While it carries the
> non-atomic '__' prefix, it is specifically documented to pair with
> test_and_set_bit() and therefore should be 'somewhat' atomic.
> 
> Therefore the generic implementation of __clear_bit_unlock() cannot use
> the fully non-atomic __clear_bit() as a default.
> 
> If an arch is able to do better, it must provide an implementation of
> __clear_bit_unlock() itself.
> 
> Reported-by: Vineet Gupta 

This needs to be CCed stable as it fixes a real bug for ARC.

> Signed-off-by: Peter Zijlstra (Intel) 

Tested-by: Vineet Gupta 

FWIW, could we add some background to the commit log, specifically what
prompted this? Something like below...

>8--
This came up as a result of hackbench livelock'ing in slab_lock() on ARC
with SMP + SLUB + !LLSC.

The issue was incorrect pairing of atomic ops.

slab_lock() -> bit_spin_lock() -> test_and_set_bit()
slab_unlock() -> __bit_spin_unlock() -> __clear_bit()

The non serializing __clear_bit() was getting "lost"

80543b8e:   ld_s   r2,[r13,0]   <--- (A) Finds PG_locked is set
80543b90:   or     r3,r2,1      <--- (B) other core unlocks right here
80543b94:   st_s   r3,[r13,0]   <--- (C) sets PG_locked (overwrites unlock)

Fixes ARC STAR 9000817404
>8--

> ---
>  include/asm-generic/bitops/lock.h | 14 +++++++-------
>  1 file changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/include/asm-generic/bitops/lock.h b/include/asm-generic/bitops/lock.h
> index c30266e94806..8ef0ccbf8167 100644
> --- a/include/asm-generic/bitops/lock.h
> +++ b/include/asm-generic/bitops/lock.h
> @@ -29,16 +29,16 @@ do {  \
>   * @nr: the bit to set
>   * @addr: the address to start counting from
>   *
> - * This operation is like clear_bit_unlock, however it is not atomic.
> - * It does provide release barrier semantics so it can be used to unlock
> - * a bit lock, however it would only be used if no other CPU can modify
> - * any bits in the memory until the lock is released (a good example is
> - * if the bit lock itself protects access to the other bits in the word).
> + * A weaker form of clear_bit_unlock() as used by __bit_lock_unlock(). If all
> + * the bits in the word are protected by this lock some archs can use weaker
> + * ops to safely unlock.
> + *
> + * See for example x86's implementation.
>   */

To be able to override/use-generic don't we need #ifndef 

>  #define __clear_bit_unlock(nr, addr) \
>  do { \
> - smp_mb();   \
> - __clear_bit(nr, addr);  \
> + smp_mb__before_atomic();\
> + clear_bit(nr, addr);\
>  } while (0)
>  
>  #endif /* _ASM_GENERIC_BITOPS_LOCK_H_ */
> 




Re: [PATCH] mm: slub: Ensure that slab_unlock() is atomic

2016-03-09 Thread Vineet Gupta
On Wednesday 09 March 2016 04:01 PM, Peter Zijlstra wrote:
> On Wed, Mar 09, 2016 at 11:13:49AM +0100, Peter Zijlstra wrote:
>> ---
>> Subject: bitops: Do not default to __clear_bit() for __clear_bit_unlock()
>>
>> __clear_bit_unlock() is a special little snowflake. While it carries the
>> non-atomic '__' prefix, it is specifically documented to pair with
>> test_and_set_bit() and therefore should be 'somewhat' atomic.
>>
>> Therefore the generic implementation of __clear_bit_unlock() cannot use
>> the fully non-atomic __clear_bit() as a default.
>>
>> If an arch is able to do better, it must provide an implementation of
>> __clear_bit_unlock() itself.
>>
> 
> FWIW, we could probably undo this if we unified all the spinlock based
> atomic ops implementations (there's a whole bunch doing essentially the
> same), and special cased __clear_bit_unlock() for that.
> 
> Collapsing them is probably a good idea anyway, just a fair bit of
> non-trivial work to figure out all the differences and if they matter
> etc..

Indeed I thought about this when we first did the SMP port. The only issue was
somewhat more generated code with the hashed spinlocks (vs. my dumb 2 spinlocks -
which, as I see now, will also cause false sharing, likely ending up in the same
cache line), but I was more of a micro-optimization freak then than I am now :-)

So yeah I agree !







Re: [PATCH] mm: slub: Ensure that slab_unlock() is atomic

2016-03-09 Thread Peter Zijlstra
On Wed, Mar 09, 2016 at 04:30:31PM +0530, Vineet Gupta wrote:
> FWIW, could we add some background to commit log, specifically what prompted 
> this.
> Something like below...

Sure.. find below.

> > +++ b/include/asm-generic/bitops/lock.h
> > @@ -29,16 +29,16 @@ do {\
> >   * @nr: the bit to set
> >   * @addr: the address to start counting from
> >   *
> > + * A weaker form of clear_bit_unlock() as used by __bit_lock_unlock(). If 
> > all
> > + * the bits in the word are protected by this lock some archs can use 
> > weaker
> > + * ops to safely unlock.
> > + *
> > + * See for example x86's implementation.
> >   */
> 
> To be able to override/use-generic don't we need #ifndef 

I did not follow through the maze, I think the few archs implementing
this simply do not include this file at all.

I'll let the first person that cares about this worry about that :-)
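
For what it's worth, the override pattern being asked about would just be the
usual #ifndef guard; a minimal sketch (per the discussion above, the asm-generic
header apparently did not have such a guard at the time):

	/* sketch only: how the generic header could allow an arch override */
	#ifndef __clear_bit_unlock
	#define __clear_bit_unlock(nr, addr)		\
	do {						\
		smp_mb__before_atomic();		\
		clear_bit(nr, addr);			\
	} while (0)
	#endif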

---
Subject: bitops: Do not default to __clear_bit() for __clear_bit_unlock()

__clear_bit_unlock() is a special little snowflake. While it carries the
non-atomic '__' prefix, it is specifically documented to pair with
test_and_set_bit() and therefore should be 'somewhat' atomic.

Therefore the generic implementation of __clear_bit_unlock() cannot use
the fully non-atomic __clear_bit() as a default.

If an arch is able to do better, it must provide an implementation of
__clear_bit_unlock() itself.

Specifically, this came up as a result of hackbench livelock'ing in
slab_lock() on ARC with SMP + SLUB + !LLSC.

The issue was incorrect pairing of atomic ops.

slab_lock() -> bit_spin_lock() -> test_and_set_bit()
slab_unlock() -> __bit_spin_unlock() -> __clear_bit()

The non serializing __clear_bit() was getting "lost"

80543b8e:   ld_s   r2,[r13,0]   <--- (A) Finds PG_locked is set
80543b90:   or     r3,r2,1      <--- (B) other core unlocks right here
80543b94:   st_s   r3,[r13,0]   <--- (C) sets PG_locked (overwrites unlock)

Fixes ARC STAR 9000817404 (and probably more).

Cc: sta...@vger.kernel.org
Reported-by: Vineet Gupta 
Tested-by: Vineet Gupta 
Signed-off-by: Peter Zijlstra (Intel) 
---
 include/asm-generic/bitops/lock.h | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/include/asm-generic/bitops/lock.h b/include/asm-generic/bitops/lock.h
index c30266e94806..8ef0ccbf8167 100644
--- a/include/asm-generic/bitops/lock.h
+++ b/include/asm-generic/bitops/lock.h
@@ -29,16 +29,16 @@ do {\
  * @nr: the bit to set
  * @addr: the address to start counting from
  *
- * This operation is like clear_bit_unlock, however it is not atomic.
- * It does provide release barrier semantics so it can be used to unlock
- * a bit lock, however it would only be used if no other CPU can modify
- * any bits in the memory until the lock is released (a good example is
- * if the bit lock itself protects access to the other bits in the word).
+ * A weaker form of clear_bit_unlock() as used by __bit_lock_unlock(). If all
+ * the bits in the word are protected by this lock some archs can use weaker
+ * ops to safely unlock.
+ *
+ * See for example x86's implementation.
  */
 #define __clear_bit_unlock(nr, addr)   \
 do {   \
-   smp_mb();   \
-   __clear_bit(nr, addr);  \
+   smp_mb__before_atomic();\
+   clear_bit(nr, addr);\
 } while (0)
 
 #endif /* _ASM_GENERIC_BITOPS_LOCK_H_ */



Re: [PATCH] mm: slub: Ensure that slab_unlock() is atomic

2016-03-09 Thread Vineet Gupta
On Wednesday 09 March 2016 05:10 PM, Peter Zijlstra wrote:
> On Wed, Mar 09, 2016 at 04:30:31PM +0530, Vineet Gupta wrote:
>> FWIW, could we add some background to commit log, specifically what prompted 
>> this.
>> Something like below...
> 
> Sure.. find below.
> 
>>> +++ b/include/asm-generic/bitops/lock.h
>>> @@ -29,16 +29,16 @@ do {\
>>>   * @nr: the bit to set
>>>   * @addr: the address to start counting from
>>>   *
>>> + * A weaker form of clear_bit_unlock() as used by __bit_lock_unlock(). If 
>>> all
>>> + * the bits in the word are protected by this lock some archs can use 
>>> weaker
>>> + * ops to safely unlock.
>>> + *
>>> + * See for example x86's implementation.
>>>   */
>>
>> To be able to override/use-generic don't we need #ifndef 
> 
> I did not follow through the maze, I think the few archs implementing
> this simply do not include this file at all.
> 
> I'll let the first person that cares about this worry about that :-)

Ok - that'd be me :-) although I really don't see much gain in the case of ARC
LLSC.

For us, LD + BCLR + ST is very similar to LLOCK + BCLR + SCOND, at least in
terms of cache coherency transactions !
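
Roughly the two clear sequences being compared (illustrative pseudo-assembly
only, not copied from the actual arch/arc sources):

	; non-LLSC flavour: plain load/modify/store -- only safe if an external
	; lock (or the "all other bits protected" rule) excludes concurrent writers
		ld	r2, [r13]
		bclr	r2, r2, 0
		st	r2, [r13]

	; LLSC flavour: retried if any other core wrote the word in between
	1:	llock	r2, [r13]
		bclr	r2, r2, 0
		scond	r2, [r13]
		bnz	1b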

> 
> ---
> Subject: bitops: Do not default to __clear_bit() for __clear_bit_unlock()
> 
> __clear_bit_unlock() is a special little snowflake. While it carries the
> non-atomic '__' prefix, it is specifically documented to pair with
> test_and_set_bit() and therefore should be 'somewhat' atomic.
> 
> Therefore the generic implementation of __clear_bit_unlock() cannot use
> the fully non-atomic __clear_bit() as a default.
> 
> If an arch is able to do better, it must provide an implementation of
> __clear_bit_unlock() itself.
> 
> Specifically, this came up as a result of hackbench livelock'ing in
> slab_lock() on ARC with SMP + SLUB + !LLSC.
> 
> The issue was incorrect pairing of atomic ops.
> 
> slab_lock() -> bit_spin_lock() -> test_and_set_bit()
> slab_unlock() -> __bit_spin_unlock() -> __clear_bit()
> 
> The non serializing __clear_bit() was getting "lost"
> 
> 80543b8e: ld_s   r2,[r13,0]   <--- (A) Finds PG_locked is set
> 80543b90: or     r3,r2,1      <--- (B) other core unlocks right here
> 80543b94: st_s   r3,[r13,0]   <--- (C) sets PG_locked (overwrites unlock)
> 
> Fixes ARC STAR 9000817404 (and probably more).
> 
> Cc: sta...@vger.kernel.org
> Reported-by: Vineet Gupta 
> Tested-by: Vineet Gupta 
> Signed-off-by: Peter Zijlstra (Intel) 

LGTM. Thx a bunch Peter !

-Vineet

> ---
>  include/asm-generic/bitops/lock.h | 14 +++++++-------
>  1 file changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/include/asm-generic/bitops/lock.h b/include/asm-generic/bitops/lock.h
> index c30266e94806..8ef0ccbf8167 100644
> --- a/include/asm-generic/bitops/lock.h
> +++ b/include/asm-generic/bitops/lock.h
> @@ -29,16 +29,16 @@ do {  \
>   * @nr: the bit to set
>   * @addr: the address to start counting from
>   *
> - * This operation is like clear_bit_unlock, however it is not atomic.
> - * It does provide release barrier semantics so it can be used to unlock
> - * a bit lock, however it would only be used if no other CPU can modify
> - * any bits in the memory until the lock is released (a good example is
> - * if the bit lock itself protects access to the other bits in the word).
> + * A weaker form of clear_bit_unlock() as used by __bit_lock_unlock(). If all
> + * the bits in the word are protected by this lock some archs can use weaker
> + * ops to safely unlock.
> + *
> + * See for example x86's implementation.
>   */
>  #define __clear_bit_unlock(nr, addr) \
>  do { \
> - smp_mb();   \
> - __clear_bit(nr, addr);  \
> + smp_mb__before_atomic();\
> + clear_bit(nr, addr);\
>  } while (0)
>  
>  #endif /* _ASM_GENERIC_BITOPS_LOCK_H_ */
> 




Re: [PATCH] mm: slub: Ensure that slab_unlock() is atomic

2016-03-09 Thread Peter Zijlstra
On Wed, Mar 09, 2016 at 05:23:26PM +0530, Vineet Gupta wrote:
> > I did not follow through the maze, I think the few archs implementing
> > this simply do not include this file at all.
> > 
> > I'll let the first person that cares about this worry about that :-)
> 
> Ok - that'd be me :-) although I really don't see much gain in the case of ARC
> LLSC.
> 
> For us, LD + BCLR + ST is very similar to LLOCK + BCLR + SCOND, at least in
> terms of cache coherency transactions !

The win would be in not having to ever retry the SCOND. Although in this
case, the contending CPU would be doing reads -- which I assume will not
cause a SCOND to fail, so it might indeed not make any difference.



Re: [PATCH] mm: slub: Ensure that slab_unlock() is atomic

2016-03-09 Thread Vineet Gupta
On Wednesday 09 March 2016 03:43 PM, Peter Zijlstra wrote:
>> There is clearly a problem in slub code that it is pairing a test_and_set_bit()
>> with a __clear_bit(). Latter can obviously clobber former if they are not a
>> single instruction each, unlike x86, or they use llock/scond kind of
>> instructions where the interim store from other core is detected and causes a
>> retry of whole llock/scond sequence.
> 
> Yes, test_and_set_bit() + __clear_bit() is broken.

But in SLUB: bit_spin_lock() + __bit_spin_unlock() is acceptable ? How so
(ignoring the performance thing for discussion sake, which is a side effect of
this implementation).

So despite the comment below in bit_spinlock.h I don't quite comprehend how this
is allowable. And if, say, by deduction, this is fine for LLSC or lock-prefixed
cases, then isn't this true in general for a lot more cases in the kernel, i.e.
pairing an atomic lock with a non-atomic unlock ? I'm missing something !

| /*
|  *  bit-based spin_unlock()
|  *  non-atomic version, which can be used eg. if the bit lock itself is
|  *  protecting the rest of the flags in the word.
|  */
| static inline void __bit_spin_unlock(int bitnum, unsigned long *addr)
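
The body that follows that prototype (from include/linux/bit_spinlock.h of that
era, lightly abridged) shows where __clear_bit_unlock() comes in:

	static inline void __bit_spin_unlock(int bitnum, unsigned long *addr)
	{
	#ifdef CONFIG_DEBUG_SPINLOCK
		BUG_ON(!test_bit(bitnum, addr));
	#endif
	#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
		__clear_bit_unlock(bitnum, addr);  /* the op this thread is about */
	#endif
		preempt_enable();
		__release(bitlock);
	}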





Re: [PATCH] mm: slub: Ensure that slab_unlock() is atomic

2016-03-09 Thread Peter Zijlstra
On Wed, Mar 09, 2016 at 06:52:45PM +0530, Vineet Gupta wrote:
> On Wednesday 09 March 2016 03:43 PM, Peter Zijlstra wrote:
> >> There is clearly a problem in slub code that it is pairing a test_and_set_bit()
> >> with a __clear_bit(). Latter can obviously clobber former if they are not a
> >> single instruction each, unlike x86, or they use llock/scond kind of
> >> instructions where the interim store from other core is detected and causes a
> >> retry of whole llock/scond sequence.
> > 
> > Yes, test_and_set_bit() + __clear_bit() is broken.
> 
> But in SLUB: bit_spin_lock() + __bit_spin_unlock() is acceptable ? How so
> (ignoring the performance thing for discussion sake, which is a side effect of
> this implementation).

The short answer is: per definition. They are defined to work together,
which is what makes __clear_bit_unlock() such a special function.

> So despite the comment below in bit_spinlock.h I don't quite comprehend how
> this is allowable. And if, say, by deduction, this is fine for LLSC or
> lock-prefixed cases, then isn't this true in general for a lot more cases in
> the kernel, i.e. pairing an atomic lock with a non-atomic unlock ? I'm missing
> something !

x86 (and others) do in fact use non-atomic instructions for
spin_unlock(). But as this is all arch-specific, we can make these
assumptions. It's just that generic code cannot rely on it.

So let me try and explain.


The problem as identified is:

CPU0                                    CPU1

bit_spin_lock()                         __bit_spin_unlock()
1:
        /* fetch_or, r1 holds the old value */
        spin_lock
        load    r1, addr
                                        load    r1, addr
                                        bclr    r2, r1, 1
                                        store   r2, addr
        or      r2, r1, 1
        store   r2, addr                /* lost the store from CPU1 */
        spin_unlock

        and     r1, 1
        bnz     2       /* it was set, go wait */
        ret

2:
        load    r1, addr
        and     r1, 1
        bnz     2       /* wait until its not set */

        b       1       /* try again */



For LL/SC we replace:

        spin_lock
        load    r1, addr

        ...

        store   r2, addr
        spin_unlock

With the (obvious):

1:
        load-locked     r1, addr

        ...

        store-cond      r2, addr
        bnz     1       /* or whatever branch instruction is required to retry */


In this case the failure cannot happen, because the store from CPU1
would have invalidated the lock from CPU0 and caused the
store-cond to fail and retry the loop, observing the new value.
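
The same pairing can be reproduced outside the kernel; a minimal userspace
sketch with C11 atomics (illustrative only, not kernel code) of why the unlock
side must itself be an atomic RMW:

	#include <stdatomic.h>
	#include <stdio.h>

	static atomic_ulong word;

	/* bit_spin_lock() equivalent: atomic fetch_or, spin while bit 0 was set */
	static void bit_lock(void)
	{
		while (atomic_fetch_or_explicit(&word, 1UL, memory_order_acquire) & 1UL)
			;
	}

	/* correct unlock: atomic RMW with release -- cannot lose concurrent stores */
	static void bit_unlock_atomic(void)
	{
		atomic_fetch_and_explicit(&word, ~1UL, memory_order_release);
	}

	/* broken unlock (the old __clear_bit() default): plain load + store;
	 * a fetch_or on another CPU between the two is silently overwritten */
	static void bit_unlock_nonatomic(void)
	{
		unsigned long v = atomic_load_explicit(&word, memory_order_relaxed);
		atomic_store_explicit(&word, v & ~1UL, memory_order_release);
	}

	int main(void)
	{
		bit_lock();
		bit_unlock_atomic();
		(void)bit_unlock_nonatomic;	/* the variant that loses updates */
		printf("word = %lu\n", (unsigned long)atomic_load(&word));
		return 0;
	}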





[PATCH] arc: use little endian accesses

2016-03-09 Thread Lada Trimasova
Memory access primitives should use cpu_to_le16, cpu_to_le32, le16_to_cpu
and le32_to_cpu because it is not really guaranteed that drivers handle
any ordering themselves.
For example, the serial port driver doesn't work when the kernel is built
for the ARC big-endian architecture.

Signed-off-by: Lada Trimasova 
Cc: Alexey Brodkin 
Cc: Vineet Gupta 
---
 arch/arc/include/asm/io.h | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/arch/arc/include/asm/io.h b/arch/arc/include/asm/io.h
index 694ece8..0b3d5ea 100644
--- a/arch/arc/include/asm/io.h
+++ b/arch/arc/include/asm/io.h
@@ -129,15 +129,17 @@ static inline void __raw_writel(u32 w, volatile void __iomem *addr)
 #define writel(v,c)({ __iowmb(); writel_relaxed(v,c); })
 
 /*
- * Relaxed API for drivers which can handle any ordering themselves
+ * This are defined to perform little endian accesses
  */
 #define readb_relaxed(c)   __raw_readb(c)
-#define readw_relaxed(c)   __raw_readw(c)
-#define readl_relaxed(c)   __raw_readl(c)
+#define readw_relaxed(c) ({ u16 __r = le16_to_cpu((__force __le16) \
+   __raw_readw(c)); __r; })
+#define readl_relaxed(c) ({ u32 __r = le32_to_cpu((__force __le32) \
+   __raw_readl(c)); __r; })
 
 #define writeb_relaxed(v,c)__raw_writeb(v,c)
-#define writew_relaxed(v,c)__raw_writew(v,c)
-#define writel_relaxed(v,c)__raw_writel(v,c)
+#define writew_relaxed(v,c)__raw_writew((__force u16) cpu_to_le16(v),c)
+#define writel_relaxed(v,c)__raw_writel((__force u32) cpu_to_le32(v),c)
 
 #include 
 
-- 
2.5.0




Re: [PATCH] arc: use little endian accesses

2016-03-09 Thread Vineet Gupta
+CC Noam

On Wednesday 09 March 2016 10:51 PM, Lada Trimasova wrote:
> Memory access primitives should use cpu_to_le16, cpu_to_le32, le16_to_cpu
> and le32_to_cpu because it is not really guaranteed that drivers handles
> any ordering themselves.

That is the driver issue. readxx as an API simply returns data in native
endianness. We've had EZChip running big endian so far and they didn't need
this change.

> For example, serial port driver doesn't work when kernel is build for
> arc big endian architecture.

Last I tested Big Endian on the SDP with an 8250 part + 8250 driver it was
working fine. I presume this is the SystemC model for the device with the
standard 8250 driver, and very likely the model is not fixed-endian or something.

Alexey knows about this stuff - this was discussed on lkml back in 2013 when he
was fighting the Xilinx systemAce driver endian issues

> Signed-off-by: Lada Trimasova 

Sorry NACK on this ! If you still think we need it I need more data / details on
what exactly is failing in 8250 and how !

-Vineet

> Cc: Alexey Brodkin 
> Cc: Vineet Gupta 
> ---
>  arch/arc/include/asm/io.h | 12 +++++++-----
>  1 file changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/arch/arc/include/asm/io.h b/arch/arc/include/asm/io.h
> index 694ece8..0b3d5ea 100644
> --- a/arch/arc/include/asm/io.h
> +++ b/arch/arc/include/asm/io.h
> @@ -129,15 +129,17 @@ static inline void __raw_writel(u32 w, volatile void __iomem *addr)
>  #define writel(v,c)  ({ __iowmb(); writel_relaxed(v,c); })
>  
>  /*
> - * Relaxed API for drivers which can handle any ordering themselves
> + * This are defined to perform little endian accesses
>   */
>  #define readb_relaxed(c) __raw_readb(c)
> -#define readw_relaxed(c) __raw_readw(c)
> -#define readl_relaxed(c) __raw_readl(c)
> +#define readw_relaxed(c) ({ u16 __r = le16_to_cpu((__force __le16) \
> + __raw_readw(c)); __r; })
> +#define readl_relaxed(c) ({ u32 __r = le32_to_cpu((__force __le32) \
> + __raw_readl(c)); __r; })
>  
>  #define writeb_relaxed(v,c)  __raw_writeb(v,c)
> -#define writew_relaxed(v,c)  __raw_writew(v,c)
> -#define writel_relaxed(v,c)  __raw_writel(v,c)
> +#define writew_relaxed(v,c)  __raw_writew((__force u16) cpu_to_le16(v),c)
> +#define writel_relaxed(v,c)  __raw_writel((__force u32) cpu_to_le32(v),c)
>  
>  #include 
>  




Re: [PATCH] arc: use little endian accesses

2016-03-09 Thread Vineet Gupta
On Thursday 10 March 2016 10:35 AM, Vineet Gupta wrote:
> +CC Noam
>
> On Wednesday 09 March 2016 10:51 PM, Lada Trimasova wrote:
>> > Memory access primitives should use cpu_to_le16, cpu_to_le32, le16_to_cpu
>> > and le32_to_cpu because it is not really guaranteed that drivers handles
>> > any ordering themselves.
> That is the driver issue. readxx as API simply returns data in native 
> endianness.
> We've had EZChip running big endian and so far and they didn't need this 
> change.
>
>> > For example, serial port driver doesn't work when kernel is build for
>> > arc big endian architecture.
> Last I tested Big Endian on SDP with 8250 part + 8250 driver it was working 
> fine.
> I presume this is the systemC model for device and standard 8250 driver and 
> very
> likely the model is not fixed endian or something.
>
> Alexey knows about this stuff - this was discussed on lkml back in 2013 when 
> he
> was fighting the Xilinx systemAce driver endian issues
>

Do you need "native-endian" DT entry in nsimosci DT bindings for uart ?



Re: [PATCH] mm: slub: Ensure that slab_unlock() is atomic

2016-03-09 Thread Vineet Gupta
On Wednesday 09 March 2016 08:21 PM, Peter Zijlstra wrote:
>> But in SLUB: bit_spin_lock() + __bit_spin_unlock() is acceptable ? How so
>> (ignoring the performance thing for discussion sake, which is a side effect 
>> of
>> this implementation).
> 
> The sort answer is: Per definition. They are defined to work together,
> which is what makes __clear_bit_unlock() such a special function.
> 
>> So despite the comment below in bit_spinlock.h I don't quite comprehend how 
>> this
>> is allowable. And if say, by deduction, this is fine for LLSC or lock 
>> prefixed
>> cases, then isn't this true in general for lot more cases in kernel, i.e. 
>> pairing
>> atomic lock with non-atomic unlock ? I'm missing something !
> 
> x86 (and others) do in fact use non-atomic instructions for
> spin_unlock(). But as this is all arch-specific, we can make these
> assumptions. It's just that generic code cannot rely on it.

OK despite being obvious now, I was not seeing the similarity between
spin_*lock() and bit_spin*lock() :-(

ARC also uses a standard ST for spin_unlock(), so by analogy __bit_spin_unlock()
(for the LLSC case) would be correctly paired with bit_spin_lock().

But then why would anyone need bit_spin_unlock() at all? Especially after this
patch from you, which tightens __bit_spin_lock() even more for the general case.

Thing is, if the API exists, the majority of people would use the more
conservative version without understanding all these nuances. Can we pursue the
path of moving bit_spin_unlock() over to __bit_spin_lock(): first changing the
backend only, and if proven stable, replacing the call-sites themselves?

> 
> So let me try and explain.
> 
> 
> The problem as identified is:
> 
> CPU0                                  CPU1
> 
> bit_spin_lock()                       __bit_spin_unlock()
> 1:
>       /* fetch_or, r1 holds the old value */
>       spin_lock
>       load    r1, addr
>                                       load    r1, addr
>                                       bclr    r2, r1, 1
>                                       store   r2, addr
>       or      r2, r1, 1
>       store   r2, addr                /* lost the store from CPU1 */
>       spin_unlock
> 
>       and     r1, 1
>       bnz     2       /* it was set, go wait */
>       ret
> 
> 2:
>       load    r1, addr
>       and     r1, 1
>       bnz     2       /* wait until its not set */
> 
>       b       1       /* try again */
> 
> 
> 
> For LL/SC we replace:
> 
>       spin_lock
>       load    r1, addr
> 
>       ...
> 
>       store   r2, addr
>       spin_unlock
> 
> With the (obvious):
> 
> 1:
>       load-locked     r1, addr
> 
>       ...
> 
>       store-cond      r2, addr
>       bnz     1       /* or whatever branch instruction is required to retry */
> 
> 
> In this case the failure cannot happen, because the store from CPU1
> would have invalidated the lock from CPU0 and caused the
> store-cond to fail and retry the loop, observing the new value. 

You did it again - a picture is worth a thousand words !

Thx,
-Vineet



Re: [PATCH] arc: use little endian accesses

2016-03-09 Thread Alexey Brodkin
Hi Vineet,

On Thu, 2016-03-10 at 05:05 +, Vineet Gupta wrote:
> +CC Noam
> 
> On Wednesday 09 March 2016 10:51 PM, Lada Trimasova wrote:
> > 
> > Memory access primitives should use cpu_to_le16, cpu_to_le32, le16_to_cpu
> > and le32_to_cpu because it is not really guaranteed that drivers handles
> > any ordering themselves.
> That is the driver issue. readxx as API simply returns data in native 
> endianness.
> We've had EZChip running big endian and so far and they didn't need this 
> change.

Let me disagree with you here.
See what is said in "include/asm-generic/io.h":
-->8-
/*
 * __raw_{read,write}{b,w,l,q}() access memory in native endianness.
 *
 * On some architectures memory mapped IO needs to be accessed differently.
 * On the simple architectures, we just read/write the memory location
 * directly.
 */

...

/*
 * {read,write}{b,w,l,q}() access little endian memory and return result in
 * native endianness.
 */
-->8-

And that's an implementation we have for ARC:
-->8-
#define readb(c)		({ u8  __v = readb_relaxed(c); __iormb(); __v; })
#define readw(c)		({ u16 __v = readw_relaxed(c); __iormb(); __v; })
#define readl(c)		({ u32 __v = readl_relaxed(c); __iormb(); __v; })

#define writeb(v,c) ({ __iowmb(); writeb_relaxed(v,c); })
#define writew(v,c) ({ __iowmb(); writew_relaxed(v,c); })
#define writel(v,c) ({ __iowmb(); writel_relaxed(v,c); })

/*
 * Relaxed API for drivers which can handle any ordering themselves
 */
#define readb_relaxed(c)__raw_readb(c)
#define readw_relaxed(c)__raw_readw(c)
#define readl_relaxed(c)__raw_readl(c)

#define writeb_relaxed(v,c) __raw_writeb(v,c)
#define writew_relaxed(v,c) __raw_writew(v,c)
#define writel_relaxed(v,c) __raw_writel(v,c)
-->8-

Which is effectively (as far as the endianness discussion is concerned):
-->8-
#define readX(c)__raw_readX(c)
#define writeX(v,c) __raw_writeX(v,c)
-->8-

That looks IMHO incorrect if we read the API description in
"include/asm-generic/io.h".
BTW the description of {read,write}{b,w,l,q}() is a bit misleading in the part
saying "... and return result in __native_endianness__".

But the real implementation of {read,write}{b,w,l,q}() in "include/asm-generic/io.h"
really shows what was meant - note that __leXX_to_cpu() and cpu_to_leXX() are used.
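
For comparison, the generic fallbacks in include/asm-generic/io.h look roughly
like this (abridged), which is where the implicit little-endian conversion
comes from:

	/* include/asm-generic/io.h, abridged */
	#ifndef readl
	#define readl readl
	static inline u32 readl(const volatile void __iomem *addr)
	{
		return __le32_to_cpu(__raw_readl(addr));
	}
	#endif

	#ifndef writel
	#define writel writel
	static inline void writel(u32 value, volatile void __iomem *addr)
	{
		__raw_writel(__cpu_to_le32(value), addr);
	}
	#endif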

> > 
> > For example, serial port driver doesn't work when kernel is build for
> > arc big endian architecture.
> Last I tested Big Endian on SDP with 8250 part + 8250 driver it was working 
> fine.
> I presume this is the systemC model for device and standard 8250 driver and 
> very
> likely the model is not fixed endian or something.

The model is indeed little-endian. We build it only once, and then by changing
only the "nsim_isa_big_endian" property (which changes only the CPU core
endianness) we may use it equally well with little- and big-endian builds of the
Linux kernel.

-Alexey

Re: [PATCH] arc: use little endian accesses

2016-03-09 Thread Arnd Bergmann
On Thursday 10 March 2016, Vineet Gupta wrote:
> On Wednesday 09 March 2016 10:51 PM, Lada Trimasova wrote:
> > Memory access primitives should use cpu_to_le16, cpu_to_le32, le16_to_cpu
> > and le32_to_cpu because it is not really guaranteed that drivers handles
> > any ordering themselves.
> 
> That is the driver issue. readxx as API simply returns data in native 
> endianness.
> We've had EZChip running big endian and so far and they didn't need this 
> change.

Most drivers using readl() or readl_relaxed() expect those to perform byte
swaps on big-endian architectures, as the registers tend to be fixed endian,
so the change seems reasonable.

It depends a little bit on how endian mode is implemented in the CPU: do you
follow the normal model of swapping byte order in the load/store instructions
of the CPU when running big-endian, or is the CPU always running little-endian
but the bus addresses get mangled on byte accesses to give the illusion
of a big-endian system?

Arnd
