Re: single copy atomicity for double load/stores on 32-bit systems

2019-06-06 Thread Paul E. McKenney
On Tue, Jun 04, 2019 at 09:41:04AM +0200, Geert Uytterhoeven wrote:
> Hi Paul,
> 
> On Mon, Jun 3, 2019 at 10:14 PM Paul E. McKenney wrote:
> > On Mon, Jun 03, 2019 at 06:08:35PM +, Vineet Gupta wrote:
> > > On 5/31/19 1:21 AM, Peter Zijlstra wrote:
> > > >> I'm not sure how to interpret "natural alignment" for the case of double
> > > >> load/stores on 32-bit systems where the hardware and ABI allow for 4 byte
> > > >> alignment (ARCv2 LDD/STD, ARM LDRD/STRD)
> > > > Natural alignment: !((uintptr_t)ptr % sizeof(*ptr))
> > > >
> > > > For any u64 type, that would give 8 byte alignment. the problem
> > > > otherwise being that your data spans two lines/pages etc..
> > >
> > > Sure, but as Paul said, if the software doesn't expect them to be atomic by
> > > default, they could span 2 hardware lines to keep the implementation
> > > simpler/sane.
> >
> > I could imagine 8-byte types being only four-byte aligned on 32-bit systems,
> > but it would be quite a surprise on 64-bit systems.
> 
> Or two-byte aligned?
> 
> M68k started with a 16-bit data bus, and alignment rules were retained
> when gaining a wider data bus.
> 
> BTW, do any platforms have issues with atomicity of 4-byte types on
> 16-bit data buses? I believe some embedded ARM or PowerPC do have
> such buses.

But m68k is !SMP-only, correct?  If so, the only issues would be
interactions with interrupt handlers and the like, and doesn't current
m68k hardware use exact interrupts?  Or is it still possible to interrupt
an m68k in the middle of an instruction like it was in the bad old days?

Thanx, Paul

> Gr{oetje,eeting}s,
> 
> Geert
> 
> -- 
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- 
> ge...@linux-m68k.org
> 
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like 
> that.
> -- Linus Torvalds


___
linux-snps-arc mailing list
linux-snps-arc@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-snps-arc
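
As a rough illustration of the natural-alignment test quoted in the message
above, !((uintptr_t)ptr % sizeof(*ptr)), here is a minimal userspace sketch.
The names naturally_aligned(), u64_aligned_at() and buf are hypothetical and
exist only for this example; they are not kernel API. The sketch only shows
that a u64 placed at a 4-byte offset fails the test while a u32 at the same
offset passes it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Backing storage forced to 8-byte alignment so the offsets are meaningful. */
static _Alignas(8) unsigned char buf[16];

/* Hypothetical helper: the quoted test, applied to an address and a size. */
static bool naturally_aligned(const void *ptr, size_t size)
{
    return !((uintptr_t)ptr % size);
}

/* Would a u64 stored at byte offset off inside buf be naturally aligned? */
static bool u64_aligned_at(size_t off)
{
    return naturally_aligned(buf + off, sizeof(uint64_t));
}
```

With this helper, offsets 0 and 8 pass for a u64 and offset 4 does not, which
is the 4-byte-aligned LDD/STD or LDRD/STRD case the thread is asking about.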


Re: single copy atomicity for double load/stores on 32-bit systems

2019-06-06 Thread Geert Uytterhoeven
Hi Paul,

On Thu, Jun 6, 2019 at 11:43 AM Paul E. McKenney  wrote:
> On Tue, Jun 04, 2019 at 09:41:04AM +0200, Geert Uytterhoeven wrote:
> > On Mon, Jun 3, 2019 at 10:14 PM Paul E. McKenney wrote:
> > > On Mon, Jun 03, 2019 at 06:08:35PM +, Vineet Gupta wrote:
> > > > On 5/31/19 1:21 AM, Peter Zijlstra wrote:
> > > > > >> I'm not sure how to interpret "natural alignment" for the case of double
> > > > > >> load/stores on 32-bit systems where the hardware and ABI allow for 4 byte
> > > > > >> alignment (ARCv2 LDD/STD, ARM LDRD/STRD)
> > > > > Natural alignment: !((uintptr_t)ptr % sizeof(*ptr))
> > > > >
> > > > > For any u64 type, that would give 8 byte alignment. the problem
> > > > > otherwise being that your data spans two lines/pages etc..
> > > >
> > > > > Sure, but as Paul said, if the software doesn't expect them to be atomic by
> > > > > default, they could span 2 hardware lines to keep the implementation
> > > > > simpler/sane.
> > >
> > > > I could imagine 8-byte types being only four-byte aligned on 32-bit systems,
> > > > but it would be quite a surprise on 64-bit systems.
> >
> > Or two-byte aligned?
> >
> > M68k started with a 16-bit data bus, and alignment rules were retained
> > when gaining a wider data bus.
> >
> > BTW, do any platforms have issues with atomicity of 4-byte types on
> > 16-bit data buses? I believe some embedded ARM or PowerPC do have
> > such buses.
>
> But m68k is !SMP-only, correct?  If so, the only issues would be

M68k support in Linux is uniprocessor-only.

> interactions with interrupt handlers and the like, and doesn't current
> m68k hardware use exact interrupts?  Or is it still possible to interrupt
> an m68k in the middle of an instruction like it was in the bad old days?

TBH, I don't know.

Gr{oetje,eeting}s,

Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds



RE: single copy atomicity for double load/stores on 32-bit systems

2019-06-06 Thread David Laight
From: Paul E. McKenney
> Sent: 06 June 2019 10:44
...
> But m68k is !SMP-only, correct?  If so, the only issues would be
> interactions with interrupt handlers and the like, and doesn't current
> m68k hardware use exact interrupts?  Or is it still possible to interrupt
> an m68k in the middle of an instruction like it was in the bad old days?

Hardware interrupts were always on instruction boundaries; the
mid-instruction interrupts would only happen for page faults (etc.).

There were SMP m68k systems (but I can't remember one).
It was important to continue from a mid-instruction trap on the
same cpu - unless you could guarantee that all the cpus had
exactly the same version of the microcode.

In any case you could probably use the 'cmp2' instruction
for an atomic 64bit write.
OTOH setting that up was such a PITA it was always easier
to disable interrupts.

David
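
To make the concern in this exchange concrete: if a 32-bit CPU stores a 64-bit
value as two separate 32-bit stores, an interrupt taken between the two stores
can observe a value that is neither the old nor the new one. The following is
only a userspace simulation under that assumption. Every name in it is
hypothetical, the "interrupt" is just a callback invoked between the two
halves, and the choice to store the low word first is arbitrary.

```c
#include <stddef.h>
#include <stdint.h>

/* A 64-bit variable as two separately stored 32-bit words. */
struct split_u64 {
    uint32_t lo;
    uint32_t hi;
};

/* Simulated interrupt handler: gets a view of the variable mid-update. */
typedef void (*irq_hook_t)(const struct split_u64 *v, void *ctx);

static uint64_t split_read(const struct split_u64 *v)
{
    return ((uint64_t)v->hi << 32) | v->lo;
}

/* Store val as two word stores; a non-NULL hook runs between them. */
static void split_write(struct split_u64 *v, uint64_t val,
                        irq_hook_t hook, void *ctx)
{
    v->lo = (uint32_t)val;
    if (hook)
        hook(v, ctx);           /* "interrupt" lands between the halves */
    v->hi = (uint32_t)(val >> 32);
}

/* Hook that records what the simulated handler saw. */
static void record_seen(const struct split_u64 *v, void *ctx)
{
    *(uint64_t *)ctx = split_read(v);
}

/* Returns the value observed between the two stores of oldval -> newval. */
static uint64_t observe_mid_write(uint64_t oldval, uint64_t newval)
{
    struct split_u64 v = { (uint32_t)oldval, (uint32_t)(oldval >> 32) };
    uint64_t seen = 0;

    split_write(&v, newval, record_seen, &seen);
    return seen;
}
```

Going from 0x00000000FFFFFFFF to 0x0000000100000000, the simulated handler
sees 0, which is neither value. Passing a NULL hook corresponds to running
the update with interrupts disabled.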





Re: single copy atomicity for double load/stores on 32-bit systems

2019-06-06 Thread Paul E. McKenney
On Thu, Jun 06, 2019 at 04:34:52PM +, David Laight wrote:
> From: Paul E. McKenney
> > Sent: 06 June 2019 10:44
> ...
> > But m68k is !SMP-only, correct?  If so, the only issues would be
> > interactions with interrupt handlers and the like, and doesn't current
> > m68k hardware use exact interrupts?  Or is it still possible to interrupt
> > an m68k in the middle of an instruction like it was in the bad old days?
> 
> Hardware interrupts were always on instruction boundaries, the
> mid-instruction interrupts would only happen for page faults (etc).

OK, !SMP should be fine, then.

> There were SMP m68k systems (but I can't remember one).
> It was important to continue from a mid-instruction trap on the
> same cpu - unless you could guarantee that all the cpus had
> exactly the same version of the microcode.

Yuck!  ;-)

> In any case you could probably use the 'cmp2' instruction
> for an atomic 64bit write.
> OTOH setting that up was such a PITA it was always easier
> to disable interrupts.

Unless I am forgetting something, given that m68k is a 32-bit system,
we should be OK without an atomic 64-bit write.

Thanx, Paul




state of uClibc ARC soft-float support

2019-06-06 Thread Vineet Gupta
Hi Waldemar,

After test-suite commit 9f079b6353 "(disable complex math)" the math tests build
and I see a lot of failures (for the default soft-float builds):

 test-float-finite   FAIL test-float-finite got 1 expected 0
 test-float          FAIL test-float got 1 expected 0
 test-idouble        FAIL test-idouble got 1 expected 0
 test-ifloat         FAIL test-ifloat got 1 expected 0
 test-matherr        FAIL test-matherr got 1 expected 0
 test-nan-overflow   FAIL test-nan-overflow got 1 expected 0
 test-nan-payload    FAIL test-nan-payload got 1 expected 0


Interestingly, in the ARC glibc port, soft-float builds, all float tests pass (so
at least the gcc/libgcc foo seems to be fine, I think).

I noticed a few things:

1. ulps for ARC were removed from the test-suite last year, so I copied over the
version from the ARC glibc port [1]

2. I suppose these don't depend on UCLIBC_HAS_FENV. Anyhow, it seems uClibc's
__UCLIBC_HAS_FENV__ implies hardware float, as it expects all FE_*
exceptions/rounding modes to be defined in an ARCH-specific file.

Anyhow, I tried creating an ARC-specific fenv to support soft float with no
exceptions and only a single rounding mode, but that doesn't seem to help. Any idea
what I'm missing, or whether it is worth pursuing at all?

Thx,
-Vineet


[1] http://lists.infradead.org/pipermail/linux-snps-arc/2019-January/005347.html
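
For reference, ISO C lets <fenv.h> define each FE_* rounding-mode and
exception macro only if the implementation supports it, so a soft-float
target with a single rounding mode and no exceptions is a permitted
configuration. The sketch below is only a compile-time probe of whatever
<fenv.h> the toolchain provides. The two function names are made up for
this example, and it assumes a <fenv.h> exists at all, which may not hold
for a uClibc build without fenv support.

```c
#include <fenv.h>

/* Count how many of the four standard rounding-mode macros are defined. */
static int rounding_modes_defined(void)
{
    int n = 0;
#ifdef FE_TONEAREST
    n++;
#endif
#ifdef FE_TOWARDZERO
    n++;
#endif
#ifdef FE_UPWARD
    n++;
#endif
#ifdef FE_DOWNWARD
    n++;
#endif
    return n;
}

/* Mask of supported exceptions; 0 means no exceptions are supported. */
static int exceptions_supported(void)
{
#ifdef FE_ALL_EXCEPT
    return FE_ALL_EXCEPT;
#else
    return 0;
#endif
}
```

On a soft-float configuration like the one described above, one would expect
a count of 1 (FE_TONEAREST only) and an exception mask of 0.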




Re: [RFC-private] ARC: jump label: implement jump label patching

2019-06-06 Thread Vineet Gupta
On 4/22/19 11:02 AM, Eugeniy Paltsev wrote:
> Implement jump label patching for ARC. Jump labels provide
> an interface to generate dynamic branches using
> self-modifying code.
>
> This allows us to implement conditional branches where
> changing branch direction is expensive but branch selection
> is basically 'free'

I played with this some and stared at the generated code: LGTM overall, minor points
below.
For the real patch, do CC PeterZ and some of the other folks who have recently touched
linux/jump_label.h
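
For readers new to jump labels, the idea in the quoted description can be
sketched in userspace: each patch site has a table entry (code, target, key),
and flipping a key rewrites the instruction word at every site it controls,
either to a NOP or to a branch. This is only a simulation. The table uses
indices into an array instead of addresses, FAKE_BRANCH is a made-up marker
and NOT a real ARC branch encoding, and all sim_* names are hypothetical.
Only the NOP word and the three-word entry layout are taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>

#define ARC_NOP     0x7000264au  /* NOP word as given in the patch */
#define FAKE_BRANCH 0xb0000000u  /* hypothetical marker, not an ARC encoding */

/* Same three u32 fields as the patch's struct jump_entry, but holding
 * indices into a simulated text array rather than real addresses. */
struct sim_jump_entry {
    uint32_t code;    /* index of the patch site in text[] */
    uint32_t target;  /* index the branch would jump to */
    uint32_t key;     /* id of the controlling key */
};

/* Rewrite every site controlled by key; returns the number patched. */
static size_t sim_patch_key(uint32_t *text, const struct sim_jump_entry *tab,
                            size_t n, uint32_t key, int enable)
{
    size_t patched = 0;

    for (size_t i = 0; i < n; i++) {
        if (tab[i].key != key)
            continue;
        /* Enabled: branch to target. Disabled: fall through via NOP. */
        text[tab[i].code] = enable ? (FAKE_BRANCH | tab[i].target) : ARC_NOP;
        patched++;
    }
    return patched;
}
```

The fast path only ever executes the word at the site, which is why branch
selection is close to free while changing direction costs a table walk plus,
on real hardware, an icache flush.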

>
> TODO:
>  * Think about interaction with arc_cache_init().
>In current implementation we call flush_icache_range() to
>make instruction we wrote visible to CPU (CPUs).
>So we couldn't switch jump labels in code before
>arc_cache_init() is called for master CPU (as we don't
>configure several cache callbacks yet)

The "switching" of a branch (and the needed icache flush) can only happen after the
system has booted, so this should not be an issue.

>  * Move instruction generation test to more appropriate place.

Maybe - I'm not sure either.

>  * Care about jump_table alignment in linker script (is it
>required or not?)

From looking at the generated linker script, it is already aligned.

    . = ALIGN(8);
    __start___jump_table = .;
    KEEP(*(__jump_table))
    __stop___jump_table = .;

>
> Signed-off-by: Eugeniy Paltsev 
> ---
>  arch/arc/Kconfig  |   1 +
>  arch/arc/include/asm/jump_label.h |  48 ++
>  arch/arc/kernel/Makefile  |   1 +
>  arch/arc/kernel/jump_label.c  | 102 ++
>  4 files changed, 152 insertions(+)
>  create mode 100644 arch/arc/include/asm/jump_label.h
>  create mode 100644 arch/arc/kernel/jump_label.c
>
> diff --git a/arch/arc/Kconfig b/arch/arc/Kconfig
> index c781e45d1d99..4b3d33e6aae3 100644
> --- a/arch/arc/Kconfig
> +++ b/arch/arc/Kconfig
> @@ -47,6 +47,7 @@ config ARC
>   select OF_EARLY_FLATTREE
>   select PCI_SYSCALL if PCI
>   select PERF_USE_VMALLOC if ARC_CACHE_VIPT_ALIASING
> + select HAVE_ARCH_JUMP_LABEL if ISA_ARCV2 && !CPU_ENDIAN_BE32
>  
>  config ARCH_HAS_CACHE_LINE_SIZE
>   def_bool y
> diff --git a/arch/arc/include/asm/jump_label.h b/arch/arc/include/asm/jump_label.h
> new file mode 100644
> index ..877b8fcc512c
> --- /dev/null
> +++ b/arch/arc/include/asm/jump_label.h
> @@ -0,0 +1,48 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _ASM_ARC_JUMP_LABEL_H
> +#define _ASM_ARC_JUMP_LABEL_H
> +
> +#ifndef __ASSEMBLY__
> +
> +#include 
> +
> +#define JUMP_LABEL_NOP_SIZE 4
> +
> +static __always_inline bool arch_static_branch(struct static_key *key, bool branch)
> +{
> + asm_volatile_goto("1:\n\t"
> +  "nop \n\t"
> +  ".pushsection __jump_table,  \"aw\"\n\t"
> +  ".word 1b, %l[l_yes], %c0\n\t"
> +  ".popsection\n\t"
> +  : :  "i" (&((char *)key)[branch]) :  : l_yes);
> +
> + return false;
> +l_yes:
> + return true;
> +}
> +
> +static __always_inline bool arch_static_branch_jump(struct static_key *key, bool branch)
> +{
> + asm_volatile_goto("1:\n\t"
> +  "b %l[l_yes]\n\t"
> +  ".pushsection __jump_table,  \"aw\"\n\t"
> +  ".word 1b, %l[l_yes], %c0\n\t"
> +  ".popsection\n\t"
> +  : :  "i" (&((char *)key)[branch]) :  : l_yes);
> +
> + return false;
> +l_yes:
> + return true;
> +}
> +
> +typedef u32 jump_label_t;
> +
> +struct jump_entry {
> + jump_label_t code;
> + jump_label_t target;
> + jump_label_t key;
> +};
> +
> +#endif  /* __ASSEMBLY__ */
> +#endif
> diff --git a/arch/arc/kernel/Makefile b/arch/arc/kernel/Makefile
> index 2dc5f4296d44..307f74156d99 100644
> --- a/arch/arc/kernel/Makefile
> +++ b/arch/arc/kernel/Makefile
> @@ -22,6 +22,7 @@ obj-$(CONFIG_ARC_EMUL_UNALIGNED)+= unaligned.o
>  obj-$(CONFIG_KGDB)   += kgdb.o
>  obj-$(CONFIG_ARC_METAWARE_HLINK) += arc_hostlink.o
>  obj-$(CONFIG_PERF_EVENTS)+= perf_event.o
> +obj-$(CONFIG_JUMP_LABEL) += jump_label.o
>  
>  obj-$(CONFIG_ARC_FPU_SAVE_RESTORE)   += fpu.o
>  CFLAGS_fpu.o   += -mdpfp
> diff --git a/arch/arc/kernel/jump_label.c b/arch/arc/kernel/jump_label.c
> new file mode 100644
> index ..7edb713badaf
> --- /dev/null
> +++ b/arch/arc/kernel/jump_label.c
> @@ -0,0 +1,102 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include 
> +#include 
> +
> +#include "asm/cache.h"
> +#include "asm/cacheflush.h"
> +
> +static inline u32 arc_gen_nop(void)
> +{
> + /* 1x 32bit NOP in middle endian */
> + return 0x7000264a;
> +}
> +
> +/*
> + * ARCv2 Branch unconditionally instruction:
> + * 0ss1SSNR
> + * s S[n:0] lower bits signed immediate (number is bitfield size)
> + * S S[m:n+1] upper bits signed immediate (number is bitfield size)
> + * t S[24:21] upper bits signed immediate (branch unconditionally far)
> + * N N <.d> delay slot mode
>