[Patch 0/5] ARM 64 bit sync atomic operations [V3]

2011-10-06 Thread Dr. David Alan Gilbert

Hi,
  This is V3 of a series of 5 patches relating to ARM atomic operations;
they incorporate most of the feedback from V2.  Note the patch numbering/
ordering is different from v2; the two simple patches are now first.

  1) Correct the definition of TARGET_HAVE_DMB_MCR so that it doesn't
 produce the mcr instruction in Thumb1 (and enable it on ARMv6, not just
 ARMv6k, as per the docs).
  2) Fix PR/48126, which is a misplaced barrier in the atomic generation
  3) Provide 64 bit atomic operations using the new ldrexd/strexd in ARMv6k
 and above (a usage sketch follows this list).
  4) Provide fallbacks so that when compiled for earlier CPUs a Linux kernel
 assist is called (as per 32bit and smaller ops)
  5) Add test cases and support for those test cases, for the operations
 added in (3) and (4).
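
As a concrete sketch of what (3) and (4) enable (illustrative code only;
the names below are not from the patches): the existing __sync family of
builtins simply starts working on 64-bit types, expanding inline to
ldrexd/strexd loops on ARMv6k and above, or calling the new __sync_*_8
helpers on older CPUs.

/* Illustrative only.  */
long long counter;

long long
add_and_fetch64 (long long delta)
{
  return __sync_add_and_fetch (&counter, delta);
}

int
cas64 (long long *p, long long expected, long long desired)
{
  return __sync_bool_compare_and_swap (p, expected, desired);
}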

This code has been build-tested on an x86-64 cross to ARM, and run-tested
in both ARM and Thumb state (C, C++, Fortran).

It is against git rev 68a79dfc.

Relative to v2:
  Test cases split out
  Code sharing between the test cases
  More coding style cleanup
  A handful of NULL->NULL_RTX changes

Relative to v1:
  Split the DMB_MCR patch out
  Provide complete changelogs
  Don't emit IT instruction except in Thumb2 mode
  Move iterators to iterators.md (didn't move the table since it was specific
to sync.md)
  Remove sync_atleastsi
  Use sync_predtab in as many places as possible
  Avoid headers in libgcc
  Made various libgcc routines I added static
  Used __write instead of write
  Comment the barrier move to explain it more

  Note that the kernel interface has remained the same for the helper, and as
such I've not changed the way the helper calling in patch 4 is structured.

This work is part of Linaro blueprint:
https://blueprints.launchpad.net/linaro-toolchain-misc/+spec/64-bit-sync-primitives

Dave



[Patch 1/5] ARM 64 bit sync atomic operations [V3]

2011-10-06 Thread Dr. David Alan Gilbert
   gcc/
   * config/arm/arm.h (TARGET_HAVE_DMB_MCR): MCR not available in Thumb1,
 but is available on ARMv6.

diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 993e3a0..f6f1da7 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -288,7 +288,8 @@ extern void (*arm_lang_output_object_attributes_hook)(void);
 #define TARGET_HAVE_DMB		(arm_arch7)
 
 /* Nonzero if this chip implements a memory barrier via CP15.  */
-#define TARGET_HAVE_DMB_MCR	(arm_arch6k && ! TARGET_HAVE_DMB)
+#define TARGET_HAVE_DMB_MCR	(arm_arch6 && ! TARGET_HAVE_DMB \
+				 && ! TARGET_THUMB1)
 
 /* Nonzero if this chip implements a memory barrier instruction.  */
 #define TARGET_HAVE_MEMORY_BARRIER (TARGET_HAVE_DMB || TARGET_HAVE_DMB_MCR)
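
For reference, a hedged sketch (not part of the patch) of the two barrier
forms these macros choose between.  The CP15 barrier is an ARM/Thumb2-only
encoding, which is why Thumb1 must now be excluded:

/* Sketch: the barriers the macros select between.  TARGET_HAVE_DMB uses
   the ARMv7 dmb instruction; TARGET_HAVE_DMB_MCR falls back to the ARMv6
   CP15 data memory barrier.  */
static inline void
barrier_dmb (void)
{
  __asm__ __volatile__ ("dmb" : : : "memory");
}

static inline void
barrier_cp15 (void)
{
  /* mcr p15, 0, Rt, c7, c10, 5 is the CP15 DMB; the register value is
     ignored, zero by convention.  */
  __asm__ __volatile__ ("mcr p15, 0, %0, c7, c10, 5"
			: : "r" (0) : "memory");
}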


[Patch 2/5] ARM 64 bit sync atomic operations [V3]

2011-10-06 Thread Dr. David Alan Gilbert
Michael K. Edwards points out in PR/48126 that the sync is in the wrong place
relative to the branch target of the compare, since the load could float
up beyond the ldrex.
  
PR target/48126

  * config/arm/arm.c (arm_output_sync_loop): Move label before barrier

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 5161439..6e7105a 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -24214,8 +24214,11 @@ arm_output_sync_loop (emit_f emit,
}
 }
 
-  arm_process_output_memory_barrier (emit, NULL);
+  /* Note: the label is before the barrier so that in the cmp failure case
+     we still get a barrier to stop subsequent loads floating upwards past
+     the ldrex (PR/48126).  */
   arm_output_asm_insn (emit, 1, operands, "%sLSYB%%=:", LOCAL_LABEL_PREFIX);
+  arm_process_output_memory_barrier (emit, NULL);
 }
 
 static rtx
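
To make the one-line move concrete, here is a hedged sketch (illustrative
inline asm, not the compiler's exact output) of the loop shape for a 32-bit
compare-and-swap after the fix; the failure branch to 2: now still executes
the barrier:

static inline int
cas32_sketch (int *ptr, int oldval, int newval)
{
  int prev, tmp;
  __asm__ __volatile__ (
    "1:	ldrex	%0, [%2]\n\t"
    "	cmp	%0, %3\n\t"
    "	bne	2f\n\t"			/* failure skips the strex...  */
    "	strex	%1, %4, [%2]\n\t"
    "	cmp	%1, #0\n\t"
    "	bne	1b\n"
    "2:	dmb"			/* ...but not the barrier (PR/48126).  */
    : "=&r" (prev), "=&r" (tmp)
    : "r" (ptr), "r" (oldval), "r" (newval)
    : "cc", "memory");
  return prev;
}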


[Patch 3/5] ARM 64 bit sync atomic operations [V3]

2011-10-06 Thread Dr. David Alan Gilbert
Add support for ARM 64bit sync intrinsics.

gcc/
* arm.c (arm_output_ldrex): Support ldrexd.
  (arm_output_strex): Support strexd.
  (arm_output_it): New helper to output an IT instruction in Thumb2
  mode only.
  (arm_output_sync_loop): Support DI mode.  Change comment to note
  that const_int is not supported.
  (arm_expand_sync): Support DI mode.

* arm.h (TARGET_HAVE_LDREXBHD): Split into LDREXBH and LDREXD.

* iterators.md (NARROW): Move from sync.md.
  (QHSD): New iterator for all current ARM integer modes.
  (SIDI): New iterator for SI and DI modes only.

* sync.md (sync_predtab): New mode_attr.
  (sync_compare_and_swapsi): Fold into sync_compare_and_swap<mode>.
  (sync_lock_test_and_setsi): Fold into sync_lock_test_and_set<mode>.
  (sync_<sync_optab>si): Fold into sync_<sync_optab><mode>.
  (sync_nandsi): Fold into sync_nand<mode>.
  (sync_new_<sync_optab>si): Fold into sync_new_<sync_optab><mode>.
  (sync_new_nandsi): Fold into sync_new_nand<mode>.
  (sync_old_<sync_optab>si): Fold into sync_old_<sync_optab><mode>.
  (sync_old_nandsi): Fold into sync_old_nand<mode>.
  (sync_compare_and_swap<mode>): Support SI & DI.
  (sync_lock_test_and_set<mode>): Likewise.
  (sync_<sync_optab><mode>): Likewise.
  (sync_nand<mode>): Likewise.
  (sync_new_<sync_optab><mode>): Likewise.
  (sync_new_nand<mode>): Likewise.
  (sync_old_<sync_optab><mode>): Likewise.
  (sync_old_nand<mode>): Likewise.
  (arm_sync_compare_and_swapsi): Turn into iterator on SI & DI.
  (arm_sync_lock_test_and_setsi): Likewise.
  (arm_sync_new_<sync_optab>si): Likewise.
  (arm_sync_new_nandsi): Likewise.
  (arm_sync_old_<sync_optab>si): Likewise.
  (arm_sync_old_nandsi): Likewise.
  (arm_sync_compare_and_swap<mode> NARROW): Use sync_predtab, fix indent.
  (arm_sync_lock_test_and_set<mode> NARROW): Likewise.
  (arm_sync_new_<sync_optab><mode> NARROW): Likewise.
  (arm_sync_new_nand<mode> NARROW): Likewise.
  (arm_sync_old_<sync_optab><mode> NARROW): Likewise.
  (arm_sync_old_nand<mode> NARROW): Likewise.

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 6e7105a..51c0f3f 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -24039,12 +24039,26 @@ arm_output_ldrex (emit_f emit,
  rtx target,
  rtx memory)
 {
-  const char *suffix = arm_ldrex_suffix (mode);
-  rtx operands[2];
+  rtx operands[3];
 
   operands[0] = target;
-  operands[1] = memory;
-  arm_output_asm_insn (emit, 0, operands, "ldrex%s\t%%0, %%C1", suffix);
+  if (mode != DImode)
+{
+  const char *suffix = arm_ldrex_suffix (mode);
+  operands[1] = memory;
+  arm_output_asm_insn (emit, 0, operands, "ldrex%s\t%%0, %%C1", suffix);
+}
+  else
+{
+  /* The restrictions on target registers in ARM mode are that the two
+registers are consecutive and the first one is even; Thumb is
+actually more flexible, but DI should give us this anyway.
+Note that the 1st register always gets the lowest word in memory.  */
+  gcc_assert ((REGNO (target) & 1) == 0);
+  operands[1] = gen_rtx_REG (SImode, REGNO (target) + 1);
+  operands[2] = memory;
+  arm_output_asm_insn (emit, 0, operands, "ldrexd\t%%0, %%1, %%C2");
+}
 }
 
 /* Emit a strex{b,h,d, } instruction appropriate for the specified
@@ -24057,14 +24071,41 @@ arm_output_strex (emit_f emit,
  rtx value,
  rtx memory)
 {
-  const char *suffix = arm_ldrex_suffix (mode);
-  rtx operands[3];
+  rtx operands[4];
 
   operands[0] = result;
   operands[1] = value;
-  operands[2] = memory;
-  arm_output_asm_insn (emit, 0, operands, "strex%s%s\t%%0, %%1, %%C2", suffix,
-  cc);
+  if (mode != DImode)
+{
+  const char *suffix = arm_ldrex_suffix (mode);
+  operands[2] = memory;
+  arm_output_asm_insn (emit, 0, operands, "strex%s%s\t%%0, %%1, %%C2",
+ suffix, cc);
+}
+  else
+{
+  /* The restrictions on target registers in ARM mode are that the two
+registers are consecutive and the first one is even; Thumb is
+actually more flexible, but DI should give us this anyway.
+Note that the 1st register always gets the lowest word in memory.  */
+  gcc_assert ((REGNO (value) & 1) == 0 || TARGET_THUMB2);
+  operands[2] = gen_rtx_REG (SImode, REGNO (value) + 1);
+  operands[3] = memory;
+  arm_output_asm_insn (emit, 0, operands, "strexd%s\t%%0, %%1, %%2, %%C3",
+  cc);
+}
+}
+
+/* Helper to emit an IT instruction in Thumb2 mode only; although the
+   assembler will ignore it in ARM mode, emitting it would mess up the
+   instruction counts we sometimes keep.  'flags' gives the extra t's
+   and e's needed if more than one instruction is conditional.  */
+static void
+arm_output_it (emit_f emit, const char *flags, const char *cond)
+{
+  rtx operands[1]; /* Don't actually use the operand.  */
+  if (TARGET_THUMB2)
+    arm_output_asm_insn (emit, 0, operands, "it%s\t%s", flags, cond);
+}
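
Beyond the helper above, the heart of this patch is the DImode expansion.
A hedged sketch (illustrative, not the patch's exact output) of the inline
compare-and-swap loop on ARMv6k+; ldrexd/strexd operate on an even/odd
register pair, and the %H modifier names the high half of a 64-bit operand:

static inline long long
cas64_sketch (long long *ptr, long long oldval, long long newval)
{
  long long prev;
  int fail;
  /* A leading barrier, present in the real expansion, is omitted here.  */
  __asm__ __volatile__ (
    "1:	ldrexd	%0, %H0, [%2]\n\t"
    "	cmp	%0, %3\n\t"		/* compare low words */
    "	cmpeq	%H0, %H3\n\t"		/* ...then high words */
    "	bne	2f\n\t"
    "	strexd	%1, %4, %H4, [%2]\n\t"
    "	cmp	%1, #0\n\t"
    "	bne	1b\n"
    "2:	dmb"			/* label before barrier, as in patch 2 */
    : "=&r" (prev), "=&r" (fail)
    : "r" (ptr), "r" (oldval), "r" (newval)
    : "cc", "memory");
  return prev;
}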

[Patch 4/5] ARM 64 bit sync atomic operations [V3]

2011-10-06 Thread Dr. David Alan Gilbert
Add ARM 64bit sync helpers for use on older ARMs.  Based on 32bit
versions but with check for sufficiently new kernel version.

gcc/
* config/arm/linux-atomic-64bit.c: New (based on linux-atomic.c)
* config/arm/linux-atomic.c: Change comment to point to 64bit version
  (SYNC_LOCK_RELEASE): Instantiate 64bit version.
* config/arm/t-linux-eabi: Pull in linux-atomic-64bit.c


diff --git a/gcc/config/arm/linux-atomic-64bit.c b/gcc/config/arm/linux-atomic-64bit.c
new file mode 100644
index 000..6966e66
--- /dev/null
+++ b/gcc/config/arm/linux-atomic-64bit.c
@@ -0,0 +1,166 @@
+/* 64bit Linux-specific atomic operations for ARM EABI.
+   Copyright (C) 2008, 2009, 2010, 2011 Free Software Foundation, Inc.
+   Based on linux-atomic.c
+
+   64 bit additions david.gilb...@linaro.org
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+.  */
+
+/* 64bit helper functions for atomic operations; the compiler will
+   call these when the code is compiled for a CPU without ldrexd/strexd.
+   (If the CPU has those, then the compiler inlines the operation.)
+
+   These helpers require a kernel helper that's only present on newer
+   kernels; we check for that in an init section and bail out rather
+   unceremoniously.  */
+
+extern unsigned int __write (int fd, const void *buf, unsigned int count);
+extern void abort (void);
+
+/* Kernel helper for compare-and-exchange.  */
+typedef int (__kernel_cmpxchg64_t) (const long long* oldval,
+   const long long* newval,
+   long long *ptr);
+#define __kernel_cmpxchg64 (*(__kernel_cmpxchg64_t *) 0xffff0f60)
+
+/* Kernel helper page version number.  */
+#define __kernel_helper_version (*(unsigned int *)0xffff0ffc)
+
+/* Check that the kernel has a new enough version at load.  */
+static void __check_for_sync8_kernelhelper (void)
+{
+  if (__kernel_helper_version < 5)
+{
+  const char err[] = "A newer kernel is required to run this binary. "
+   "(__kernel_cmpxchg64 helper)\n";
+  /* At this point we need a way to crash with some information
+for the user - I'm not sure I can rely on much else being
+available at this point, so do the same as generic-morestack.c
+write () and abort ().  */
+  __write (2 /* stderr.  */, err, sizeof (err));
+  abort ();
+}
+};
+
+static void (*__sync8_kernelhelper_inithook[]) (void)
+   __attribute__ ((used, section (".init_array"))) = {
+  &__check_for_sync8_kernelhelper
+};
+
+#define HIDDEN __attribute__ ((visibility ("hidden")))
+
+#define FETCH_AND_OP_WORD64(OP, PFX_OP, INF_OP)\
+  long long HIDDEN \
+  __sync_fetch_and_##OP##_8 (long long *ptr, long long val)\
+  {\
+int failure;   \
+long long tmp,tmp2;\
+   \
+do {   \
+  tmp = *ptr;  \
+  tmp2 = PFX_OP (tmp INF_OP val);  \
+  failure = __kernel_cmpxchg64 (&tmp, &tmp2, ptr); \
+} while (failure != 0);\
+   \
+return tmp;\
+  }
+
+FETCH_AND_OP_WORD64 (add,   , +)
+FETCH_AND_OP_WORD64 (sub,   , -)
+FETCH_AND_OP_WORD64 (or,, |)
+FETCH_AND_OP_WORD64 (and,   , &)
+FETCH_AND_OP_WORD64 (xor,   , ^)
+FETCH_AND_OP_WORD64 (nand, ~, &)
+
+#define NAME_oldval(OP, WIDTH) __sync_fetch_and_##OP##_##WIDTH
+#define NAME_newval(OP, WIDTH) __sync_##OP##_and_fetch_##WIDTH
+
+/* Implement both __sync_<op>_and_fetch and __sync_fetch_and_<op> for
+   64bit quantities.  */
+
+#define OP_AND_FETCH_WORD64(OP, PFX_OP, INF_OP)\
+  long long HIDDEN						\
+  __sync_##OP##_and_fetch_8 (long long *ptr, long long val)	\
+  {								\
+    int failure;						\
+    long long tmp, tmp2;					\
+								\
+    do {							\
+      tmp = *ptr;						\
+      tmp2 = PFX_OP (tmp INF_OP val);				\
+      failure = __kernel_cmpxchg64 (&tmp, &tmp2, ptr);		\
+    } while (failure != 0);					\
+								\
+    return tmp2;						\
+  }
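
For readers who don't want to expand the macro mentally, this is roughly
what FETCH_AND_OP_WORD64 (add,   , +) above produces; the loop retries
until the kernel helper confirms that *ptr still held tmp and has stored
tmp2 atomically:

long long HIDDEN
__sync_fetch_and_add_8 (long long *ptr, long long val)
{
  int failure;
  long long tmp, tmp2;

  do
    {
      tmp = *ptr;			/* snapshot the old value */
      tmp2 = tmp + val;			/* PFX_OP (tmp INF_OP val) */
      failure = __kernel_cmpxchg64 (&tmp, &tmp2, ptr);
    }
  while (failure != 0);

  return tmp;			/* fetch_and_op returns the old value */
}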

[Patch 5/5] ARM 64 bit sync atomic operations [V3]

2011-10-06 Thread Dr. David Alan Gilbert
   Test support for ARM 64bit sync intrinsics.

  gcc/testsuite/
* gcc.dg/di-longlong64-sync-1.c: New test.
* gcc.dg/di-sync-multithread.c: New test.
* gcc.target/arm/di-longlong64-sync-withhelpers.c: New test.
* gcc.target/arm/di-longlong64-sync-withldrexd.c: New test.
* lib/target-supports.exp (arm_arch_*_ok): Series of effective-target
tests for v5, v6, v6k, and v7-a, and add-options helpers.
  (check_effective_target_arm_arm_ok): New helper.
  (check_effective_target_sync_longlong): New helper.

diff --git a/gcc/testsuite/gcc.dg/di-longlong64-sync-1.c b/gcc/testsuite/gcc.dg/di-longlong64-sync-1.c
new file mode 100644
index 000..82a4ea2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/di-longlong64-sync-1.c
@@ -0,0 +1,164 @@
+/* { dg-do run } */
+/* { dg-require-effective-target sync_longlong } */
+/* { dg-options "-std=gnu99" } */
+/* { dg-message "note: '__sync_fetch_and_nand' changed semantics in GCC 4.4" 
"" { target *-*-* } 0 } */
+/* { dg-message "note: '__sync_nand_and_fetch' changed semantics in GCC 4.4" 
"" { target *-*-* } 0 } */
+
+
+/* Test basic functionality of the intrinsics.  The operations should
+   not be optimized away if no one checks the return values.  */
+
+/* Based on ia64-sync-[12].c, but 1) long on ARM is 32 bit so use long long
+   (an explicit 64bit type may be a better bet) and 2) use values that cross
+   the 32bit boundary and cause carries, since the actual maths are done as
+   pairs of 32 bit instructions.  */
+
+/* Note: This file is #included by some of the ARM tests.  */
+
+__extension__ typedef __SIZE_TYPE__ size_t;
+
+extern void abort (void);
+extern void *memcpy (void *, const void *, size_t);
+extern int memcmp (const void *, const void *, size_t);
+
+/* Temporary space where the work actually gets done.  */
+static long long AL[24];
+/* Values copied into AL before we start.  */
+static long long init_di[24] = { 0x10002ll, 0x20003ll, 0, 1,
+
+0x10002ll, 0x10002ll,
+0x10002ll, 0x10002ll,
+
+0, 0x1000e0dell,
+42 , 0xc001c0dell,
+
+-1ll, 0, 0xff00ffll, -1ll,
+
+0, 0x1000e0dell,
+42 , 0xc001c0dell,
+
+-1ll, 0, 0xff00ffll, -1ll};
+/* This is what should be in AL at the end.  */
+static long long test_di[24] = { 0x1234567890ll, 0x1234567890ll, 1, 0,
+
+0x10002ll, 0x10002ll,
+0x10002ll, 0x10002ll,
+
+1, 0xc001c0dell,
+20, 0x1000e0dell,
+
+0x30007ll , 0x50009ll,
+0xf100ff0001ll, ~0xa0007ll,
+
+1, 0xc001c0dell,
+20, 0x1000e0dell,
+
+0x30007ll , 0x50009ll,
+0xf100ff0001ll, ~0xa0007ll };
+
+/* First check they work in terms of what they do to memory.  */
+static void
+do_noret_di (void)
+{
+  __sync_val_compare_and_swap (AL+0, 0x10002ll, 0x1234567890ll);
+  __sync_bool_compare_and_swap (AL+1, 0x20003ll, 0x1234567890ll);
+  __sync_lock_test_and_set (AL+2, 1);
+  __sync_lock_release (AL+3);
+
+  /* The following tests should not change the value since the
+ original does NOT match.  */
+  __sync_val_compare_and_swap (AL+4, 0x2ll, 0x1234567890ll);
+  __sync_val_compare_and_swap (AL+5, 0x1ll, 0x1234567890ll);
+  __sync_bool_compare_and_swap (AL+6, 0x2ll, 0x1234567890ll);
+  __sync_bool_compare_and_swap (AL+7, 0x1ll, 0x1234567890ll);
+
+  __sync_fetch_and_add (AL+8, 1);
+  __sync_fetch_and_add (AL+9, 0xb000e000ll); /* + to both halves & carry.  */
+  __sync_fetch_and_sub (AL+10, 22);
+  __sync_fetch_and_sub (AL+11, 0xb000e000ll);
+
+  __sync_fetch_and_and (AL+12, 0x30007ll);
+  __sync_fetch_and_or (AL+13, 0x50009ll);
+  __sync_fetch_and_xor (AL+14, 0xe0001ll);
+  __sync_fetch_and_nand (AL+15, 0xa0007ll);
+
+  /* These should be the same as the fetch_and_* cases except for
+ return value.  */
+  __sync_add_and_fetch (AL+16, 1);
+  /* add to both halves & carry.  */
+  __sync_add_and_fetch (AL+17, 0xb000e000ll);
+  __sync_sub_and_fetch (AL+18, 22);
+  __sync_sub_and_fetch (AL+19, 0xb000e000ll);
+
+  __sync_and_and_fetch (AL+20, 0x30007ll);
+  __sync_or_and_fetch (AL+21, 0x50009ll);
+  __sync_xor_and_fetch (AL+22, 0xe0001ll);
+  __sync_nand_and_fetch (AL+23, 0xa0007ll);
+}
+
+/* Now check return values.  */
+static void
+do_ret_di (void)
+{
+  if (__sync_val_compare_and_swap (AL+0, 0x10002ll, 0x1234567890ll) !=
+  0x10002ll) abort ();

[Patch 0/3] ARM 64 bit atomic operations

2011-07-01 Thread Dr. David Alan Gilbert
Hi,
  This is a series of 3 patches relating to ARM atomic operations.

  1) Provide 64 bit atomic operations using the new ldrexd/strexd in ARMv6k
 and above.
  2) Provide fallbacks so that when compiled for earlier CPUs a Linux kernel
 assist is called (as per 32bit and smaller ops)
  3) Fix PR/48126, which is a misplaced barrier in the atomic generation

Many thanks to Richard Sandiford for pointing me in the right direction
and reviewing it.

This work is part of Linaro blueprint:
https://blueprints.launchpad.net/linaro-toolchain-misc/+spec/64-bit-sync-primitives

The patch was generated from the gcc git tree from about 2 weeks back
but applies happily on top of the head.

It's been tested cross to ARM from x86 and also a native x86 build & test.

Dave



[Patch 2/3] ARM 64 bit atomic operations

2011-07-01 Thread Dr. David Alan Gilbert

  Provide fallbacks for 64bit atomics that call Linux commpage helpers
  when compiling for older machines.  The code is based on the existing
  linux-atomic.c for other sizes, however it performs an init time
  check that the kernel is new enough to provide the helper.

  This relies on Nicolas Pitre's kernel patch here:
  https://patchwork.kernel.org/patch/894932/

diff --git a/gcc/config/arm/linux-atomic-64bit.c b/gcc/config/arm/linux-atomic-64bit.c
new file mode 100644
index 000..140cc2f
--- /dev/null
+++ b/gcc/config/arm/linux-atomic-64bit.c
@@ -0,0 +1,165 @@
+/* 64bit Linux-specific atomic operations for ARM EABI.
+   Copyright (C) 2008, 2009, 2010, 2011 Free Software Foundation, Inc.
+   Based on linux-atomic.c
+
+   64 bit additions david.gilb...@linaro.org
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+.  */
+
+/* 64bit helper functions for atomic operations; the compiler will
+   call these when the code is compiled for a CPU without ldrexd/strexd.
+   (If the CPU has those, then the compiler inlines the operation.)
+
+   These helpers require a kernel helper that's only present on newer
+   kernels; we check for that in an init section and bail out rather
+   unceremoniously.  */
+
+/* For write */
+#include <unistd.h>
+/* For abort */
+#include <stdlib.h>
+
+/* Kernel helper for compare-and-exchange.  */
+typedef int (__kernel_cmpxchg64_t) (const long long* oldval,
+   const long long* newval,
+   long long *ptr);
+#define __kernel_cmpxchg64 (*(__kernel_cmpxchg64_t *) 0xffff0f60)
+
+/* Kernel helper page version number */
+
+#define __kernel_helper_version (*(unsigned int *)0xffff0ffc)
+
+/* Check that the kernel has a new enough version at load */
+void __check_for_sync8_kernelhelper (void)
+{
+  if (__kernel_helper_version < 5)
+{
+  const char err[] = "A newer kernel is required to run this binary. (__kernel_cmpxchg64 helper)\n";
+  /* At this point we need a way to crash with some information
+for the user - I'm not sure I can rely on much else being
+available at this point, so do the same as generic-morestack.c
+write() and abort(). */
+  write (2 /* stderr */, err, sizeof(err));
+  abort ();
+}
+};
+
+void (*__sync8_kernelhelper_inithook[]) (void) __attribute__ ((section (".init_array"))) = {
+  &__check_for_sync8_kernelhelper
+};
+
+#define HIDDEN __attribute__ ((visibility ("hidden")))
+
+#define FETCH_AND_OP_WORD64(OP, PFX_OP, INF_OP)\
+  long long HIDDEN \
+  __sync_fetch_and_##OP##_8 (long long *ptr, long long val)\
+  {\
+int failure;   \
+long long tmp,tmp2;\
+   \
+do {   \
+  tmp = *ptr;  \
+  tmp2 = PFX_OP (tmp INF_OP val);  \
+  failure = __kernel_cmpxchg64 (&tmp, &tmp2, ptr); \
+} while (failure != 0);\
+   \
+return tmp;\
+  }
+
+FETCH_AND_OP_WORD64 (add,   , +)
+FETCH_AND_OP_WORD64 (sub,   , -)
+FETCH_AND_OP_WORD64 (or,, |)
+FETCH_AND_OP_WORD64 (and,   , &)
+FETCH_AND_OP_WORD64 (xor,   , ^)
+FETCH_AND_OP_WORD64 (nand, ~, &)
+
+#define NAME_oldval(OP, WIDTH) __sync_fetch_and_##OP##_##WIDTH
+#define NAME_newval(OP, WIDTH) __sync_##OP##_and_fetch_##WIDTH
+
+/* Implement both __sync_<op>_and_fetch and __sync_fetch_and_<op> for
+   64bit quantities.  */
+
+#define OP_AND_FETCH_WORD64(OP, PFX_OP, INF_OP)\
+  long long HIDDEN \
+  __sync_##OP##_and_fetch_8 (long long *ptr, long long val)\
+  {\
+    int failure;						\
+    long long tmp, tmp2;					\
+								\
+    do {							\
+      tmp = *ptr;						\
+      tmp2 = PFX_OP (tmp INF_OP val);				\
+      failure = __kernel_cmpxchg64 (&tmp, &tmp2, ptr);		\
+    } while (failure != 0);					\
+								\
+    return tmp2;						\
+  }

[Patch 3/3] ARM 64 bit atomic operations

2011-07-01 Thread Dr. David Alan Gilbert

As per PR/48126, Michael Edwards spotted that in the case where
the compare fails in the cmpxchg, the barrier at the end wasn't executed,
theoretically allowing a following load to float up above the value
loaded for the compare.

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 057f9ba..39057d2 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -23531,8 +23626,8 @@ arm_output_sync_loop (emit_f emit,
}
 }
 
-  arm_process_output_memory_barrier (emit, NULL);
   arm_output_asm_insn (emit, 1, operands, "%sLSYB%%=:", LOCAL_LABEL_PREFIX);
+  arm_process_output_memory_barrier (emit, NULL);
 }
 
 static rtx


[Patch 0/4] ARM 64 bit sync atomic operations [V2]

2011-07-26 Thread Dr. David Alan Gilbert
Hi,
  This is V2 of a series of 4 patches relating to ARM atomic operations;
they incorporate most of the feedback from V1 - thanks Ramana, Richard and
Joseph for comments.

  1) Provide 64 bit atomic operations using the new ldrexd/strexd in ARMv6k 
 and above.
  2) Provide fallbacks so that when compiled for earlier CPUs a Linux kernel
 assist is called (as per 32bit and smaller ops)
  3) Fix PR/48126, which is a misplaced barrier in the atomic generation
  4) Correct the definition of TARGET_HAVE_DMB_MCR so that it doesn't
 produce the mcr instruction in Thumb1 (and enable it on ARMv6, not just
 ARMv6k, as per the docs).

Relative to v1:
  Split the DMB_MCR patch out
  Provide complete changelogs
  Don't emit IT instruction except in Thumb2 mode
  Move iterators to iterators.md (didn't move the table since it was specific
to sync.md)
  Remove sync_atleastsi
  Use sync_predtab in as many places as possible
  Avoid headers in libgcc
  Made various libgcc routines I added static
  Used __write instead of write
  Comment the barrier move to explain it more

  Note that the kernel interface has remained the same for the helper, and as
such I've not changed the way the helper calling in patch 2 is structured.

This code was tested with a full bootstrap on ARM; make check results
are the same as without the patches except for extra passes due to the new
tests.

This work is part of Linaro blueprint:
https://blueprints.launchpad.net/linaro-toolchain-misc/+spec/64-bit-sync-primitives

Dave



[Patch 2/4] ARM 64 bit sync atomic operations [V2]

2011-07-26 Thread Dr. David Alan Gilbert

Add ARM 64bit sync helpers for use on older ARMs.  Based on 32bit
versions but with check for sufficiently new kernel version.

gcc/
* config/arm/linux-atomic-64bit.c: New (based on linux-atomic.c)
* config/arm/linux-atomic.c: Change comment to point to 64bit version
  (SYNC_LOCK_RELEASE): Instantiate 64bit version.
* config/arm/t-linux-eabi: Pull in linux-atomic-64bit.c

diff --git a/gcc/config/arm/linux-atomic-64bit.c b/gcc/config/arm/linux-atomic-64bit.c
new file mode 100644
index 000..8b65de8
--- /dev/null
+++ b/gcc/config/arm/linux-atomic-64bit.c
@@ -0,0 +1,162 @@
+/* 64bit Linux-specific atomic operations for ARM EABI.
+   Copyright (C) 2008, 2009, 2010, 2011 Free Software Foundation, Inc.
+   Based on linux-atomic.c
+
+   64 bit additions david.gilb...@linaro.org
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+.  */
+
+/* 64bit helper functions for atomic operations; the compiler will
+   call these when the code is compiled for a CPU without ldrexd/strexd.
+   (If the CPU has those, then the compiler inlines the operation.)
+
+   These helpers require a kernel helper that's only present on newer
+   kernels; we check for that in an init section and bail out rather
+   unceremoniously.  */
+
+extern unsigned int __write(int fd, const void *buf, unsigned int count);
+extern void abort(void);
+
+/* Kernel helper for compare-and-exchange.  */
+typedef int (__kernel_cmpxchg64_t) (const long long* oldval,
+   const long long* newval,
+   long long *ptr);
+#define __kernel_cmpxchg64 (*(__kernel_cmpxchg64_t *) 0xffff0f60)
+
+/* Kernel helper page version number */
+#define __kernel_helper_version (*(unsigned int *)0xffff0ffc)
+
+/* Check that the kernel has a new enough version at load */
+static void __check_for_sync8_kernelhelper (void)
+{
+  if (__kernel_helper_version < 5)
+{
+  const char err[] = "A newer kernel is required to run this binary. (__kernel_cmpxchg64 helper)\n";
+  /* At this point we need a way to crash with some information
+for the user - I'm not sure I can rely on much else being
+available at this point, so do the same as generic-morestack.c
+write() and abort(). */
+  __write (2 /* stderr */, err, sizeof(err));
+  abort ();
+}
+};
+
+static void (*__sync8_kernelhelper_inithook[]) (void) __attribute__ ((used, section (".init_array"))) = {
+  &__check_for_sync8_kernelhelper
+};
+
+#define HIDDEN __attribute__ ((visibility ("hidden")))
+
+#define FETCH_AND_OP_WORD64(OP, PFX_OP, INF_OP)\
+  long long HIDDEN \
+  __sync_fetch_and_##OP##_8 (long long *ptr, long long val)\
+  {\
+int failure;   \
+long long tmp,tmp2;\
+   \
+do {   \
+  tmp = *ptr;  \
+  tmp2 = PFX_OP (tmp INF_OP val);  \
+  failure = __kernel_cmpxchg64 (&tmp, &tmp2, ptr); \
+} while (failure != 0);\
+   \
+return tmp;\
+  }
+
+FETCH_AND_OP_WORD64 (add,   , +)
+FETCH_AND_OP_WORD64 (sub,   , -)
+FETCH_AND_OP_WORD64 (or,, |)
+FETCH_AND_OP_WORD64 (and,   , &)
+FETCH_AND_OP_WORD64 (xor,   , ^)
+FETCH_AND_OP_WORD64 (nand, ~, &)
+
+#define NAME_oldval(OP, WIDTH) __sync_fetch_and_##OP##_##WIDTH
+#define NAME_newval(OP, WIDTH) __sync_##OP##_and_fetch_##WIDTH
+
+/* Implement both __sync_<op>_and_fetch and __sync_fetch_and_<op> for
+   64bit quantities.  */
+
+#define OP_AND_FETCH_WORD64(OP, PFX_OP, INF_OP)\
+  long long HIDDEN \
+  __sync_##OP##_and_fetch_8 (long long *ptr, long long val)	\
+  {								\
+    int failure;						\
+    long long tmp, tmp2;					\
+								\
+    do {							\
+      tmp = *ptr;						\
+      tmp2 = PFX_OP (tmp INF_OP val);				\
+      failure = __kernel_cmpxchg64 (&tmp, &tmp2, ptr);		\
+    } while (failure != 0);					\
+								\
+    return tmp2;						\
+  }

[Patch 3/4] ARM 64 bit sync atomic operations [V2]

2011-07-26 Thread Dr. David Alan Gilbert
  Michael K. Edwards points out in PR/48126 that the sync is in the wrong place
  relative to the branch target of the compare, since the load could float
  up beyond the ldrex.

gcc/
* config/arm/arm.c (arm_output_sync_loop): Move label to before barrier;
   fixes PR/48126.

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 28be078..cee3471 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -23780,8 +23780,11 @@ arm_output_sync_loop (emit_f emit,
}
 }
 
-  arm_process_output_memory_barrier (emit, NULL);
+  /* Note: the label is before the barrier so that in the cmp failure case
+     we still get a barrier to stop subsequent loads floating upwards past
+     the ldrex (PR/48126).  */
   arm_output_asm_insn (emit, 1, operands, "%sLSYB%%=:", LOCAL_LABEL_PREFIX);
+  arm_process_output_memory_barrier (emit, NULL);
 }
 
 static rtx


[Patch 4/4] ARM 64 bit sync atomic operations [V2]

2011-07-26 Thread Dr. David Alan Gilbert

gcc/
* config/arm/arm.h (TARGET_HAVE_DMB_MCR): MCR not available in Thumb1,
  but is available on ARMv6.

diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 0d419d5..146b9ad 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -285,7 +285,8 @@ extern void (*arm_lang_output_object_attributes_hook)(void);
 #define TARGET_HAVE_DMB		(arm_arch7)
 
 /* Nonzero if this chip implements a memory barrier via CP15.  */
-#define TARGET_HAVE_DMB_MCR	(arm_arch6k && ! TARGET_HAVE_DMB)
+#define TARGET_HAVE_DMB_MCR	(arm_arch6 && ! TARGET_HAVE_DMB \
+				 && ! TARGET_THUMB1)
 
 /* Nonzero if this chip implements a memory barrier instruction.  */
 #define TARGET_HAVE_MEMORY_BARRIER (TARGET_HAVE_DMB || TARGET_HAVE_DMB_MCR)