On 22/05/15 18:36, Jason Merrill wrote:
On 05/22/2015 11:23 AM, Ramana Radhakrishnan wrote:
On 22/05/15 15:28, Jason Merrill wrote:
I do notice that get_guard_bits after build_atomic_load just won't work
on non-ARM targets, as it ends up trying to take the address of a value.
So on powerpc, where targetm.guard_mask_bit is false, this is what I see:
&(long long int) __atomic_load_8 (&_ZGVZ1fvE1p, 2)
This is the bit that doesn't make sense to me. __atomic_load_8 returns
a value; what does it mean to take its address? If we're going to load
more than just the first byte, we should mask off the rest rather than
trying to mess with its address.
It also seems unnecessary to load 8 bytes on any target; we could add a
function to optabs.c that returns the smallest mode for which there's
atomic load support?
Jason
Right, I've taken all the comments on board - thanks to those who tested
the original patch. Here's v2 of the patch, currently bootstrapped and
tested only on aarch64-none-linux-gnu.
This patch -
- Turns build_atomic_load into build_atomic_load_byte in order
to do an atomic load of 1 byte instead of a full word atomic load.
- Restructures get_guard_cond to decide whether to use an atomic load
or a standard load depending on whether code generated will be in
a multi-threaded context or not.
- Adjusts all callers of get_guard_cond accordingly.
One piece of fallout I've observed in my testing, and that I'm not sure
how to handle, is that on *bare-metal* arm-none-eabi targets we still
emit calls to __sync_synchronize on architecture versions that do not
have a barrier instruction, which results in a link error.
While it is tempting to take the easy approach of not emitting the
call, I suspect that in practice a number of users of the bare-metal
tools use them for their own RTOSes and other micro-OSes. Thus
generating barriers at higher architecture levels but not at lower ones
appears to be a bit dangerous, especially on architectures with
backwards compatibility (i.e. standard user code built with
-mcpu=arm7tdmi is still expected to generate code that works on a core
conforming to a later architecture revision).
I am considering leaving this as-is in the ARM backend, to force people
to think about what they want to do about thread safety with statics
and C++ on bare-metal systems. If they really do not want thread
safety, they can add -fno-threadsafe-statics or provide an appropriate
implementation of __sync_synchronize for their platform.
Any thoughts or comments?
regards
Ramana
gcc/cp/ChangeLog:
2015-05-29 Ramana Radhakrishnan <ramana.radhakrish...@arm.com>
PR c++/66192
* cp-tree.h (get_guard_cond): Adjust declaration.
* decl.c (expand_static_init): Use atomic load acquire
and adjust call to get_guard_cond.
* decl2.c (build_atomic_load_byte): New function.
(get_guard_cond): Handle thread_safety.
(one_static_initialization_or_destruction): Adjust call to
get_guard_cond.
gcc/ChangeLog:
2015-05-29 Ramana Radhakrishnan <ramana.radhakrish...@arm.com>
PR c++/66192
* config/alpha/alpha.c (TARGET_RELAXED_ORDERING): Delete.
* config/ia64/ia64.c (TARGET_RELAXED_ORDERING): Likewise.
* config/rs6000/rs6000.c (TARGET_RELAXED_ORDERING): Likewise.
* config/sparc/linux.h (SPARC_RELAXED_ORDERING): Likewise.
* config/sparc/linux64.h (SPARC_RELAXED_ORDERING): Likewise.
* config/sparc/sparc.c (TARGET_RELAXED_ORDERING): Likewise.
* config/sparc/sparc.h (SPARC_RELAXED_ORDERING): Likewise.
* doc/tm.texi: Regenerate.
* doc/tm.texi.in (TARGET_RELAXED_ORDERING): Delete.
* target.def (TARGET_RELAXED_ORDERING): Delete.
commit e9c757b843b22609ac161e18fc7cd17e8866562d
Author: Ramana Radhakrishnan <ramana.radhakrish...@arm.com>
Date: Fri May 29 10:55:18 2015 +0100
Patch v2 - Remove TARGET_RELAXED_ORDERING.
This respun version handles comments from the previous revision.
- Simplifies build_atomic_load into build_atomic_load_byte in order
to do an atomic load of 1 byte instead of a full word atomic load.
- Restructures get_guard_cond to decide whether to use an atomic load
or a standard load depending on whether code generated will be in
a multi-threaded context or not.
- Adjusts all callers of get_guard_cond accordingly.
Is this better now?
Bootstrapped and regression tested on aarch64-none-linux-gnu.
regards
Ramana
gcc/cp/ChangeLog:
2015-05-29 Ramana Radhakrishnan <ramana.radhakrish...@arm.com>
PR c++/66192
* cp-tree.h (get_guard_cond): Adjust declaration.
* decl.c (expand_static_init): Use atomic load acquire
and adjust call to get_guard_cond.
* decl2.c (build_atomic_load_byte): New function.
(get_guard_cond): Handle thread_safety.
(one_static_initialization_or_destruction): Adjust call to
get_guard_cond.
gcc/ChangeLog:
2015-05-29 Ramana Radhakrishnan <ramana.radhakrish...@arm.com>
PR c++/66192
* config/alpha/alpha.c (TARGET_RELAXED_ORDERING): Delete.
* config/ia64/ia64.c (TARGET_RELAXED_ORDERING): Likewise.
* config/rs6000/rs6000.c (TARGET_RELAXED_ORDERING): Likewise.
* config/sparc/linux.h (SPARC_RELAXED_ORDERING): Likewise.
* config/sparc/linux64.h (SPARC_RELAXED_ORDERING): Likewise.
* config/sparc/sparc.c (TARGET_RELAXED_ORDERING): Likewise.
* config/sparc/sparc.h (SPARC_RELAXED_ORDERING): Likewise.
* doc/tm.texi: Regenerate.
* doc/tm.texi.in (TARGET_RELAXED_ORDERING): Delete.
* target.def (TARGET_RELAXED_ORDERING): Delete.
diff --git a/gcc/config/alpha/alpha.c b/gcc/config/alpha/alpha.c
index 1ba99d0..857c9ac 100644
--- a/gcc/config/alpha/alpha.c
+++ b/gcc/config/alpha/alpha.c
@@ -9987,12 +9987,6 @@ alpha_atomic_assign_expand_fenv (tree *hold, tree *clear, tree *update)
#undef TARGET_EXPAND_BUILTIN_VA_START
#define TARGET_EXPAND_BUILTIN_VA_START alpha_va_start
-/* The Alpha architecture does not require sequential consistency. See
- http://www.cs.umd.edu/~pugh/java/memoryModel/AlphaReordering.html
- for an example of how it can be violated in practice. */
-#undef TARGET_RELAXED_ORDERING
-#define TARGET_RELAXED_ORDERING true
-
#undef TARGET_OPTION_OVERRIDE
#define TARGET_OPTION_OVERRIDE alpha_option_override
diff --git a/gcc/config/ia64/ia64.c b/gcc/config/ia64/ia64.c
index c1e2ecd..45ad97a 100644
--- a/gcc/config/ia64/ia64.c
+++ b/gcc/config/ia64/ia64.c
@@ -630,11 +630,6 @@ static const struct attribute_spec ia64_attribute_table[] =
#define TARGET_LIBGCC_FLOATING_MODE_SUPPORTED_P \
ia64_libgcc_floating_mode_supported_p
-/* ia64 architecture manual 4.4.7: ... reads, writes, and flushes may occur
- in an order different from the specified program order. */
-#undef TARGET_RELAXED_ORDERING
-#define TARGET_RELAXED_ORDERING true
-
#undef TARGET_LEGITIMATE_CONSTANT_P
#define TARGET_LEGITIMATE_CONSTANT_P ia64_legitimate_constant_p
#undef TARGET_LEGITIMATE_ADDRESS_P
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index a590ef4..ce70ca0 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -1620,17 +1620,6 @@ static const struct attribute_spec rs6000_attribute_table[] =
#define TARGET_STACK_PROTECT_FAIL rs6000_stack_protect_fail
#endif
-/* MPC604EUM 3.5.2 Weak Consistency between Multiple Processors
- The PowerPC architecture requires only weak consistency among
- processors--that is, memory accesses between processors need not be
- sequentially consistent and memory accesses among processors can occur
- in any order. The ability to order memory accesses weakly provides
- opportunities for more efficient use of the system bus. Unless a
- dependency exists, the 604e allows read operations to precede store
- operations. */
-#undef TARGET_RELAXED_ORDERING
-#define TARGET_RELAXED_ORDERING true
-
#ifdef HAVE_AS_TLS
#undef TARGET_ASM_OUTPUT_DWARF_DTPREL
#define TARGET_ASM_OUTPUT_DWARF_DTPREL rs6000_output_dwarf_dtprel
diff --git a/gcc/config/sparc/linux.h b/gcc/config/sparc/linux.h
index 17e1e86..29763c4 100644
--- a/gcc/config/sparc/linux.h
+++ b/gcc/config/sparc/linux.h
@@ -139,12 +139,6 @@ do { \
/* Static stack checking is supported by means of probes. */
#define STACK_CHECK_STATIC_BUILTIN 1
-/* Linux currently uses RMO in uniprocessor mode, which is equivalent to
- TMO, and TMO in multiprocessor mode. But they reserve the right to
- change their minds. */
-#undef SPARC_RELAXED_ORDERING
-#define SPARC_RELAXED_ORDERING true
-
#undef NEED_INDICATE_EXEC_STACK
#define NEED_INDICATE_EXEC_STACK 1
diff --git a/gcc/config/sparc/linux64.h b/gcc/config/sparc/linux64.h
index 43da848..efa33fb 100644
--- a/gcc/config/sparc/linux64.h
+++ b/gcc/config/sparc/linux64.h
@@ -253,12 +253,6 @@ do { \
/* Static stack checking is supported by means of probes. */
#define STACK_CHECK_STATIC_BUILTIN 1
-/* Linux currently uses RMO in uniprocessor mode, which is equivalent to
- TMO, and TMO in multiprocessor mode. But they reserve the right to
- change their minds. */
-#undef SPARC_RELAXED_ORDERING
-#define SPARC_RELAXED_ORDERING true
-
#undef NEED_INDICATE_EXEC_STACK
#define NEED_INDICATE_EXEC_STACK 1
diff --git a/gcc/config/sparc/sparc.c b/gcc/config/sparc/sparc.c
index a1562ad..094287f 100644
--- a/gcc/config/sparc/sparc.c
+++ b/gcc/config/sparc/sparc.c
@@ -808,9 +808,6 @@ char sparc_hard_reg_printed[8];
#define TARGET_ATTRIBUTE_TABLE sparc_attribute_table
#endif
-#undef TARGET_RELAXED_ORDERING
-#define TARGET_RELAXED_ORDERING SPARC_RELAXED_ORDERING
-
#undef TARGET_OPTION_OVERRIDE
#define TARGET_OPTION_OVERRIDE sparc_option_override
diff --git a/gcc/config/sparc/sparc.h b/gcc/config/sparc/sparc.h
index 106d993..72dd18b 100644
--- a/gcc/config/sparc/sparc.h
+++ b/gcc/config/sparc/sparc.h
@@ -106,17 +106,6 @@ extern enum cmodel sparc_cmodel;
#define SPARC_DEFAULT_CMODEL CM_32
-/* The SPARC-V9 architecture defines a relaxed memory ordering model (RMO)
- which requires the following macro to be true if enabled. Prior to V9,
- there are no instructions to even talk about memory synchronization.
- Note that the UltraSPARC III processors don't implement RMO, unlike the
- UltraSPARC II processors. Niagara, Niagara-2, and Niagara-3 do not
- implement RMO either.
-
- Default to false; for example, Solaris never enables RMO, only ever uses
- total memory ordering (TMO). */
-#define SPARC_RELAXED_ORDERING false
-
/* Do not use the .note.GNU-stack convention by default. */
#define NEED_INDICATE_EXEC_STACK 0
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 91619e2..aa391c8 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -5490,7 +5490,7 @@ extern bool mark_used (tree, tsubst_flags_t);
extern void finish_static_data_member_decl (tree, tree, bool, tree, int);
extern tree cp_build_parm_decl (tree, tree);
extern tree get_guard (tree);
-extern tree get_guard_cond (tree);
+extern tree get_guard_cond (tree, bool);
extern tree set_guard (tree);
extern tree get_tls_wrapper_fn (tree);
extern void mark_needed (tree);
diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index a8cb358..8f92667 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -7211,7 +7211,7 @@ expand_static_init (tree decl, tree init)
looks like:
static <type> guard;
- if (!guard.first_byte) {
+ if (!__atomic_load (guard.first_byte)) {
if (__cxa_guard_acquire (&guard)) {
bool flag = false;
try {
@@ -7241,16 +7241,11 @@ expand_static_init (tree decl, tree init)
/* Create the guard variable. */
guard = get_guard (decl);
- /* This optimization isn't safe on targets with relaxed memory
- consistency. On such targets we force synchronization in
- __cxa_guard_acquire. */
- if (!targetm.relaxed_ordering || !thread_guard)
- {
- /* Begin the conditional initialization. */
- if_stmt = begin_if_stmt ();
- finish_if_stmt_cond (get_guard_cond (guard), if_stmt);
- then_clause = begin_compound_stmt (BCS_NO_SCOPE);
- }
+ /* Begin the conditional initialization. */
+ if_stmt = begin_if_stmt ();
+
+ finish_if_stmt_cond (get_guard_cond (guard, thread_guard), if_stmt);
+ then_clause = begin_compound_stmt (BCS_NO_SCOPE);
if (thread_guard)
{
@@ -7319,12 +7314,9 @@ expand_static_init (tree decl, tree init)
finish_if_stmt (inner_if_stmt);
}
- if (!targetm.relaxed_ordering || !thread_guard)
- {
- finish_compound_stmt (then_clause);
- finish_then_clause (if_stmt);
- finish_if_stmt (if_stmt);
- }
+ finish_compound_stmt (then_clause);
+ finish_then_clause (if_stmt);
+ finish_if_stmt (if_stmt);
}
else if (DECL_THREAD_LOCAL_P (decl))
tls_aggregates = tree_cons (init, decl, tls_aggregates);
diff --git a/gcc/cp/decl2.c b/gcc/cp/decl2.c
index f1b3d0c..36aa8b4 100644
--- a/gcc/cp/decl2.c
+++ b/gcc/cp/decl2.c
@@ -3034,6 +3034,25 @@ get_guard (tree decl)
return guard;
}
+static tree
+build_atomic_load_byte (tree src, HOST_WIDE_INT model)
+{
+ tree ptr_type = build_pointer_type (char_type_node);
+ tree mem_model = build_int_cst (integer_type_node, model);
+ tree t, addr, val;
+ unsigned int size;
+ int fncode;
+
+ size = tree_to_uhwi (TYPE_SIZE_UNIT (char_type_node));
+
+ fncode = BUILT_IN_ATOMIC_LOAD_N + exact_log2 (size) + 1;
+ t = builtin_decl_implicit ((enum built_in_function) fncode);
+
+ addr = build1 (ADDR_EXPR, ptr_type, src);
+ val = build_call_expr (t, 2, addr, mem_model);
+ return val;
+}
+
/* Return those bits of the GUARD variable that should be set when the
guarded entity is actually initialized. */
@@ -3060,12 +3079,14 @@ get_guard_bits (tree guard)
variable has already been initialized. */
tree
-get_guard_cond (tree guard)
+get_guard_cond (tree guard, bool thread_safe)
{
tree guard_value;
- /* Check to see if the GUARD is zero. */
- guard = get_guard_bits (guard);
+ if (!thread_safe)
+ guard = get_guard_bits (guard);
+ else
+ guard = build_atomic_load_byte (guard, MEMMODEL_ACQUIRE);
/* Mask off all but the low bit. */
if (targetm.cxx.guard_mask_bit ())
@@ -3681,7 +3702,7 @@ one_static_initialization_or_destruction (tree decl, tree init, bool initp)
/* When using __cxa_atexit, we never try to destroy
anything from a static destructor. */
gcc_assert (initp);
- guard_cond = get_guard_cond (guard);
+ guard_cond = get_guard_cond (guard, false);
}
/* If we don't have __cxa_atexit, then we will be running
destructors from .fini sections, or their equivalents. So,
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index f2f3497..a16cd92 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -11395,16 +11395,6 @@ routine for target specific customizations of the system printf
and scanf formatter settings.
@end defmac
-@deftypevr {Target Hook} bool TARGET_RELAXED_ORDERING
-If set to @code{true}, means that the target's memory model does not
-guarantee that loads which do not depend on one another will access
-main memory in the order of the instruction stream; if ordering is
-important, an explicit memory barrier must be used. This is true of
-many recent processors which implement a policy of ``relaxed,''
-``weak,'' or ``release'' memory consistency, such as Alpha, PowerPC,
-and ia64. The default is @code{false}.
-@end deftypevr
-
@deftypefn {Target Hook} {const char *} TARGET_INVALID_ARG_FOR_UNPROTOTYPED_FN (const_tree @var{typelist}, const_tree @var{funcdecl}, const_tree @var{val})
If defined, this macro returns the diagnostic message when it is
illegal to pass argument @var{val} to function @var{funcdecl}
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 35b02b7..93fb41c 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -8143,8 +8143,6 @@ routine for target specific customizations of the system printf
and scanf formatter settings.
@end defmac
-@hook TARGET_RELAXED_ORDERING
-
@hook TARGET_INVALID_ARG_FOR_UNPROTOTYPED_FN
@hook TARGET_INVALID_CONVERSION
diff --git a/gcc/target.def b/gcc/target.def
index f2cb81d..b606b81 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -5785,19 +5785,6 @@ for the primary source file, immediately after printing\n\
this to be done. The default is false.",
bool, false)
-/* True if the target is allowed to reorder memory accesses unless
- synchronization is explicitly requested. */
-DEFHOOKPOD
-(relaxed_ordering,
- "If set to @code{true}, means that the target's memory model does not\n\
-guarantee that loads which do not depend on one another will access\n\
-main memory in the order of the instruction stream; if ordering is\n\
-important, an explicit memory barrier must be used. This is true of\n\
-many recent processors which implement a policy of ``relaxed,''\n\
-``weak,'' or ``release'' memory consistency, such as Alpha, PowerPC,\n\
-and ia64. The default is @code{false}.",
- bool, false)
-
/* Returns true if we should generate exception tables for use with the
ARM EABI. The effects the encoding of function exception specifications. */
DEFHOOKPOD