Index: doc/extend.texi
===================================================================
*** doc/extend.texi	(.../trunk/gcc)	(revision 180790)
--- doc/extend.texi	(.../branches/cxx-mem-model/gcc)	(revision 180839)
*************** This builtin is not a full barrier, but
*** 6813,6818 ****
--- 6813,7023 ----
  This means that all previous memory stores are globally visible, and all
  previous memory loads have been satisfied, but following memory reads
  are not prevented from being speculated to before the barrier.
+ 
+ @section Built-in functions for memory model aware atomic operations
+ 
+ The following builtins approximately match the requirements for the
+ C++11 memory model.  Many are similar to the @samp{__sync} prefixed
+ builtins, but all of them also take a memory model parameter.  These are
+ all identified by being prefixed with @samp{__atomic}, and most are
+ overloaded such that they work with multiple types.
+ 
+ GCC allows any integral scalar or pointer type that is 1, 2, 4, or 8 bytes
+ in length.  16-byte integral types are also allowed if @code{__int128_t}
+ is supported by the architecture.
+ 
+ Target architectures are encouraged to provide their own patterns for
+ each of these builtins.  If no target pattern is provided, the original
+ non-memory-model set of @samp{__sync} atomic builtins is used, along
+ with any required synchronization fences surrounding them, in order to
+ achieve the proper behavior.  Execution in this case is subject to the
+ same restrictions as those builtins.
+ 
+ If there is no pattern or mechanism to provide a lock-free instruction
+ sequence, a call is made to an external routine with the same parameters,
+ to be resolved at runtime.
+ 
+ The four non-arithmetic functions (load, store, exchange, and
+ compare_exchange) all have a generic version as well.  This generic
+ version works on any data type.  If the data type size maps to one of
+ the integral sizes which may have lock-free support, the generic version
+ utilizes the lock-free builtin.  Otherwise an external call is left to
+ be resolved at runtime.  This external call has the same format, with
+ the addition of a @code{size_t} parameter inserted as the first
+ parameter, indicating the size of the object being pointed to.  All
+ objects must be the same size.
+ 
+ There are six different memory models which can be specified.  These map
+ to the same names in the C++11 standard.  Refer there, or to the GCC wiki
+ on atomics, for more detailed definitions.  These memory models integrate
+ both barriers to code motion as well as synchronization requirements with
+ other threads.  They are listed here in approximately ascending order of
+ strength.
+ 
+ @table @code
+ @item __ATOMIC_RELAXED
+ No barriers or synchronization.
+ @item __ATOMIC_CONSUME
+ Data dependency only for both barrier and synchronization with another
+ thread.
+ @item __ATOMIC_ACQUIRE
+ Barrier to hoisting of code and synchronizes with release (or stronger)
+ semantic stores from another thread.
+ @item __ATOMIC_RELEASE
+ Barrier to sinking of code and synchronizes with acquire (or stronger)
+ semantic loads from another thread.
+ @item __ATOMIC_ACQ_REL
+ Full barrier in both directions and synchronizes with acquire loads and
+ release stores in another thread.
+ @item __ATOMIC_SEQ_CST
+ Full barrier in both directions and synchronizes with acquire loads and
+ release stores in all threads.
+ @end table
+ 
+ When implementing patterns for these builtins, the memory model parameter
+ can be ignored as long as the pattern implements the most restrictive
+ @code{__ATOMIC_SEQ_CST} model.  Any of the other memory models will
+ execute correctly with this implementation, but they may not execute as
+ efficiently as they could with a more appropriate implementation of the
+ relaxed requirements.
+ 
+ Note that the C++11 standard allows for the memory model parameter to be
+ determined at runtime rather than at compile time.  These builtins map
+ any runtime value to @code{__ATOMIC_SEQ_CST} rather than invoke a runtime
+ library call or inline a switch statement.  This is standard compliant,
+ safe, and the simplest approach for now.
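+ 
+ As an illustration of how the models pair up, a release store in one
+ thread can synchronize with an acquire load in another.  The following
+ sketch passes a value between two threads; the names @code{data},
+ @code{ready}, and @code{use} are hypothetical and not part of any
+ library:
+ 
+ @smallexample
+ int data;
+ int ready;   /* Accessed only through the __atomic builtins.  */
+ 
+ void producer (void)
+ @{
+   data = 42;   /* Plain store, ordered by the release below.  */
+   __atomic_store_n (&ready, 1, __ATOMIC_RELEASE);
+ @}
+ 
+ void consumer (void)
+ @{
+   while (!__atomic_load_n (&ready, __ATOMIC_ACQUIRE))
+     ;   /* Spin until the release store becomes visible.  */
+   /* The acquire load synchronized with the release store, so the
+      plain store to data is guaranteed to be visible here.  */
+   use (data);
+ @}
+ @end smallexample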
+ 
+ @table @code
+ 
+ @item @var{type} __atomic_load_n (@var{type} *ptr, int memmodel)
+ @findex __atomic_load_n
+ This builtin implements an atomic load operation.  It returns the
+ contents of @code{*@var{ptr}}.
+ 
+ The valid memory model variants are
+ @code{__ATOMIC_RELAXED}, @code{__ATOMIC_SEQ_CST}, @code{__ATOMIC_ACQUIRE},
+ and @code{__ATOMIC_CONSUME}.
+ 
+ @item void __atomic_load (@var{type} *ptr, @var{type} *ret, int memmodel)
+ @findex __atomic_load
+ This is the generic version of an atomic load.  It returns the contents
+ of @code{*@var{ptr}} in @code{*@var{ret}}.
+ 
+ @item void __atomic_store_n (@var{type} *ptr, @var{type} val, int memmodel)
+ @findex __atomic_store_n
+ This builtin implements an atomic store operation.  It writes @var{val}
+ into @code{*@var{ptr}}.  On targets which are limited, 0 may be the only
+ valid value.  This mimics the behavior of @code{__sync_lock_release} on
+ such hardware.
+ 
+ The valid memory model variants are
+ @code{__ATOMIC_RELAXED}, @code{__ATOMIC_SEQ_CST}, and
+ @code{__ATOMIC_RELEASE}.
+ 
+ @item void __atomic_store (@var{type} *ptr, @var{type} *val, int memmodel)
+ @findex __atomic_store
+ This is the generic version of an atomic store.  It stores the value of
+ @code{*@var{val}} into @code{*@var{ptr}}.
+ 
+ @item @var{type} __atomic_exchange_n (@var{type} *ptr, @var{type} val, int memmodel)
+ @findex __atomic_exchange_n
+ This builtin implements an atomic exchange operation.  It writes
+ @var{val} into @code{*@var{ptr}}, and returns the previous contents of
+ @code{*@var{ptr}}.
+ 
+ On targets which are limited, a value of 1 may be the only valid value
+ written.  This mimics the behavior of @code{__sync_lock_test_and_set} on
+ such hardware.
+ 
+ The valid memory model variants are
+ @code{__ATOMIC_RELAXED}, @code{__ATOMIC_SEQ_CST}, @code{__ATOMIC_ACQUIRE},
+ @code{__ATOMIC_RELEASE}, and @code{__ATOMIC_ACQ_REL}.
+ 
+ @item void __atomic_exchange (@var{type} *ptr, @var{type} *val, @var{type} *ret, int memmodel)
+ @findex __atomic_exchange
+ This is the generic version of an atomic exchange.  It stores the
+ contents of @code{*@var{val}} into @code{*@var{ptr}}.  The original value
+ of @code{*@var{ptr}} is copied into @code{*@var{ret}}.
+ 
+ @item bool __atomic_compare_exchange_n (@var{type} *ptr, @var{type} *expected, @var{type} desired, bool weak, int success_memmodel, int failure_memmodel)
+ @findex __atomic_compare_exchange_n
+ This builtin implements an atomic compare_exchange operation.  It
+ compares the contents of @code{*@var{ptr}} with the contents of
+ @code{*@var{expected}}, and if equal, writes @var{desired} into
+ @code{*@var{ptr}}.  If they are not equal, the current contents of
+ @code{*@var{ptr}} are written into @code{*@var{expected}}.  @var{weak} is
+ true for a weak compare_exchange, which may fail spuriously, and false
+ for the strong variation, which never fails spuriously.
+ 
+ True is returned if @var{desired} is written into @code{*@var{ptr}}, and
+ the execution is considered to conform to the memory model specified by
+ @var{success_memmodel}.  There are no restrictions on what memory model
+ can be used here.
+ 
+ False is returned otherwise, and the execution is considered to conform
+ to @var{failure_memmodel}.  This memory model cannot be
+ @code{__ATOMIC_RELEASE} nor @code{__ATOMIC_ACQ_REL}.  It also cannot be a
+ stronger model than that specified by @var{success_memmodel}.
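+ 
+ A typical use rechecks the refreshed @code{*@var{expected}} value in a
+ loop.  The following sketch (using the hypothetical names
+ @code{atomic_max}, @code{loc}, and @code{val}) atomically replaces a
+ value with the maximum of itself and an argument:
+ 
+ @smallexample
+ void
+ atomic_max (int *loc, int val)
+ @{
+   int old = __atomic_load_n (loc, __ATOMIC_RELAXED);
+   /* On failure, old is refreshed with the current contents of *loc,
+      so the loop simply retries against the new value.  */
+   while (old < val
+          && !__atomic_compare_exchange_n (loc, &old, val, 0 /* strong */,
+                                           __ATOMIC_SEQ_CST,
+                                           __ATOMIC_RELAXED))
+     ;
+ @}
+ @end smallexample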
+ 
+ @item bool __atomic_compare_exchange (@var{type} *ptr, @var{type} *expected, @var{type} *desired, bool weak, int success_memmodel, int failure_memmodel)
+ @findex __atomic_compare_exchange
+ This builtin implements the generic version of
+ @code{__atomic_compare_exchange_n}.  The function is virtually identical
+ to @code{__atomic_compare_exchange_n}, except the desired value is also a
+ pointer.
+ 
+ @item @var{type} __atomic_add_fetch (@var{type} *ptr, @var{type} val, int memmodel)
+ @itemx @var{type} __atomic_sub_fetch (@var{type} *ptr, @var{type} val, int memmodel)
+ @itemx @var{type} __atomic_and_fetch (@var{type} *ptr, @var{type} val, int memmodel)
+ @itemx @var{type} __atomic_xor_fetch (@var{type} *ptr, @var{type} val, int memmodel)
+ @itemx @var{type} __atomic_or_fetch (@var{type} *ptr, @var{type} val, int memmodel)
+ @findex __atomic_add_fetch
+ @findex __atomic_sub_fetch
+ @findex __atomic_and_fetch
+ @findex __atomic_xor_fetch
+ @findex __atomic_or_fetch
+ These builtins perform the operation suggested by the name, and return
+ the result of the operation.  That is,
+ 
+ @smallexample
+ @{ *ptr @var{op}= val; return *ptr; @}
+ @end smallexample
+ 
+ All memory models are valid.
+ 
+ @item @var{type} __atomic_fetch_add (@var{type} *ptr, @var{type} val, int memmodel)
+ @itemx @var{type} __atomic_fetch_sub (@var{type} *ptr, @var{type} val, int memmodel)
+ @itemx @var{type} __atomic_fetch_and (@var{type} *ptr, @var{type} val, int memmodel)
+ @itemx @var{type} __atomic_fetch_xor (@var{type} *ptr, @var{type} val, int memmodel)
+ @itemx @var{type} __atomic_fetch_or (@var{type} *ptr, @var{type} val, int memmodel)
+ @findex __atomic_fetch_add
+ @findex __atomic_fetch_sub
+ @findex __atomic_fetch_and
+ @findex __atomic_fetch_xor
+ @findex __atomic_fetch_or
+ These builtins perform the operation suggested by the name, and return
+ the value that had previously been in @code{*@var{ptr}}.  That is,
+ 
+ @smallexample
+ @{ tmp = *ptr; *ptr @var{op}= val; return tmp; @}
+ @end smallexample
+ 
+ All memory models are valid.
+ 
+ @item void __atomic_thread_fence (int memmodel)
+ @findex __atomic_thread_fence
+ 
+ This builtin acts as a synchronization fence between threads based on the
+ specified memory model.
+ 
+ All memory orders are valid.
+ 
+ @item void __atomic_signal_fence (int memmodel)
+ @findex __atomic_signal_fence
+ 
+ This builtin acts as a synchronization fence between a thread and signal
+ handlers based in the same thread.
+ 
+ All memory orders are valid.
+ 
+ @item bool __atomic_always_lock_free (size_t size)
+ @findex __atomic_always_lock_free
+ 
+ This builtin returns true if objects of @var{size} bytes always generate
+ lock-free atomic instructions for the target architecture.  Otherwise
+ false is returned.
+ 
+ @var{size} must resolve to a compile-time constant.
+ 
+ @smallexample
+ if (__atomic_always_lock_free (sizeof (long long)))
+ @end smallexample
+ 
+ @item bool __atomic_is_lock_free (size_t size)
+ @findex __atomic_is_lock_free
+ 
+ This builtin returns true if objects of @var{size} bytes always generate
+ lock-free atomic instructions for the target architecture.  If it is not
+ known to be lock-free, a call is made to a runtime routine named
+ @code{__atomic_is_lock_free}.
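+ 
+ For instance, a caller might test for lock-free support before choosing
+ an update strategy.  The sketch below is illustrative only;
+ @code{counter} and @code{locked_increment} are hypothetical names:
+ 
+ @smallexample
+ if (__atomic_is_lock_free (sizeof (long long)))
+   __atomic_fetch_add (&counter, 1, __ATOMIC_SEQ_CST);
+ else
+   locked_increment (&counter);   /* Fall back to a lock.  */
+ @end smallexample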
+ 
+ @end table
+ 
  @node Object Size Checking
Index: doc/invoke.texi
===================================================================
*** doc/invoke.texi	(.../trunk/gcc)	(revision 180790)
--- doc/invoke.texi	(.../branches/cxx-mem-model/gcc)	(revision 180839)
*************** The maximum number of conditional stores
*** 9149,9159 ****
--- 9149,9174 ----
  if either vectorization (@option{-ftree-vectorize}) or if-conversion
  (@option{-ftree-loop-if-convert}) is disabled.  The default is 2.
  
+ @item allow-load-data-races
+ Allow optimizers to introduce new data races on loads.
+ Set to 1 to allow, otherwise to 0.  This option is enabled by default
+ unless implicitly set by the @option{-fmemory-model=} option.
+ 
  @item allow-store-data-races
  Allow optimizers to introduce new data races on stores.
  Set to 1 to allow, otherwise to 0.  This option is enabled by default
  unless implicitly set by the @option{-fmemory-model=} option.
  
+ @item allow-packed-load-data-races
+ Allow optimizers to introduce new data races on packed data loads.
+ Set to 1 to allow, otherwise to 0.  This option is enabled by default
+ unless implicitly set by the @option{-fmemory-model=} option.
+ 
+ @item allow-packed-store-data-races
+ Allow optimizers to introduce new data races on packed data stores.
+ Set to 1 to allow, otherwise to 0.  This option is enabled by default
+ unless implicitly set by the @option{-fmemory-model=} option.
+ 
  @item case-values-threshold
  The smallest number of different values for which it is best to use a
  jump-table instead of a tree of conditional branches.  If the value is
These are the documentation changes for the __atomic builtins.
ChangeLog entries are part of the gcc patch.