From: Matthew Malcomson <mmalcom...@nvidia.com>

Use a wrapper in builtins.cc to check whether a CAS loop is required.
N.b. any floating point types that are available but not as arithmetic
types, or that are simply not available, should be handled by the
sync_resolve_size check against available types.  We add an assertion
in the CAS loop expansion on this requirement.

Note that when deciding whether to emit a CAS loop the C/C++ frontends
check whether *any* FP fetch_add related optab is available on this
target.  Checking only whether the specific operation being expanded is
available for the given mode would miss the possible transformation of
fetch_add to fetch_sub with a negated value, and similarly the
transformation of fetch_add to add_fetch with an adjustment to the
result value.

These transformations allow the target backend to specify only one
pattern and have the compiler use it for a set of operations.  They are
performed in the expand phase.
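
As an illustration (not part of the patch; the function names below are
placeholders), the value equivalences that make this worthwhile can be
written in plain C:

    /* Illustrative sketch only.  __atomic_fetch_add (p, v, m) stores and
       returns the same values as __atomic_fetch_sub (p, -v, m), and
       __atomic_add_fetch (p, v, m) returns __atomic_fetch_add (p, v, m) + v
       (the addition is rounded identically in both forms).  */
    float
    fetch_add_via_sub (float *p, float v)
    {
      return __atomic_fetch_sub (p, -v, __ATOMIC_SEQ_CST);
    }

    float
    add_fetch_via_fetch_add (float *p, float v)
    {
      return __atomic_fetch_add (p, v, __ATOMIC_SEQ_CST) + v;
    }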

We avoid expanding to a CAS loop when -fno-inline-atomics is passed and
in that case emit a call to the resolved version of
`__atomic_fetch_add_fp*` or similar.
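
For example (illustrative only), when the following is compiled with
-fno-inline-atomics we expect an out-of-line call to the resolved
floating point variant rather than an inline CAS loop:

    /* Compile with -fno-inline-atomics: a library call is expected for
       the floating point fetch_add rather than an inline CAS loop.  */
    float
    bump (float *p, float v)
    {
      return __atomic_fetch_add (p, v, __ATOMIC_RELAXED);
    }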

------------------------------
The new floating point atomic builtins take floating point operands.
Those operands could be EXCESS_PRECISION_EXPR tree nodes.

In our CAS loop expansion for fetch_add/fetch_sub we convert the value
being added to/subtracted from the atomic data into the relevant type
for this function before using it.  We also fully fold this expression
for simplification at this point, before emitting the remaining code.

This existing code does not handle EXCESS_PRECISION_EXPRs since
c_fully_fold cannot handle such codes except as the outermost
expression.
Since this expansion is semantically at the point of a function
boundary (an argument being passed to a builtin), we want to expand the
CAS loop on the semantic value.  Hence in this CAS loop expansion,
when the value being added/subtracted is an EXCESS_PRECISION_EXPR we
strip that wrapper, convert its operand to the semantic type of the
argument, and pass the result into `c_fully_fold`.
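
A minimal sketch of the situation (assuming x87 excess precision, i.e.
-mfpmath=387 -fexcess-precision=standard): the addend below is computed
in extended precision, but only its semantic float value should reach
the CAS loop:

    /* a + b is wrapped in an EXCESS_PRECISION_EXPR by the C frontend; at
       the builtin call boundary it must be collapsed to its semantic
       float value before being added atomically.  */
    float
    add_sum (float *p, float a, float b)
    {
      return __atomic_fetch_add (p, a + b, __ATOMIC_SEQ_CST);
    }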

------------------------------
The libstdc++ atomic class requires padding to be cleared to satisfy
the standard (P0528).  When implementing fetch_add as a builtin, we need
to maintain this property.

For most floating point types this is not a problem, but long double on
x86 has padding bits in its storage that must be accounted for.

When the builtin expands to a CAS loop, it is awkward to implement the
padding clear in libstdc++: it would essentially have to perform the
same CAS loop again in order to clear the padding.

Hence it seems most reasonable to clear padding in the builtin expansion
itself.  That does mean that all code performing a fetch_add on a
long double on x86 gets this extra padding-clearing operation whether
needed or not, which seems a reasonable trade-off.
For other types the clear expands into a NOP very early on.
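
A rough illustration (assuming the usual x86 ABI, where long double
carries an 80-bit value in 12 or 16 bytes of storage): the expansion
simply reuses __builtin_clear_padding, which folds away for types
without padding:

    /* Clears the trailing padding bytes of an x86 long double; for float
       or double this is a no-op that disappears very early.  */
    void
    canonicalise (long double *p)
    {
      __builtin_clear_padding (p);
    }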

------------------------------
The SFINAE handling in the CAS loop needed some updating for floating
point types.  In this area we need to:
1) Remove the assertion that we never expand a CAS loop in the frontend
   when `complain` is unset.  This assertion held because _BitInt is
   not available in C++, so the CAS loop could previously only be
   reached from the C frontend (where `complain` is always set).
   Now that we sometimes expand the CAS loop for floating point types we
   can reach it from the C++ frontend too, so the assertion no longer
   holds.
2) Thread the `complain` parameter through the CAS loop expansion in the
   C/C++ frontend.
3) Account for the new possibility that `convert` could raise an error
   (when attempting to convert from a pointer to a floating point type).
   This was done with an extra check in both `sync_resolve_params` and
   `atomic_alttyped_fetch_using_cas_loop`, in a similar way to other
   checks on types in functions called by `resolve_overloaded_builtin`,
   because threading the `complain` argument through the `convert`
   function interface proved to be too invasive (I do not know of an
   interface that performs the conversion while fully handling SFINAE
   contexts, e.g. `ocp_convert` uses `convert_to_real_maybe_fold`).
4) Update tests to check that we emit the relevant too few/too many
   arguments error when __atomic_add_fetch is incorrectly called, and
   that this error is not emitted in an SFINAE context.
   Similarly for incorrectly passing a pointer to the floating point
   fetch_add function.  (A simplified sketch of the detection idiom
   these tests rely on follows below.)
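
For reference, the detection idiom those tests rely on is roughly the
following C++17 sketch (the trait name here is made up for
illustration):

    #include <type_traits>
    #include <utility>

    /* True iff __atomic_add_fetch is usable with a T* object and a T
       value; in an SFINAE context an unusable combination must not
       produce a hard error.  */
    template <typename T, typename = void>
    struct has_add_fetch : std::false_type {};

    template <typename T>
    struct has_add_fetch<T,
      std::void_t<decltype (__atomic_add_fetch (std::declval<T *> (),
                                                std::declval<T> (), 0))>>
      : std::true_type {};

    static_assert (has_add_fetch<float>::value,
                   "float add_fetch should be usable after this patch");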

gcc/ChangeLog:

        * builtins.cc (atomic_fp_fetch_add_implemented): New.
        * builtins.h (atomic_fp_fetch_add_implemented): New decl.

gcc/c-family/ChangeLog:

        * c-common.cc (sync_resolve_params): Check for cast from
        pointer to floating point type and error accordingly.
        (atomic_bitint_fetch_using_cas_loop): Renamed to ...
        (atomic_alttyped_fetch_using_cas_loop): ... this, and updated to
        handle floating point types.  Also add new argument `complain`
        indicating whether this function should emit errors on type
        problems.
        (resolve_overloaded_builtin): Expand to CAS loop if floating
        point fetch_op optab not available.  Pass `complain` in call to
        `atomic_alttyped_fetch_using_cas_loop`.

gcc/testsuite/ChangeLog:

        * gcc.target/i386/excess-precision-13.c: New test.
        * g++.dg/template/builtin-atomic-overloads.def: Update to account
        for fetch_add calls now accepting floating point types.
        * g++.dg/template/builtin-atomic-overloads6.C: Introduce new
        tests exercising floating point atomic fetch_add.
        * g++.dg/template/builtin-atomic-overloads7.C: Likewise.
        * gcc.dg/atomic-op-fp-convert.c: New test.
        * gcc.dg/atomic-op-fp-resolve-complain.c: New test.

Signed-off-by: Matthew Malcomson <mmalcom...@nvidia.com>
---
 gcc/builtins.cc                               |  24 +++
 gcc/builtins.h                                |   1 +
 gcc/c-family/c-common.cc                      | 179 +++++++++++++++---
 .../template/builtin-atomic-overloads.def     |  28 +--
 .../template/builtin-atomic-overloads6.C      |  23 ++-
 .../template/builtin-atomic-overloads7.C      |  16 +-
 gcc/testsuite/gcc.dg/atomic-op-fp-convert.c   |   6 +
 .../gcc.dg/atomic-op-fp-resolve-complain.c    |   5 +
 .../gcc.target/i386/excess-precision-13.c     |  88 +++++++++
 9 files changed, 323 insertions(+), 47 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/atomic-op-fp-convert.c
 create mode 100644 gcc/testsuite/gcc.dg/atomic-op-fp-resolve-complain.c
 create mode 100644 gcc/testsuite/gcc.target/i386/excess-precision-13.c

diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index 480d38db058..3585c5c9af9 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -6931,6 +6931,30 @@ expand_builtin_atomic_store (machine_mode mode, tree exp)
   return expand_atomic_store (mem, val, model, false);
 }
 
+/* Returns whether the backend implements any
+   fetch_add/add_fetch/sub_fetch/fetch_sub optab for the given mode.
+   If it does then the optab functionality in expand_atomic_fetch_op will be
+   able to avoid emitting a CAS loop.  That means that we have no concern
+   about floating point exception state and hence the C/C++ frontends will
+   not have to emit the CAS loop directly themselves.  */
+bool
+atomic_fp_fetch_add_implemented (machine_mode mode)
+{
+  direct_optab optabs[4] = {
+    atomic_fetch_add_optab,
+    atomic_fetch_sub_optab,
+    atomic_sub_fetch_optab,
+    atomic_add_fetch_optab,
+  };
+  for (int i = 0; i < 4; i++)
+    {
+      insn_code icode = direct_optab_handler (optabs[i], mode);
+      if (icode != CODE_FOR_nothing)
+       return true;
+    }
+  return false;
+}
+
 /* Expand the __atomic_fetch_XXX intrinsic:
        TYPE __atomic_fetch_XXX (TYPE *object, TYPE val, enum memmodel)
    EXP is the CALL_EXPR.
diff --git a/gcc/builtins.h b/gcc/builtins.h
index 7ac9981442d..0833d3c615a 100644
--- a/gcc/builtins.h
+++ b/gcc/builtins.h
@@ -130,6 +130,7 @@ extern tree std_fn_abi_va_list (tree);
 extern tree std_canonical_va_list_type (tree);
 extern void std_expand_builtin_va_start (tree, rtx);
 extern void expand_builtin_trap (void);
+extern bool atomic_fp_fetch_add_implemented (machine_mode);
 extern int get_builtin_fp_offset (tree);
 extern void expand_ifn_atomic_bit_test_and (gcall *);
 extern void expand_ifn_atomic_compare_exchange (gcall *);
diff --git a/gcc/c-family/c-common.cc b/gcc/c-family/c-common.cc
index 5dc7fc10db3..17bdcbfedc3 100644
--- a/gcc/c-family/c-common.cc
+++ b/gcc/c-family/c-common.cc
@@ -7617,13 +7617,33 @@ sync_resolve_params (location_t loc, tree orig_function, tree function,
         floating point type with new format sync routines, i.e. don't attempt
         to convert pointer arguments (e.g. EXPECTED argument of
         __atomic_compare_exchange_n), bool arguments (e.g. WEAK argument) or
-        signed int arguments (memmodel kinds).  */
-      if ((TREE_CODE (arg_type) == INTEGER_TYPE && TYPE_UNSIGNED (arg_type))
-         || SCALAR_FLOAT_TYPE_P (arg_type))
+        signed int arguments (memmodel kinds).
+
+        N.b. manually check conversion from pointer to floating point type
+        to handle SFINAE correctly.  Attempting conversion while passing
+        `complain` down is infeasible since ocp_convert eventually calls
+        `convert_to_real_maybe_fold` which has no `complain` parameter (and is
+        indeed defined in general code).  We could expose some interface into
+        indeed defined in general code).  We could expose some interface from
+        the general template overload machinery to this C/C++ common code
+        feasible to perform the checks here instead.  Similar is done in
+        atomic_alttyped_fetch_using_cas_loop for conversions there.  */
+      if (SCALAR_FLOAT_TYPE_P (arg_type)
+         && POINTER_TYPE_P (TREE_TYPE ((*params)[parmnum])))
+       {
+         if (complain)
+           error_at (loc,
+                     "pointer value used where a floating-point was expected");
+         (*params)[parmnum] = error_mark_node;
+       }
+      else if ((TREE_CODE (arg_type) == INTEGER_TYPE
+               && TYPE_UNSIGNED (arg_type))
+              || SCALAR_FLOAT_TYPE_P (arg_type))
        {
          /* Ideally for the first conversion we'd use convert_for_assignment
-            so that we get warnings for anything that doesn't match the pointer
-            type.  This isn't portable across the C and C++ front ends atm.  */
+            so that we get warnings for anything that doesn't match the
+            pointer type.  This isn't portable across the C and C++ front
+            ends atm.  */
          val = (*params)[parmnum];
          val = convert (ptype, val);
          val = convert (arg_type, val);
@@ -8218,10 +8238,11 @@ resolve_overloaded_atomic_store (location_t loc, tree function,
    ORIG_PARAMS arguments of the call.  */
 
 static tree
-atomic_bitint_fetch_using_cas_loop (location_t loc,
-                                   enum built_in_function orig_code,
-                                   tree orig_function,
-                                   vec<tree, va_gc> *orig_params)
+atomic_alttyped_fetch_using_cas_loop (location_t loc,
+                                     enum built_in_function orig_code,
+                                     tree orig_function,
+                                     vec<tree, va_gc> *orig_params,
+                                     bool complain)
 {
   enum tree_code code = ERROR_MARK;
   bool return_old_p = false;
@@ -8273,21 +8294,51 @@ atomic_bitint_fetch_using_cas_loop (location_t loc,
 
   if (orig_params->length () != 3)
     {
-      if (orig_params->length () < 3)
-       error_at (loc, "too few arguments to function %qE", orig_function);
-      else
-       error_at (loc, "too many arguments to function %qE", orig_function);
+      if (complain)
+       {
+         if (orig_params->length () < 3)
+           error_at (loc, "too few arguments to function %qE", orig_function);
+         else
+           error_at (loc, "too many arguments to function %qE", orig_function);
+       }
       return error_mark_node;
     }
 
-  tree stmts = push_stmt_list ();
-
   tree nonatomic_lhs_type = TREE_TYPE (TREE_TYPE ((*orig_params)[0]));
   nonatomic_lhs_type = TYPE_MAIN_VARIANT (nonatomic_lhs_type);
-  gcc_assert (TREE_CODE (nonatomic_lhs_type) == BITINT_TYPE);
+  gcc_assert (TREE_CODE (nonatomic_lhs_type) == BITINT_TYPE
+             || SCALAR_FLOAT_TYPE_P (nonatomic_lhs_type));
 
   tree lhs_addr = (*orig_params)[0];
-  tree val = convert (nonatomic_lhs_type, (*orig_params)[1]);
+  tree val = NULL_TREE;
+  if (SCALAR_FLOAT_TYPE_P (nonatomic_lhs_type))
+    {
+      /* It is difficult to maintain SFINAE context while converting from a
+        pointer to a floating point type.  The reason is explained in more
+        detail in sync_resolve_params (which has the same need).  */
+      if (c_dialect_cxx () && POINTER_TYPE_P (TREE_TYPE ((*orig_params)[1])))
+       {
+         if (complain)
+           error_at (loc,
+                     "pointer value used where a floating-point was expected");
+         return error_mark_node;
+       }
+      /* Floating point typed expressions may be EXCESS_PRECISION_EXPRs.
+        Wrapping such an expression in a convert before passing it to
+        c_fully_fold would lead to an ICE in c_fully_fold_internal.
+        Since this is semantically at the boundary of a function call we do
+        not want to carry forward any excess precision arithmetic.  Instead we
+        want to "collapse" to the semantic value at this boundary, then
+        perform operations on that.  `c_fully_fold` performs the folding
+        conversion to the semantic type, but only does so at the outermost
+        level in the expression.  Hence remove one layer of nesting in the
+        casts.  */
+      else if (!c_dialect_cxx ()
+              && TREE_CODE ((*orig_params)[1]) == EXCESS_PRECISION_EXPR)
+       val = convert (nonatomic_lhs_type, TREE_OPERAND ((*orig_params)[1], 0));
+    }
+  if (!val)
+    val = convert (nonatomic_lhs_type, (*orig_params)[1]);
   tree model = convert (integer_type_node, (*orig_params)[2]);
   if (!c_dialect_cxx ())
     {
@@ -8295,6 +8346,8 @@ atomic_bitint_fetch_using_cas_loop (location_t loc,
       val = c_fully_fold (val, false, NULL);
       model = c_fully_fold (model, false, NULL);
     }
+
+  tree stmts = push_stmt_list ();
   if (TREE_SIDE_EFFECTS (lhs_addr))
     {
       tree var = create_tmp_var_raw (TREE_TYPE (lhs_addr));
@@ -8341,7 +8394,7 @@ atomic_bitint_fetch_using_cas_loop (location_t loc,
   params->quick_push (lhs_addr);
   params->quick_push (old_addr);
   params->quick_push (build_int_cst (integer_type_node, MEMMODEL_RELAXED));
-  tree func_call = resolve_overloaded_builtin (loc, fndecl, params);
+  tree func_call = resolve_overloaded_builtin (loc, fndecl, params, complain);
   if (func_call == NULL_TREE)
     func_call = build_function_call_vec (loc, vNULL, fndecl, params, NULL);
   old = build4 (TARGET_EXPR, nonatomic_lhs_type, old, func_call, NULL_TREE,
@@ -8349,6 +8402,18 @@ atomic_bitint_fetch_using_cas_loop (location_t loc,
   add_stmt (old);
   params->truncate (0);
 
+  /* feholdexcept (&fenv);
+     N.b. similar to build_atomic_assign in the C frontend (see the comment
+     above build_atomic_assign for an explanation of why this is
+     needed).  */
+  bool need_fenv
+    = (flag_trapping_math && SCALAR_FLOAT_TYPE_P (nonatomic_lhs_type));
+  tree hold_call = NULL_TREE, clear_call = NULL_TREE, update_call = NULL_TREE;
+  if (need_fenv)
+    targetm.atomic_assign_expand_fenv (&hold_call, &clear_call, &update_call);
+  if (hold_call)
+    add_stmt (hold_call);
+
   /* loop:  */
   add_stmt (loop_label);
 
@@ -8358,7 +8423,8 @@ atomic_bitint_fetch_using_cas_loop (location_t loc,
     {
     case PLUS_EXPR:
     case MINUS_EXPR:
-      if (!TYPE_OVERFLOW_WRAPS (nonatomic_lhs_type))
+      if (TREE_CODE (nonatomic_lhs_type) == BITINT_TYPE
+         && !TYPE_OVERFLOW_WRAPS (nonatomic_lhs_type))
        {
          tree utype
            = build_bitint_type (TYPE_PRECISION (nonatomic_lhs_type), 1);
@@ -8368,7 +8434,17 @@ atomic_bitint_fetch_using_cas_loop (location_t loc,
                                     convert (utype, val)));
        }
       else
-       rhs = build2_loc (loc, code, nonatomic_lhs_type, old, val);
+       {
+         /* Floating point types like bfloat16 are sometimes storage only and
+            hence an addition is not allowed.  That would be picked up by the
+            hook below.  Such floating point types *also* don't get put into
+            the global_trees, which means that e.g. bfloat16_type_node is
+            NULL.  That means they get caught in sync_resolve_size instead of
+            here.  Add an assertion that this understanding is correct.  */
+         gcc_assert (!targetm.invalid_binary_op (code, nonatomic_lhs_type,
+                                                 nonatomic_lhs_type));
+         rhs = build2_loc (loc, code, nonatomic_lhs_type, old, val);
+       }
       break;
     case BIT_AND_EXPR:
     case BIT_IOR_EXPR:
@@ -8388,6 +8464,18 @@ atomic_bitint_fetch_using_cas_loop (location_t loc,
   SET_EXPR_LOCATION (rhs, loc);
   add_stmt (rhs);
 
+  /* __builtin_clear_padding (&new);  */
+  if (SCALAR_FLOAT_TYPE_P (nonatomic_lhs_type))
+    {
+      fndecl = builtin_decl_explicit (BUILT_IN_CLEAR_PADDING);
+      params->quick_push (newval_addr);
+      func_call = resolve_overloaded_builtin (loc, fndecl, params, complain);
+      if (func_call == NULL_TREE)
+       func_call = build_function_call_vec (loc, vNULL, fndecl, params, NULL);
+      add_stmt (func_call);
+      params->truncate (0);
+    }
+
   /* if (__atomic_compare_exchange (addr, &old, &new, false, model, model))
        goto done;  */
   fndecl = builtin_decl_explicit (BUILT_IN_ATOMIC_COMPARE_EXCHANGE);
@@ -8402,7 +8490,7 @@ atomic_bitint_fetch_using_cas_loop (location_t loc,
     params->quick_push (build_int_cst (integer_type_node, MEMMODEL_RELAXED));
   else
     params->quick_push (model);
-  func_call = resolve_overloaded_builtin (loc, fndecl, params);
+  func_call = resolve_overloaded_builtin (loc, fndecl, params, complain);
   if (func_call == NULL_TREE)
     func_call = build_function_call_vec (loc, vNULL, fndecl, params, NULL);
 
@@ -8414,6 +8502,10 @@ atomic_bitint_fetch_using_cas_loop (location_t loc,
   SET_EXPR_LOCATION (stmt, loc);
   add_stmt (stmt);
 
+  /* feclearexcept (FE_ALL_EXCEPT); */
+  if (clear_call)
+    add_stmt (clear_call);
+
   /* goto loop;  */
   goto_stmt = build1 (GOTO_EXPR, void_type_node, loop_decl);
   SET_EXPR_LOCATION (goto_stmt, loc);
@@ -8422,6 +8514,10 @@ atomic_bitint_fetch_using_cas_loop (location_t loc,
   /* done:  */
   add_stmt (done_label);
 
+  /* feupdateenv (&fenv);  */
+  if (update_call)
+    add_stmt (update_call);
+
   tree ret = create_tmp_var_raw (nonatomic_lhs_type);
   stmt = build2_loc (loc, MODIFY_EXPR, void_type_node, ret,
                     return_old_p ? old : newval);
@@ -8644,9 +8740,37 @@ resolve_overloaded_builtin (location_t loc, tree function,
        if (n == 0)
          return error_mark_node;
 
-       if (fp_specialisation_offset != 0)
-         fncode = (enum built_in_function) ((int) orig_code
-                                            + fp_specialisation_offset);
+       if (fp_specialisation_offset != 0
+           && (fncode = (enum built_in_function) ((int) orig_code
+                                                  + fp_specialisation_offset))
+           && (!flag_inline_atomics
+               || atomic_fp_fetch_add_implemented (
+                 TYPE_MODE (TREE_TYPE (TREE_TYPE ((*params)[0]))))))
+         {
+           /* If this is a floating point operation then we determine `fncode`
+              in the if condition above.  That condition also checks whether
+              the target supports the operation directly, and if not we expand
+              into a CAS loop.
+              N.b. we check in the frontend because if expanding into a CAS
+              loop on floating point values we need to handle `fenv` floating
+              point exception information (similar to build_atomic_assign in
+              the C frontend, where there is more information on this in a
+              comment).
+              The target hook for generating the IR to represent these
+              needed operations returns GENERIC, which is much easier to deal
+              with earlier on.
+              The only difficulty is that the natural way to identify whether
+              a backend can handle this operation is to check the optabs, and
+              optabs cannot be checked directly in the frontend.  Hence we
+              need a wrapper function to call.
+              If the target does support the operation directly there is
+              nothing more to do here (all that is needed is to calculate
+              `fncode`, which is done in the condition above).
+
+              If the user passed `-fno-inline-atomics` then we want to emit
+              the relevant fetch_op function with floating point types since
+              that seems like the most intuitive user interface.  */
+         }
        else if (n == -1)
          {
            /* complain is related to SFINAE context.
@@ -8657,9 +8781,9 @@ resolve_overloaded_builtin (location_t loc, tree function,
               wrong).
               Since can't test avoiding an error when this value is false not
               writing the code and instead asserting value is not set.  */
-           gcc_assert (complain);
-           return atomic_bitint_fetch_using_cas_loop (loc, orig_code, function,
-                                                      params);
+           return atomic_alttyped_fetch_using_cas_loop (loc, orig_code,
+                                                        function, params,
+                                                        complain);
          }
        else
          fncode
@@ -8669,6 +8793,7 @@ resolve_overloaded_builtin (location_t loc, tree function,
        if (!sync_resolve_params (loc, function, new_function, params,
                                  orig_format, complain))
          return error_mark_node;
+
        first_param = (*params)[0];
        result = build_function_call_vec (loc, vNULL, new_function, params,
                                          NULL);
diff --git a/gcc/testsuite/g++.dg/template/builtin-atomic-overloads.def b/gcc/testsuite/g++.dg/template/builtin-atomic-overloads.def
index 8df11f57420..bf7f2117f96 100644
--- a/gcc/testsuite/g++.dg/template/builtin-atomic-overloads.def
+++ b/gcc/testsuite/g++.dg/template/builtin-atomic-overloads.def
@@ -118,9 +118,9 @@ class Incomplete;
 
 ATOMIC_SFINAES
 
-#define FETCH_OP_ASSERTS(NAME) \
+#define FETCH_OP_ASSERTS(NAME, HANDLES_FLOAT) \
   MAKE_ATOMIC_ASSERT(NAME, int *, true) \
-  MAKE_ATOMIC_ASSERT(NAME, float, false) \
+  MAKE_ATOMIC_ASSERT(NAME, float, HANDLES_FLOAT) \
   MAKE_ATOMIC_ASSERT(NAME, int, true) \
   MAKE_ATOMIC_ASSERT(NAME, bool,  false) \
   MAKE_ATOMIC_ASSERT(NAME, X,  false) \
@@ -130,18 +130,18 @@ ATOMIC_SFINAES
   MAKE_ATOMIC_ASSERT(NAME, long, true)
 
 #define ATOMIC_FETCH_ASSERTS \
-  FETCH_OP_ASSERTS(add_fetch) \
-  FETCH_OP_ASSERTS(fetch_add) \
-  FETCH_OP_ASSERTS(sub_fetch) \
-  FETCH_OP_ASSERTS(fetch_sub) \
-  FETCH_OP_ASSERTS(and_fetch) \
-  FETCH_OP_ASSERTS(fetch_and) \
-  FETCH_OP_ASSERTS(xor_fetch) \
-  FETCH_OP_ASSERTS(fetch_xor) \
-  FETCH_OP_ASSERTS(or_fetch) \
-  FETCH_OP_ASSERTS(fetch_or) \
-  FETCH_OP_ASSERTS(nand_fetch) \
-  FETCH_OP_ASSERTS(fetch_nand)
+  FETCH_OP_ASSERTS(add_fetch, true) \
+  FETCH_OP_ASSERTS(fetch_add, true) \
+  FETCH_OP_ASSERTS(sub_fetch, true) \
+  FETCH_OP_ASSERTS(fetch_sub, true) \
+  FETCH_OP_ASSERTS(and_fetch, false) \
+  FETCH_OP_ASSERTS(fetch_and, false) \
+  FETCH_OP_ASSERTS(xor_fetch, false) \
+  FETCH_OP_ASSERTS(fetch_xor, false) \
+  FETCH_OP_ASSERTS(or_fetch, false) \
+  FETCH_OP_ASSERTS(fetch_or, false) \
+  FETCH_OP_ASSERTS(nand_fetch, false) \
+  FETCH_OP_ASSERTS(fetch_nand, false)
 
 #define ATOMIC_GENERIC_ASSERTS(NAME) \
   MAKE_ATOMIC_ASSERT(NAME##_n, int *, true) \
diff --git a/gcc/testsuite/g++.dg/template/builtin-atomic-overloads6.C b/gcc/testsuite/g++.dg/template/builtin-atomic-overloads6.C
index 6ecf318b8c3..dc7966f76cb 100644
--- a/gcc/testsuite/g++.dg/template/builtin-atomic-overloads6.C
+++ b/gcc/testsuite/g++.dg/template/builtin-atomic-overloads6.C
@@ -132,11 +132,23 @@ typedef __UINT32_TYPE__ uint32_t;
 */
 #define BITINT_FETCHCAS_ERRS(X)
 
+#define FLOAT_FETCHCAS_TOOFEW(X) \
+  X(add_fetch, (std::declval<float*>(), std::declval<float>()), 0)
+#define FLOAT_FETCHADD_WRONGTYPE(X) \
+  X(add_fetch, (std::declval<float*>(), std::declval<int *>(), 0), 1)
+#define FLOAT_FETCHCAS_TOOMANY(X) \
+  X(add_fetch, (std::declval<float*>(), std::declval<float>(), int(), int()), 2)
+#define FLOAT_FETCHCAS_ERRS(X) \
+  FLOAT_FETCHCAS_TOOFEW(X) \
+  FLOAT_FETCHADD_WRONGTYPE(X) \
+  FLOAT_FETCHCAS_TOOMANY(X)
+
 #define ALL_ERRS(X) \
   GET_ATOMIC_GENERIC_ERRS(X) \
   SYNC_SIZE_ERRS(X) \
   SYNC_PARM_ERRS(X) \
-  BITINT_FETCHCAS_ERRS(X)
+  BITINT_FETCHCAS_ERRS(X) \
+  FLOAT_FETCHCAS_ERRS(X)
 
 #define SFINAE_TYPE_CHECK(NAME, PARAMS, COUNTER) \
   template <typename T, typename = void> \
@@ -144,7 +156,7 @@ typedef __UINT32_TYPE__ uint32_t;
   template <typename T> \
   struct is_##NAME##_available_##COUNTER<T, \
     std::void_t<decltype(__atomic_##NAME PARAMS) >> \
-    : std::true_type {}; \
+    : std::true_type {};
 
 ALL_ERRS(SFINAE_TYPE_CHECK)
 
@@ -152,8 +164,11 @@ ALL_ERRS(SFINAE_TYPE_CHECK)
 /* { dg-error "operand type 'int' is incompatible with argument 1 of 
'__atomic_load_n'"  "" { target *-*-* } 110 } */
 /* { dg-error "too few arguments to function '__atomic_load_n'"                
          "" { target *-*-* } 116 } */
 /* { dg-error "too many arguments to function '__atomic_load_n'"               
          "" { target *-*-* } 118 } */
-/* { dg-error "template argument 1 is invalid"                                 
          "" { target *-*-* } 146 } */
-/* { dg-error "template argument 2 is invalid"                                 
          "" { target *-*-* } 146 } */
+/* { dg-error "too few arguments to function '__atomic_add_fetch'"             
          "" { target *-*-* } 136 } */
+/* { dg-error "pointer value used where a floating-point was expected"         
          "" { target *-*-* } 138 } */
+/* { dg-error "too many arguments to function '__atomic_add_fetch'"            
          "" { target *-*-* } 140 } */
+/* { dg-error "template argument 1 is invalid"                                 
          "" { target *-*-* } 158 } */
+/* { dg-error "template argument 2 is invalid"                                 
          "" { target *-*-* } 158 } */
 /* { dg-error "incorrect number of arguments to function '__atomic_load'"      
          "" { target *-*-* } 48 } */
 /* { dg-error "argument 1 of '__atomic_load' must be a non-void pointer type"  
          "" { target *-*-* } 51 } */
 /* { dg-error "argument 1 of '__atomic_load' must be a non-void pointer type"  
          "" { target *-*-* } 53 } */
diff --git a/gcc/testsuite/g++.dg/template/builtin-atomic-overloads7.C b/gcc/testsuite/g++.dg/template/builtin-atomic-overloads7.C
index ef1d4627758..c88277c2725 100644
--- a/gcc/testsuite/g++.dg/template/builtin-atomic-overloads7.C
+++ b/gcc/testsuite/g++.dg/template/builtin-atomic-overloads7.C
@@ -139,11 +139,23 @@ typedef __UINT32_TYPE__ uint32_t;
 */
 #define BITINT_FETCHCAS_ERRS(X)
 
+#define FLOAT_FETCHCAS_TOOFEW(X) \
+  X(add_fetch, (std::declval<float*>(), std::declval<T>()), 0)
+#define FLOAT_FETCHADD_WRONGTYPE(X) \
+  X(add_fetch, (std::declval<float*>(), std::declval<T>(), int()), 1)
+#define FLOAT_FETCHCAS_TOOMANY(X) \
+  X(add_fetch, (std::declval<float*>(), std::declval<T>(), int(), int()), 2)
+#define FLOAT_FETCHCAS_ERRS(X) \
+  FLOAT_FETCHCAS_TOOFEW(X) \
+  FLOAT_FETCHADD_WRONGTYPE(X) \
+  FLOAT_FETCHCAS_TOOMANY(X)
+
 #define ALL_ERRS(X) \
   GET_ATOMIC_GENERIC_ERRS(X) \
   SYNC_SIZE_ERRS(X) \
   SYNC_PARM_ERRS(X) \
-  BITINT_FETCHCAS_ERRS(X)
+  BITINT_FETCHCAS_ERRS(X) \
+  FLOAT_FETCHCAS_ERRS(X)
 
 #define SFINAE_TYPE_CHECK(NAME, PARAMS, COUNTER) \
   template <typename T, typename = void> \
@@ -151,7 +163,7 @@ typedef __UINT32_TYPE__ uint32_t;
   template <typename T> \
   struct is_##NAME##_available_##COUNTER<T, \
     std::void_t<decltype(__atomic_##NAME PARAMS) >> \
-    : std::true_type {}; \
+    : std::true_type {};
 
 ALL_ERRS(SFINAE_TYPE_CHECK)
 MEMMODEL_TOOLARGE(SFINAE_TYPE_CHECK)
diff --git a/gcc/testsuite/gcc.dg/atomic-op-fp-convert.c b/gcc/testsuite/gcc.dg/atomic-op-fp-convert.c
new file mode 100644
index 00000000000..7fe58dab334
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/atomic-op-fp-convert.c
@@ -0,0 +1,6 @@
+/* Check error message on converting from pointer to floating point type.   */
+/* { dg-do compile } */
+float foo(float *x, int *y) {
+    return __atomic_add_fetch(x, y, 0); /* { dg-error "pointer value used where a floating-point was expected" } */
+}
+
diff --git a/gcc/testsuite/gcc.dg/atomic-op-fp-resolve-complain.c b/gcc/testsuite/gcc.dg/atomic-op-fp-resolve-complain.c
new file mode 100644
index 00000000000..a891c4fc7f4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/atomic-op-fp-resolve-complain.c
@@ -0,0 +1,5 @@
+/* { dg-do compile } */
+extern float x;
+void doload(float addval) {
+    __atomic_fetch_add (&x, addval); /* { dg-error "too few arguments to function '__atomic_fetch_add'" }  */
+}
diff --git a/gcc/testsuite/gcc.target/i386/excess-precision-13.c b/gcc/testsuite/gcc.target/i386/excess-precision-13.c
new file mode 100644
index 00000000000..d9444c1ead9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/excess-precision-13.c
@@ -0,0 +1,88 @@
+/* Excess precision tests.  Ensure that builtin gives same result as
+   hand-written functions.  */
+/* { dg-do run } */
+/* { dg-options "-std=c99 -mfpmath=387 -fexcess-precision=standard" } */
+/* Can use fallback if libatomic is available, otherwise need hardware support.  */
+/* { dg-require-effective-target sync_float_runtime { target { ! libatomic_available } } } */
+/* { dg-additional-options "-latomic" { target libatomic_available } } */
+/* { dg-additional-options "-fno-atomic-fp-fetchop-exceptions" { target { ! libatomic_available } } } */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+extern void abort (void);
+extern void exit (int);
+#ifdef __cplusplus
+}
+#endif
+
+float
+imitation_fetch_add_fpf (float *vp, float v, int model)
+{
+  float retval = *vp;
+  *vp += v;
+  return retval;
+}
+
+float
+imitation_add_fetch_fpf (float *vp, float v, int model)
+{
+  *vp += v;
+  return *vp;
+}
+
+float
+imitation_fetch_sub_fpf (float *vp, float v, int model)
+{
+  float retval = *vp;
+  *vp -= v;
+  return retval;
+}
+
+float
+imitation_sub_fetch_fpf (float *vp, float v, int model)
+{
+  *vp -= v;
+  return *vp;
+}
+
+
+int
+main (void)
+{
+  float f = 1.0f;
+  float ret, altret;
+  int i;
+  int alt;
+
+  i = 0x10001234;
+  if ((float) i != 0x10001240)
+    abort ();
+
+  i = 0x10001234;
+  alt = i;
+  ret = __atomic_fetch_add ((float*)&alt, f, 0);
+  if (ret != imitation_fetch_add_fpf ((float*)&i, f, 0))
+    abort ();
+
+  i = 0x10001234;
+  alt = i;
+  ret = __atomic_add_fetch ((float*)&alt, 1.0f, 0);
+  if (ret != imitation_add_fetch_fpf ((float*)&i, 1.0f, 0))
+    abort ();
+
+  i = 0x10001234;
+  alt = i;
+  ret = __atomic_add_fetch ((float*)&alt, 1.0, 0);
+  if (ret != imitation_add_fetch_fpf ((float*)&i, 1.0, 0))
+    abort ();
+
+  i = 0x10001234;
+  alt = i;
+  altret = ret;
+  ret = __atomic_add_fetch ((float*)&alt, (1.0 * ret), 0);
+  if (ret != imitation_add_fetch_fpf ((float*)&i, (1.0 * altret), 0))
+    abort ();
+
+  exit (0);
+}
-- 
2.43.0

