From: Soumya AR <[email protected]>

This patch implements the expand_ifn_atomic_fetch_minmax function, which expands
the IFN_ATOMIC_FETCH_MINMAX internal function to target-specific optabs, if
available.

If the target does not support the optab, we continue to lower the call to a
CAS loop.

----

When handling sub-word atomic calls, signed types need special handling.

Consider the following gimple:

  short int atomic_var;
  short int _1;
  atomic_var = -5;
  _1 = .ATOMIC_FETCH_MINMAX (&atomic_var, -10, 5, 0, 1, 1);

When expanding the lhs (_1), the RTL is generated as a SUBREG with the "/s/u"
flags set, which indicate that the value is a sign-extended promoted value. An
atomic call that returns a non-extended value into this target would violate
that contract.

Just emitting a sign extension after the atomic call is not enough, since the
compiler will incorrectly assume that the value is already sign-extended, based
on the previously set flags.

To maintain this contract and ensure the sign extension is not optimized away,
we use a temporary non-promoted register for the atomic operation, explicitly
perform the extension, then move to the original promoted target.
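
The sub-word hazard can be modeled in plain C (a sketch only; the real fix
operates on RTL registers, and raw_subword_result/fixed_result are hypothetical
names standing in for the libcall return and the patched sequence):

```c
#include <stdint.h>

/* A libcall may hand back the 16-bit result in the low half of a word
   register without extending it.  A consumer that trusts the /s/u SUBREG
   flags would read this raw value as if it were already sign-extended.  */
static int32_t
raw_subword_result (int16_t v)
{
  /* Zero-extended return: what a non-conforming libcall looks like.  */
  return (uint16_t) v;
}

/* The patch's approach: run the operation into a fresh non-promoted
   register, then explicitly sign-extend from the low 16 bits before
   moving into the promoted target.  */
static int32_t
fixed_result (int16_t v)
{
  int32_t raw = raw_subword_result (v);
  return (int32_t) (int16_t) raw;	/* explicit SIGN_EXTEND  */
}
```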

----

expand_binop() is extended to handle min/max operations, either via a
conditional move or a compare-and-branch sequence.

To do this, we migrate the code from expand_expr_real_2() to expand_binop()
for min/max operations.

Earlier, the MIN/MAX case in expand_expr_real_2() was handled by calling
expand_binop() with OPTAB_WIDEN. Since expand_binop() did not support min/max,
it would return NULL if no direct optab was available, and expand_expr_real_2()
would then fall back to a compare-and-branch / conditional-move sequence.

Since expand_binop() now handles this itself, we can remove the fallback from
expand_expr_real_2(): expand_binop() always returns a non-NULL expansion for
min/max operations.
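
The compare-and-branch fallback that expand_binop() now emits for MAX (when
neither a direct optab nor a conditional move is available) corresponds to the
following scalar sketch; the C control flow mirrors the emitted move/branch
sequence, with the RTL calls noted in comments:

```c
#include <stdint.h>

/* Scalar model of the compare-and-branch fallback for signed MAX.  */
static int32_t
expand_max_fallback (int32_t op0, int32_t op1)
{
  int32_t target = op0;	/* emit_move_insn (target, op0)             */
  if (target >= op1)	/* do_compare_rtx_and_jump (..., GE, ...)   */
    goto done;		/* conditional jump to lab                  */
  target = op1;		/* emit_move_insn (target, op1)             */
done:			/* emit_label (lab)                         */
  return target;
}
```

For unsigned operands the comparison code would be GEU instead of GE, and for
MIN the codes are LE/LEU, matching the comparison_code selection in the patch.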

I would appreciate confirmation that this approach is OK.

----

Bootstrapped and regression tested on aarch64-linux-gnu and x86_64-linux-gnu.
Cross-compiled and regression tested for arm-linux-gnueabihf-armv7-a and
aarch64-linux-gnu without LSE.

Signed-off-by: Soumya AR <[email protected]>

gcc/ChangeLog:

        * builtins.cc (expand_ifn_atomic_fetch_minmax): New function to
        expand IFN_ATOMIC_FETCH_MINMAX.
        (expand_builtin_atomic_fetch_op): Add SMIN/SMAX result handling.
        * builtins.h (expand_ifn_atomic_fetch_minmax): New declaration.
        * expr.cc (expand_expr_real_2): Remove MIN_EXPR/MAX_EXPR handling,
        now in expand_binop.
        * ifn-atomic-minmax-lowering.cc (can_expand_atomic_minmax): New
        function to check optab availability.
        (lower_ifn_atomic_minmax): Only lower to CAS if optab unavailable.
        * internal-fn.cc (expand_ATOMIC_FETCH_MINMAX): Implement expansion
        via expand_ifn_atomic_fetch_minmax.
        * optabs.cc (expand_binop): Add MIN/MAX expansion support.
        (expand_atomic_fetch_op_no_fallback): Handle SMIN/SMAX operations.
        (expand_atomic_fetch_op): Handle SMIN/SMAX operations.

---
 gcc/builtins.cc                   | 83 +++++++++++++++++++++++++-
 gcc/builtins.h                    |  1 +
 gcc/expr.cc                       | 88 +---------------------------
 gcc/ifn-atomic-minmax-lowering.cc | 26 +++++++++
 gcc/internal-fn.cc                | 12 +---
 gcc/optabs.cc                     | 97 +++++++++++++++++++++++++++++++
 6 files changed, 210 insertions(+), 97 deletions(-)

diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index a16a25c8be9..ff17548c54e 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -6770,6 +6770,85 @@ expand_builtin_atomic_compare_exchange (machine_mode mode, tree exp,
   return target;
 }
 
+/* Expand IFN_ATOMIC_FETCH_MINMAX internal function.  */
+
+void
+expand_ifn_atomic_fetch_minmax (gcall *call)
+{
+  tree ptr = gimple_call_arg (call, 0);
+  tree value = gimple_call_arg (call, 1);
+  tree memorder_tree = gimple_call_arg (call, 2);
+  tree is_min = gimple_call_arg (call, 3);
+  tree is_fetch_op = gimple_call_arg (call, 4);
+  tree is_signed = gimple_call_arg (call, 5);
+
+  bool is_min_bool = tree_to_shwi (is_min);
+  bool is_fetch_op_bool = tree_to_shwi (is_fetch_op);
+  bool is_signed_bool = tree_to_shwi (is_signed);
+
+  memmodel model = get_memmodel (memorder_tree);
+  tree lhs = gimple_call_lhs (call);
+
+  enum rtx_code code;
+  if (is_signed_bool)
+    code = is_min_bool ? SMIN : SMAX;
+  else
+    code = is_min_bool ? UMIN : UMAX;
+
+  machine_mode mode = TYPE_MODE (TREE_TYPE (TREE_TYPE (ptr)));
+
+  rtx mem = get_builtin_sync_mem (ptr, mode);
+
+  rtx val = expand_expr (value, NULL_RTX, TYPE_MODE (TREE_TYPE (value)),
+                        EXPAND_NORMAL);
+
+  rtx target;
+  if (lhs)
+    target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+  else
+    target = gen_reg_rtx (mode);
+
+  /* For sub-word signed operations, we need to handle sign extension
+     carefully. The issue is that atomic libcalls may not return properly
+     sign-extended values, but the LHS might be a promoted variable that
+     has been marked as a sign-extended register. A forced sign extension
+     after the atomic call might get optimized away if the target has been
+     marked as sign-extended.  */
+  rtx temp_target = target;
+  if (GET_CODE (target) == SUBREG && SUBREG_PROMOTED_VAR_P (target)
+      && GET_MODE_SIZE (mode).to_constant () < UNITS_PER_WORD)
+    {
+      /* Use a temporary non-promoted register for the atomic operation
+        to avoid incorrect flags for the target register.  */
+      temp_target = gen_reg_rtx (mode);
+    }
+
+  rtx result = expand_atomic_fetch_op (temp_target, mem, val, code, model,
+                                      !is_fetch_op_bool);
+
+  if (!result)
+    gcc_unreachable ();
+
+  /* If we used a temporary target due to promotion, handle the
+     conversion properly.  */
+  if (temp_target != target)
+    {
+      /* For signed types, explicitly sign-extend the result.  */
+      if (is_signed_bool)
+       {
+         rtx extended = gen_reg_rtx (word_mode);
+         emit_insn (
+           gen_rtx_SET (extended, gen_rtx_SIGN_EXTEND (word_mode, result)));
+         result = gen_lowpart (mode, extended);
+       }
+
+      /* Now store to the actual target.  */
+      convert_move (SUBREG_REG (target), result, !is_signed_bool);
+    }
+  else if (result != target)
+    emit_move_insn (target, result);
+}
+
 /* Helper function for expand_ifn_atomic_compare_exchange - expand
    internal ATOMIC_COMPARE_EXCHANGE call into __atomic_compare_exchange_N
    call.  The weak parameter must be dropped to match the expected parameter
@@ -6989,6 +7068,9 @@ expand_builtin_atomic_fetch_op (machine_mode mode, tree exp, rtx target,
                                     OPTAB_LIB_WIDEN);
          ret = expand_simple_unop (mode, NOT, ret, target, true);
        }
+      else if (code == SMIN || code == SMAX)
+       ret = expand_simple_binop (mode, code, ret, val, target, false,
+                                  OPTAB_LIB_WIDEN);
       else
        ret = expand_simple_binop (mode, code, ret, val, target, true,
                                   OPTAB_LIB_WIDEN);
@@ -7764,7 +7846,6 @@ expand_speculation_safe_value (machine_mode mode, tree exp, rtx target,
       mode = TYPE_MODE (TREE_TYPE (arg0));
       gcc_assert (GET_MODE_CLASS (mode) == MODE_INT);
     }
-
   val = expand_expr (arg0, NULL_RTX, mode, EXPAND_NORMAL);
 
   /* An optional second argument can be used as a failsafe value on
diff --git a/gcc/builtins.h b/gcc/builtins.h
index 1a56b5ed2ba..16641770144 100644
--- a/gcc/builtins.h
+++ b/gcc/builtins.h
@@ -133,6 +133,7 @@ extern void expand_builtin_trap (void);
 extern void expand_ifn_atomic_bit_test_and (gcall *);
 extern void expand_ifn_atomic_compare_exchange (gcall *);
 extern void expand_ifn_atomic_op_fetch_cmp_0 (gcall *);
+extern void expand_ifn_atomic_fetch_minmax (gcall *);
 extern rtx expand_builtin_crc_table_based (internal_fn, scalar_mode,
                                           scalar_mode, machine_mode,
                                           tree, rtx);
diff --git a/gcc/expr.cc b/gcc/expr.cc
index 70b4eda6df3..0226dadcd6b 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -9831,7 +9831,6 @@ expand_expr_real_2 (const_sepops ops, rtx target, machine_mode tmode,
                    enum expand_modifier modifier)
 {
   rtx op0, op1, op2, temp;
-  rtx_code_label *lab;
   tree type;
   int unsignedp;
   machine_mode mode;
@@ -10634,92 +10633,7 @@ expand_expr_real_2 (const_sepops ops, rtx target, machine_mode tmode,
       if (temp != 0)
        return temp;
 
-      if (VECTOR_TYPE_P (type))
-       gcc_unreachable ();
-
-      /* At this point, a MEM target is no longer useful; we will get better
-        code without it.  */
-
-      if (! REG_P (target))
-       target = gen_reg_rtx (mode);
-
-      /* If op1 was placed in target, swap op0 and op1.  */
-      if (target != op0 && target == op1)
-       std::swap (op0, op1);
-
-      /* We generate better code and avoid problems with op1 mentioning
-        target by forcing op1 into a pseudo if it isn't a constant.  */
-      if (! CONSTANT_P (op1))
-       op1 = force_reg (mode, op1);
-
-      {
-       enum rtx_code comparison_code;
-       rtx cmpop1 = op1;
-
-       if (code == MAX_EXPR)
-         comparison_code = unsignedp ? GEU : GE;
-       else
-         comparison_code = unsignedp ? LEU : LE;
-
-       /* Canonicalize to comparisons against 0.  */
-       if (op1 == const1_rtx)
-         {
-           /* Converting (a >= 1 ? a : 1) into (a > 0 ? a : 1)
-              or (a != 0 ? a : 1) for unsigned.
-              For MIN we are safe converting (a <= 1 ? a : 1)
-              into (a <= 0 ? a : 1)  */
-           cmpop1 = const0_rtx;
-           if (code == MAX_EXPR)
-             comparison_code = unsignedp ? NE : GT;
-         }
-       if (op1 == constm1_rtx && !unsignedp)
-         {
-           /* Converting (a >= -1 ? a : -1) into (a >= 0 ? a : -1)
-              and (a <= -1 ? a : -1) into (a < 0 ? a : -1) */
-           cmpop1 = const0_rtx;
-           if (code == MIN_EXPR)
-             comparison_code = LT;
-         }
-
-       /* Use a conditional move if possible.  */
-       if (can_conditionally_move_p (mode))
-         {
-           rtx insn;
-
-           start_sequence ();
-
-           /* Try to emit the conditional move.  */
-           insn = emit_conditional_move (target,
-                                         { comparison_code,
-                                           op0, cmpop1, mode },
-                                         op0, op1, mode,
-                                         unsignedp);
-
-           /* If we could do the conditional move, emit the sequence,
-              and return.  */
-           if (insn)
-             {
-               rtx_insn *seq = end_sequence ();
-               emit_insn (seq);
-               return target;
-             }
-
-           /* Otherwise discard the sequence and fall back to code with
-              branches.  */
-           end_sequence ();
-         }
-
-       if (target != op0)
-         emit_move_insn (target, op0);
-
-       lab = gen_label_rtx ();
-       do_compare_rtx_and_jump (target, cmpop1, comparison_code,
-                                unsignedp, mode, NULL_RTX, NULL, lab,
-                                profile_probability::uninitialized ());
-      }
-      emit_move_insn (target, op1);
-      emit_label (lab);
-      return target;
+      gcc_unreachable ();
 
     case BIT_NOT_EXPR:
       op0 = expand_expr (treeop0, subtarget,
diff --git a/gcc/ifn-atomic-minmax-lowering.cc b/gcc/ifn-atomic-minmax-lowering.cc
index 8f58bf235d6..5a4811e662d 100644
--- a/gcc/ifn-atomic-minmax-lowering.cc
+++ b/gcc/ifn-atomic-minmax-lowering.cc
@@ -49,6 +49,7 @@ This pass transforms the IFN to the following pattern:
 #include "coretypes.h"
 #include "backend.h"
 #include "target.h"
+#include "rtl.h"
 #include "tree.h"
 #include "gimple.h"
 #include "gimple-iterator.h"
@@ -57,6 +58,7 @@ This pass transforms the IFN to the following pattern:
 #include "memmodel.h"
 #include "tm_p.h"
 #include "ssa.h"
+#include "optabs.h"
 #include "optabs-query.h"
 #include "tree-ssanames.h"
 #include "fold-const.h"
@@ -92,6 +94,30 @@ insert_type_conversion (gimple_stmt_iterator *gsi, tree src_value,
   return result;
 }
 
+/* Check if optab is available for the atomic min/max operation.  */
+
+static bool
+can_expand_atomic_minmax (gcall *call_stmt)
+{
+  tree datatype = TREE_TYPE (TREE_TYPE (gimple_call_arg (call_stmt, 0)));
+  bool is_min = tree_to_uhwi (gimple_call_arg (call_stmt, 3));
+  bool is_fetch_op = tree_to_uhwi (gimple_call_arg (call_stmt, 4));
+
+  machine_mode mode = TYPE_MODE (datatype);
+  bool is_signed = !TYPE_UNSIGNED (datatype);
+
+  optab op_optab
+    = is_signed
+       ? (is_fetch_op
+            ? (is_min ? atomic_fetch_smin_optab : atomic_fetch_smax_optab)
+            : (is_min ? atomic_smin_fetch_optab : atomic_smax_fetch_optab))
+       : (is_fetch_op
+            ? (is_min ? atomic_fetch_umin_optab : atomic_fetch_umax_optab)
+            : (is_min ? atomic_umin_fetch_optab : atomic_umax_fetch_optab));
+
+  return optab_handler (op_optab, mode) != CODE_FOR_nothing;
+}
+
 /* Lower a single IFN_ATOMIC_FETCH_MINMAX call to a CAS loop.  */
 
 static void
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 235c42ed413..ac9bf11df02 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -3697,15 +3697,9 @@ expand_ATOMIC_XOR_FETCH_CMP_0 (internal_fn, gcall *call)
 /* Expand atomic fetch minmax.  */
 
 static void
-expand_ATOMIC_FETCH_MINMAX (internal_fn, gcall *stmt)
-{
-  /* This should only be reached if the GIMPLE pass determined that
-     an optab exists for this operation.  
-      
-     TODO: Implement this.    */
-  
-  /* Suppress unused parameter warnings for now */
-  (void) stmt;
+expand_ATOMIC_FETCH_MINMAX (internal_fn, gcall *call)
+{
+  expand_ifn_atomic_fetch_minmax (call);
 }
 
 /* Expand LAUNDER to assignment, lhs = arg0.  */
diff --git a/gcc/optabs.cc b/gcc/optabs.cc
index ad556bf78fe..fe241659b82 100644
--- a/gcc/optabs.cc
+++ b/gcc/optabs.cc
@@ -1649,6 +1649,93 @@ expand_binop (machine_mode mode, optab binoptab, rtx op0, rtx op1,
            }
        }
     }
+  /* Implement MIN/MAX expansion as a compare and branch/conditional move.  */
+  if ((binoptab == smin_optab || binoptab == smax_optab
+       || binoptab == umin_optab || binoptab == umax_optab)
+      && !VECTOR_MODE_P (mode) && methods != OPTAB_DIRECT)
+    {
+      bool is_max = (binoptab == smax_optab || binoptab == umax_optab);
+      rtx_code_label *lab;
+
+      if (target == NULL_RTX || !REG_P (target))
+       target = gen_reg_rtx (mode);
+
+      /* If op1 was placed in target, swap op0 and op1.  */
+      if (target != op0 && target == op1)
+       std::swap (op0, op1);
+
+      /* We generate better code and avoid problems with op1 mentioning
+        target by forcing op1 into a pseudo if it isn't a constant.  */
+      if (!CONSTANT_P (op1))
+       op1 = force_reg (mode, op1);
+
+      {
+       enum rtx_code comparison_code;
+       rtx cmpop1 = op1;
+
+       if (is_max)
+         comparison_code = unsignedp ? GEU : GE;
+       else
+         comparison_code = unsignedp ? LEU : LE;
+
+       /* Canonicalize to comparisons against 0.  */
+       if (op1 == const1_rtx)
+         {
+           /* Converting (a >= 1 ? a : 1) into (a > 0 ? a : 1)
+              or (a != 0 ? a : 1) for unsigned.
+              For MIN we are safe converting (a <= 1 ? a : 1)
+              into (a <= 0 ? a : 1)  */
+           cmpop1 = const0_rtx;
+           if (is_max)
+             comparison_code = unsignedp ? NE : GT;
+         }
+       if (op1 == constm1_rtx && !unsignedp)
+         {
+           /* Converting (a >= -1 ? a : -1) into (a >= 0 ? a : -1)
+              and (a <= -1 ? a : -1) into (a < 0 ? a : -1) */
+           cmpop1 = const0_rtx;
+           if (!is_max)
+             comparison_code = LT;
+         }
+
+       /* Use a conditional move if possible.  */
+       if (can_conditionally_move_p (mode))
+         {
+           rtx insn;
+
+           start_sequence ();
+
+           /* Try to emit the conditional move.  */
+           insn = emit_conditional_move (target,
+                                         {comparison_code, op0, cmpop1, mode},
+                                         op0, op1, mode, unsignedp);
+
+           /* If we could do the conditional move, emit the sequence,
+              and return.  */
+           if (insn)
+             {
+               rtx_insn *seq = end_sequence ();
+               emit_insn (seq);
+               return target;
+             }
+
+           /* Otherwise discard the sequence and fall back to code with
+              branches.  */
+           end_sequence ();
+         }
+
+       if (target != op0)
+         emit_move_insn (target, op0);
+
+       lab = gen_label_rtx ();
+       do_compare_rtx_and_jump (target, cmpop1, comparison_code, unsignedp,
+                                mode, NULL_RTX, NULL, lab,
+                                profile_probability::uninitialized ());
+      }
+      emit_move_insn (target, op1);
+      emit_label (lab);
+      return target;
+    }
 
   /* Look for a wider mode of the same class for which we think we
      can open-code the operation.  Check for a widening multiply at the
@@ -8020,6 +8107,11 @@ expand_atomic_fetch_op_no_fallback (rtx target, rtx mem, rtx val,
                                            true, OPTAB_LIB_WIDEN);
              result = expand_simple_unop (mode, NOT, result, target, true);
            }
+         else if (code == SMIN || code == SMAX)
+           {
+             result = expand_simple_binop (mode, code, result, val, target,
+                                           false, OPTAB_LIB_WIDEN);
+           }
          else
            result = expand_simple_binop (mode, code, result, val, target,
                                          true, OPTAB_LIB_WIDEN);
@@ -8146,6 +8238,11 @@ expand_atomic_fetch_op (rtx target, rtx mem, rtx val, enum rtx_code code,
                                    true, OPTAB_LIB_WIDEN);
          t1 = expand_simple_unop (mode, code, t1, NULL_RTX, true);
        }
+      else if (code == SMIN || code == SMAX)
+       {
+         t1 = expand_simple_binop (mode, code, t1, val, NULL_RTX,
+                                   false, OPTAB_LIB_WIDEN);
+       }
       else
        t1 = expand_simple_binop (mode, code, t1, val, NULL_RTX, true,
                                  OPTAB_LIB_WIDEN);
-- 
2.43.0
