From: Soumya AR <[email protected]>

This patch implements the expand_ifn_atomic_fetch_minmax function to expand
the IFN_ATOMIC_FETCH_MINMAX internal function to target-specific optabs,
where available.  If the target does not support the optab, we fall back to
lowering the call to a CAS loop.

----

When handling sub-word atomic calls, signed types need special handling.
Consider the following gimple:

  short int atomic_var;
  short int _1;

  atomic_var = -5;
  _1 = .ATOMIC_FETCH_MINMAX (&atomic_var, -10, 5, 0, 1, 1);

When expanding the lhs (_1), the RTL is generated as a SUBREG with the
"/s/u" flags set, which imply that the value is a sign-extended promoted
value.  An atomic call that returns a non-extended value to this target
violates that contract.  Just emitting a sign extension after the atomic
call is not enough either, since the compiler will incorrectly assume, on
the basis of the previously set flags, that the value is already
sign-extended.

To maintain this contract and ensure the sign extension is not optimized
away, we use a temporary non-promoted register for the atomic operation,
explicitly perform the extension, and then move the result to the original
promoted target.
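For reference, here is a minimal C sketch (not part of the patch; the helper
name is hypothetical) of the fetch-min semantics that both the optab path
and the CAS-loop fallback must preserve, written with the existing generic
__atomic builtins:

  #include <stdio.h>

  /* Equivalent CAS loop for a signed sub-word atomic fetch-min.  The
     returned short is promoted at call boundaries, so callers may rely
     on it being properly sign-extended.  */
  static short
  atomic_fetch_min_short (short *ptr, short val)
  {
    short old = __atomic_load_n (ptr, __ATOMIC_RELAXED);
    /* Weak CAS: on failure, OLD is refreshed and we retry.  */
    while (!__atomic_compare_exchange_n (ptr, &old,
                                         (short) (val < old ? val : old),
                                         /*weak=*/1, __ATOMIC_SEQ_CST,
                                         __ATOMIC_RELAXED))
      ;
    return old;
  }

  int
  main (void)
  {
    short atomic_var = -5;
    short prev = atomic_fetch_min_short (&atomic_var, -10);
    printf ("prev=%d cur=%d\n", prev, atomic_var);  /* prev=-5 cur=-10 */
    return 0;
  }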
----

expand_binop() is extended to handle min/max operations, either via a
conditional move or a compare-and-branch sequence.  To do this, we migrate
the min/max code from expand_expr_real_2() to expand_binop().

Earlier, the MIN/MAX case in expand_expr_real_2() was handled by calling
expand_binop() with OPTAB_WIDEN.  Since expand_binop() did not support
min/max, it would return NULL if no direct optab was available, and
expand_expr_real_2() would then fall back to a compare-and-branch /
conditional-move sequence.  Since expand_binop() now performs that fallback
itself, we can remove it from expand_expr_real_2(): expand_binop() always
returns a non-NULL expansion for min/max operations.
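As a hypothetical call-site sketch (not from the patch, with op0/op1/target
standing in for the usual expansion operands), this is the guarantee callers
can now rely on:

  /* With this change, a widening min/max request always yields a
     non-NULL rtx: a direct optab if one exists, otherwise a conditional
     move or compare-and-branch sequence emitted by expand_binop.  */
  rtx res = expand_binop (SImode, unsignedp ? umin_optab : smin_optab,
                          op0, op1, target, unsignedp, OPTAB_WIDEN);
  gcc_assert (res != NULL_RTX);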
Would appreciate confirmation that this approach is OK.

----

Bootstrapped and regression tested on aarch64-linux-gnu and
x86_64-linux-gnu.  Cross-compiled and regression tested for
arm-linux-gnueabihf (armv7-a) and for aarch64-linux-gnu without LSE.

Signed-off-by: Soumya AR <[email protected]>

gcc/ChangeLog:

	* builtins.cc (expand_ifn_atomic_fetch_minmax): New function to
	expand IFN_ATOMIC_FETCH_MINMAX.
	(expand_builtin_atomic_fetch_op): Add SMIN/SMAX result handling.
	* builtins.h (expand_ifn_atomic_fetch_minmax): New declaration.
	* expr.cc (expand_expr_real_2): Remove MIN_EXPR/MAX_EXPR handling,
	now in expand_binop.
	* ifn-atomic-minmax-lowering.cc (can_expand_atomic_minmax): New
	function to check optab availability.
	(lower_ifn_atomic_minmax): Only lower to CAS if optab unavailable.
	* internal-fn.cc (expand_ATOMIC_FETCH_MINMAX): Implement expansion
	via expand_ifn_atomic_fetch_minmax.
	* optabs.cc (expand_binop): Add MIN/MAX expansion support.
	(expand_atomic_fetch_op_no_fallback): Handle SMIN/SMAX operations.
	(expand_atomic_fetch_op): Handle SMIN/SMAX operations.
---
 gcc/builtins.cc                   | 83 +++++++++++++++++++++++++-
 gcc/builtins.h                    |  1 +
 gcc/expr.cc                       | 88 +---------------------------
 gcc/ifn-atomic-minmax-lowering.cc | 26 +++++++++
 gcc/internal-fn.cc                | 12 +---
 gcc/optabs.cc                     | 97 +++++++++++++++++++++++++++++++
 6 files changed, 210 insertions(+), 97 deletions(-)

diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index a16a25c8be9..ff17548c54e 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -6770,6 +6770,85 @@ expand_builtin_atomic_compare_exchange (machine_mode mode, tree exp,
   return target;
 }
 
+/* Expand IFN_ATOMIC_FETCH_MINMAX internal function.  */
+
+void
+expand_ifn_atomic_fetch_minmax (gcall *call)
+{
+  tree ptr = gimple_call_arg (call, 0);
+  tree value = gimple_call_arg (call, 1);
+  tree memorder_tree = gimple_call_arg (call, 2);
+  tree is_min = gimple_call_arg (call, 3);
+  tree is_fetch_op = gimple_call_arg (call, 4);
+  tree is_signed = gimple_call_arg (call, 5);
+
+  bool is_min_bool = tree_to_shwi (is_min);
+  bool is_fetch_op_bool = tree_to_shwi (is_fetch_op);
+  bool is_signed_bool = tree_to_shwi (is_signed);
+
+  memmodel model = get_memmodel (memorder_tree);
+  tree lhs = gimple_call_lhs (call);
+
+  enum rtx_code code;
+  if (is_signed_bool)
+    code = is_min_bool ? SMIN : SMAX;
+  else
+    code = is_min_bool ? UMIN : UMAX;
+
+  machine_mode mode = TYPE_MODE (TREE_TYPE (TREE_TYPE (ptr)));
+
+  rtx mem = get_builtin_sync_mem (ptr, mode);
+
+  rtx val = expand_expr (value, NULL_RTX, TYPE_MODE (TREE_TYPE (value)),
+			 EXPAND_NORMAL);
+
+  rtx target;
+  if (lhs)
+    target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+  else
+    target = gen_reg_rtx (mode);
+
+  /* For sub-word signed operations, we need to handle sign extension
+     carefully.  The issue is that atomic libcalls may not return properly
+     sign-extended values, but the LHS might be a promoted variable that
+     has been marked as a sign-extended register.  A forced sign extension
+     after the atomic call might get optimized away if the target has been
+     marked as sign-extended.  */
+  rtx temp_target = target;
+  if (GET_CODE (target) == SUBREG && SUBREG_PROMOTED_VAR_P (target)
+      && GET_MODE_SIZE (mode).to_constant () < UNITS_PER_WORD)
+    {
+      /* Use a temporary non-promoted register for the atomic operation
+	 to avoid incorrect flags for the target register.  */
+      temp_target = gen_reg_rtx (mode);
+    }
+
+  rtx result = expand_atomic_fetch_op (temp_target, mem, val, code, model,
+				       !is_fetch_op_bool);
+
+  if (!result)
+    gcc_unreachable ();
+
+  /* If we used a temporary target due to promotion, handle the conversion
+     properly.  */
+  if (temp_target != target)
+    {
+      /* For signed types, explicitly sign-extend the result.  */
+      if (is_signed_bool)
+	{
+	  rtx extended = gen_reg_rtx (word_mode);
+	  emit_insn (
+	    gen_rtx_SET (extended, gen_rtx_SIGN_EXTEND (word_mode, result)));
+	  result = gen_lowpart (mode, extended);
+	}
+
+      /* Now store to the actual target.  */
+      convert_move (SUBREG_REG (target), result, !is_signed_bool);
+    }
+  else if (result != target)
+    emit_move_insn (target, result);
+}
+
 /* Helper function for expand_ifn_atomic_compare_exchange - expand
    internal ATOMIC_COMPARE_EXCHANGE call into __atomic_compare_exchange_N
    call.  The weak parameter must be dropped to match the expected parameter
@@ -6989,6 +7068,9 @@ expand_builtin_atomic_fetch_op (machine_mode mode, tree exp, rtx target,
 				     OPTAB_LIB_WIDEN);
 	  ret = expand_simple_unop (mode, NOT, ret, target, true);
 	}
+      else if (code == SMIN || code == SMAX)
+	ret = expand_simple_binop (mode, code, ret, val, target, false,
+				   OPTAB_LIB_WIDEN);
       else
 	ret = expand_simple_binop (mode, code, ret, val, target, true,
 				   OPTAB_LIB_WIDEN);
@@ -7764,7 +7846,6 @@ expand_speculation_safe_value (machine_mode mode, tree exp, rtx target,
       mode = TYPE_MODE (TREE_TYPE (arg0));
       gcc_assert (GET_MODE_CLASS (mode) == MODE_INT);
     }
-
   val = expand_expr (arg0, NULL_RTX, mode, EXPAND_NORMAL);
 
   /* An optional second argument can be used as a failsafe value on
diff --git a/gcc/builtins.h b/gcc/builtins.h
index 1a56b5ed2ba..16641770144 100644
--- a/gcc/builtins.h
+++ b/gcc/builtins.h
@@ -133,6 +133,7 @@ extern void expand_builtin_trap (void);
 extern void expand_ifn_atomic_bit_test_and (gcall *);
 extern void expand_ifn_atomic_compare_exchange (gcall *);
 extern void expand_ifn_atomic_op_fetch_cmp_0 (gcall *);
+extern void expand_ifn_atomic_fetch_minmax (gcall *);
 extern rtx expand_builtin_crc_table_based (internal_fn, scalar_mode,
 					   scalar_mode, machine_mode, tree,
 					   rtx);
diff --git a/gcc/expr.cc b/gcc/expr.cc
index 70b4eda6df3..0226dadcd6b 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -9831,7 +9831,6 @@ expand_expr_real_2 (const_sepops ops, rtx target, machine_mode tmode,
 		    enum expand_modifier modifier)
 {
   rtx op0, op1, op2, temp;
-  rtx_code_label *lab;
   tree type;
   int unsignedp;
   machine_mode mode;
@@ -10634,92 +10633,7 @@ expand_expr_real_2 (const_sepops ops, rtx target, machine_mode tmode,
       if (temp != 0)
 	return temp;
 
-      if (VECTOR_TYPE_P (type))
-	gcc_unreachable ();
-
-      /* At this point, a MEM target is no longer useful; we will get better
-	 code without it.  */
-
-      if (! REG_P (target))
-	target = gen_reg_rtx (mode);
-
-      /* If op1 was placed in target, swap op0 and op1.  */
-      if (target != op0 && target == op1)
-	std::swap (op0, op1);
-
-      /* We generate better code and avoid problems with op1 mentioning
-	 target by forcing op1 into a pseudo if it isn't a constant.  */
-      if (! CONSTANT_P (op1))
-	op1 = force_reg (mode, op1);
-
-      {
-	enum rtx_code comparison_code;
-	rtx cmpop1 = op1;
-
-	if (code == MAX_EXPR)
-	  comparison_code = unsignedp ? GEU : GE;
-	else
-	  comparison_code = unsignedp ? LEU : LE;
-
-	/* Canonicalize to comparisons against 0.  */
-	if (op1 == const1_rtx)
-	  {
-	    /* Converting (a >= 1 ? a : 1) into (a > 0 ? a : 1)
-	       or (a != 0 ? a : 1) for unsigned.
-	       For MIN we are safe converting (a <= 1 ? a : 1)
-	       into (a <= 0 ? a : 1) */
-	    cmpop1 = const0_rtx;
-	    if (code == MAX_EXPR)
-	      comparison_code = unsignedp ? NE : GT;
-	  }
-	if (op1 == constm1_rtx && !unsignedp)
-	  {
-	    /* Converting (a >= -1 ? a : -1) into (a >= 0 ? a : -1)
-	       and (a <= -1 ? a : -1) into (a < 0 ? a : -1) */
-	    cmpop1 = const0_rtx;
-	    if (code == MIN_EXPR)
-	      comparison_code = LT;
-	  }
-
-	/* Use a conditional move if possible.  */
-	if (can_conditionally_move_p (mode))
-	  {
-	    rtx insn;
-
-	    start_sequence ();
-
-	    /* Try to emit the conditional move.  */
-	    insn = emit_conditional_move (target,
-					  { comparison_code,
-					    op0, cmpop1, mode },
-					  op0, op1, mode,
-					  unsignedp);
-
-	    /* If we could do the conditional move, emit the sequence,
-	       and return.  */
-	    if (insn)
-	      {
-		rtx_insn *seq = end_sequence ();
-		emit_insn (seq);
-		return target;
-	      }
-
-	    /* Otherwise discard the sequence and fall back to code with
-	       branches.  */
-	    end_sequence ();
-	  }
-
-	if (target != op0)
-	  emit_move_insn (target, op0);
-
-	lab = gen_label_rtx ();
-	do_compare_rtx_and_jump (target, cmpop1, comparison_code,
-				 unsignedp, mode, NULL_RTX, NULL, lab,
-				 profile_probability::uninitialized ());
-      }
-      emit_move_insn (target, op1);
-      emit_label (lab);
-      return target;
+      gcc_unreachable ();
 
     case BIT_NOT_EXPR:
       op0 = expand_expr (treeop0, subtarget,
diff --git a/gcc/ifn-atomic-minmax-lowering.cc b/gcc/ifn-atomic-minmax-lowering.cc
index 8f58bf235d6..5a4811e662d 100644
--- a/gcc/ifn-atomic-minmax-lowering.cc
+++ b/gcc/ifn-atomic-minmax-lowering.cc
@@ -49,6 +49,7 @@ This pass transforms the IFN to the following pattern:
 #include "coretypes.h"
 #include "backend.h"
 #include "target.h"
+#include "rtl.h"
 #include "tree.h"
 #include "gimple.h"
 #include "gimple-iterator.h"
@@ -57,6 +58,7 @@ This pass transforms the IFN to the following pattern:
 #include "memmodel.h"
 #include "tm_p.h"
 #include "ssa.h"
+#include "optabs.h"
 #include "optabs-query.h"
 #include "tree-ssanames.h"
 #include "fold-const.h"
@@ -92,6 +94,30 @@ insert_type_conversion (gimple_stmt_iterator *gsi, tree src_value,
   return result;
 }
 
+/* Check if an optab is available for the atomic min/max operation.  */
+
+static bool
+can_expand_atomic_minmax (gcall *call_stmt)
+{
+  tree datatype = TREE_TYPE (TREE_TYPE (gimple_call_arg (call_stmt, 0)));
+  bool is_min = tree_to_uhwi (gimple_call_arg (call_stmt, 3));
+  bool is_fetch_op = tree_to_uhwi (gimple_call_arg (call_stmt, 4));
+
+  machine_mode mode = TYPE_MODE (datatype);
+  bool is_signed = !TYPE_UNSIGNED (datatype);
+
+  optab op_optab
+    = is_signed
+	? (is_fetch_op
+	     ? (is_min ? atomic_fetch_smin_optab : atomic_fetch_smax_optab)
+	     : (is_min ? atomic_smin_fetch_optab : atomic_smax_fetch_optab))
+	: (is_fetch_op
+	     ? (is_min ? atomic_fetch_umin_optab : atomic_fetch_umax_optab)
+	     : (is_min ? atomic_umin_fetch_optab : atomic_umax_fetch_optab));
+
+  return optab_handler (op_optab, mode) != CODE_FOR_nothing;
+}
+
 /* Lower a single IFN_ATOMIC_FETCH_MINMAX call to a CAS loop.  */
 
 static void
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 235c42ed413..ac9bf11df02 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -3697,15 +3697,9 @@ expand_ATOMIC_XOR_FETCH_CMP_0 (internal_fn, gcall *call)
 /* Expand atomic fetch minmax.  */
 
 static void
-expand_ATOMIC_FETCH_MINMAX (internal_fn, gcall *stmt)
-{
-  /* This should only be reached if the GIMPLE pass determined that
-     an optab exists for this operation.
-
-     TODO: Implement this.  */
-
-  /* Suppress unused parameter warnings for now */
-  (void) stmt;
+expand_ATOMIC_FETCH_MINMAX (internal_fn, gcall *call)
+{
+  expand_ifn_atomic_fetch_minmax (call);
 }
 
 /* Expand LAUNDER to assignment, lhs = arg0.  */
diff --git a/gcc/optabs.cc b/gcc/optabs.cc
index ad556bf78fe..fe241659b82 100644
--- a/gcc/optabs.cc
+++ b/gcc/optabs.cc
@@ -1649,6 +1649,93 @@ expand_binop (machine_mode mode, optab binoptab, rtx op0, rtx op1,
 	    }
 	}
     }
 
+  /* Implement MIN/MAX expansion as a compare and branch/conditional move.  */
+  if ((binoptab == smin_optab || binoptab == smax_optab
+       || binoptab == umin_optab || binoptab == umax_optab)
+      && !VECTOR_MODE_P (mode) && methods != OPTAB_DIRECT)
+    {
+      bool is_max = (binoptab == smax_optab || binoptab == umax_optab);
+      rtx_code_label *lab;
+
+      if (target == NULL_RTX || !REG_P (target))
+	target = gen_reg_rtx (mode);
+
+      /* If op1 was placed in target, swap op0 and op1.  */
+      if (target != op0 && target == op1)
+	std::swap (op0, op1);
+
+      /* We generate better code and avoid problems with op1 mentioning
+	 target by forcing op1 into a pseudo if it isn't a constant.  */
+      if (!CONSTANT_P (op1))
+	op1 = force_reg (mode, op1);
+
+      {
+	enum rtx_code comparison_code;
+	rtx cmpop1 = op1;
+
+	if (is_max)
+	  comparison_code = unsignedp ? GEU : GE;
+	else
+	  comparison_code = unsignedp ? LEU : LE;
+
+	/* Canonicalize to comparisons against 0.  */
+	if (op1 == const1_rtx)
+	  {
+	    /* Converting (a >= 1 ? a : 1) into (a > 0 ? a : 1)
+	       or (a != 0 ? a : 1) for unsigned.
+	       For MIN we are safe converting (a <= 1 ? a : 1)
+	       into (a <= 0 ? a : 1) */
+	    cmpop1 = const0_rtx;
+	    if (is_max)
+	      comparison_code = unsignedp ? NE : GT;
+	  }
+	if (op1 == constm1_rtx && !unsignedp)
+	  {
+	    /* Converting (a >= -1 ? a : -1) into (a >= 0 ? a : -1)
+	       and (a <= -1 ? a : -1) into (a < 0 ? a : -1) */
+	    cmpop1 = const0_rtx;
+	    if (!is_max)
+	      comparison_code = LT;
+	  }
+
+	/* Use a conditional move if possible.  */
+	if (can_conditionally_move_p (mode))
+	  {
+	    rtx insn;
+
+	    start_sequence ();
+
+	    /* Try to emit the conditional move.  */
+	    insn = emit_conditional_move (target,
+					  {comparison_code, op0, cmpop1, mode},
+					  op0, op1, mode, unsignedp);
+
+	    /* If we could do the conditional move, emit the sequence,
+	       and return.  */
+	    if (insn)
+	      {
+		rtx_insn *seq = end_sequence ();
+		emit_insn (seq);
+		return target;
+	      }
+
+	    /* Otherwise discard the sequence and fall back to code with
+	       branches.  */
+	    end_sequence ();
+	  }
+
+	if (target != op0)
+	  emit_move_insn (target, op0);
+
+	lab = gen_label_rtx ();
+	do_compare_rtx_and_jump (target, cmpop1, comparison_code, unsignedp,
+				 mode, NULL_RTX, NULL, lab,
+				 profile_probability::uninitialized ());
+      }
+      emit_move_insn (target, op1);
+      emit_label (lab);
+      return target;
+    }
 
   /* Look for a wider mode of the same class for which we think we can
      open-code the operation.  Check for a widening multiply at the
@@ -8020,6 +8107,11 @@ expand_atomic_fetch_op_no_fallback (rtx target, rtx mem, rtx val,
 					 true, OPTAB_LIB_WIDEN);
 	  result = expand_simple_unop (mode, NOT, result, target, true);
 	}
+      else if (code == SMIN || code == SMAX)
+	{
+	  result = expand_simple_binop (mode, code, result, val, target,
+					false, OPTAB_LIB_WIDEN);
+	}
       else
 	result = expand_simple_binop (mode, code, result, val, target,
 				      true, OPTAB_LIB_WIDEN);
@@ -8146,6 +8238,11 @@ expand_atomic_fetch_op (rtx target, rtx mem, rtx val, enum rtx_code code,
 				 true, OPTAB_LIB_WIDEN);
 	  t1 = expand_simple_unop (mode, code, t1, NULL_RTX, true);
 	}
+      else if (code == SMIN || code == SMAX)
+	{
+	  t1 = expand_simple_binop (mode, code, t1, val, NULL_RTX,
+				    false, OPTAB_LIB_WIDEN);
+	}
       else
 	t1 = expand_simple_binop (mode, code, t1, val, NULL_RTX,
 				  true, OPTAB_LIB_WIDEN);
-- 
2.43.0
