This patch introduces two new vector cbranch optabs, vec_cbranch_any and vec_cbranch_all.
To explain why we need two new optabs, let me explain the current cbranch, its limitations, and what I'm trying to optimize. Sorry for the long email, but I hope it explains why I think we want new optabs.

Today cbranch can be used for both vector and scalar modes. In both cases it's intended to compare boolean values, either scalar or vector.

The optab documentation does not, however, state that it can only handle comparisons against 0. So many targets have added code for the vector variant that tries to deal with the case where we branch based on two non-zero registers.

However, this code can never be reached, because the cbranch expansion only deals with comparisons against 0 for vectors. This is because for vectors the rest of the compiler has no way to generate a non-zero comparison: the vectorizer will always generate a zero comparison, and the C/C++ front-ends won't allow vectors to be used in a cbranch because they expect a boolean value. ISAs like SVE work around this by requiring you to use an SVE PTEST intrinsic, which results in a single scalar boolean value that represents the flag values, e.g.

  if (svptest_any (..))

The natural question is then why we don't rewrite the comparison into a non-zero comparison at expand time if the target supports it.

The reason is that we can't safely do so. For an ANY comparison (e.g. a != b) this is trivial, but for an ALL comparison (e.g. a == b) we would have to flip both branches and invert the value being compared, i.e. we would have to make it an a != b comparison.

But in emit_cmp_and_jump_insns we can't flip the branches anymore, because they have already been lowered into a fall-through branch (PC) and a label, ready for use in an if_then_else RTL expression.

Additionally, as mentioned before, cbranches expect the values to be masks, not values. This kind of works out if you XOR the values, but for FP vectors you'd need to know what equality means for the FP format, i.e. it's possible for IEEE 754 values, but it's not immediately obvious that this is true for all formats.

Now why does any of this matter? Well, there are two optimizations we want to be able to do.

1. Adv. SIMD does not support a vector !=, as in there's no instruction for it. For both integer and FP vectors we perform the comparison as EQ and then invert the resulting mask. Ideally we'd like to replace this with just an XOR and the appropriate branch.

2. On an SVE-enabled system we would like to use an SVE compare + branch for the Adv. SIMD sequence, which could happen due to cost modelling. However, we can only do so if we know that the values being compared against are boolean masks. This means we can't really use combine to do this, because combine would have to match the entire sequence including the vector comparisons, since at RTL we've lost the information that VECTOR_BOOLEAN_P would have given us. That sequence would be too long for combine to match, as it would also have to match the compare + branch sequence being generated. It also becomes a bit messy to match ANY and ALL sequences.

To handle these two cases I propose the new optabs vec_cbranch_any and vec_cbranch_all, which expect the operands to be values, not masks, and whose comparison operator is the comparison being performed on those values. The current cbranch optab can't be used here because we need to be able to see both comparison operators (for the boolean branch and for the data compare branch).

The initial != 0 or == 0 is folded away into the name as _any and _all, allowing the target to easily do as it wishes.

I have intentionally chosen to require cbranch_optab to still be the main one, i.e. you can't have vec_cbranch_any/vec_cbranch_all without cbranch, because these two are an expand-time-only construct. I've also designed them such that no changes are needed to any other passes in the middle-end.

With these two optabs it's trivial to implement the two optimizations I described above.
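To make the ANY/ALL distinction concrete, here is a minimal scalar reference model in C++. It is a sketch under stated assumptions, not GCC code: a hypothetical 4 x int32 vector, and illustrative `cbranch_any`/`cbranch_all` helpers that follow the semantics written in the md.texi hunk of this patch (branch if at least one, respectively all, of the elementwise comparisons is true). It also shows why ALL needs the flip-and-invert described above (not branching on "any lane !=" is the same as branching on "all lanes =="), and the integer-only XOR replacement for the missing Adv. SIMD vector !=.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>

// Hypothetical 4-lane model; nothing here is a GCC API.
using V4 = std::array<std::int32_t, 4>;
using Mask = std::array<bool, 4>;

// Elementwise compare producing a boolean mask, as the vectorizer would.
template <typename Cmp>
Mask vec_cmp (const V4 &a, const V4 &b, Cmp cmp)
{
  Mask m{};
  for (std::size_t i = 0; i < m.size (); ++i)
    m[i] = cmp (a[i], b[i]);
  return m;
}

// Today's vector cbranch on a mask: "mask != 0" is taken if any lane is set.
bool mask_ne_zero (const Mask &m)
{
  for (bool lane : m)
    if (lane)
      return true;
  return false;
}

// Illustrative model of vec_cbranch_any: taken if at least one lane compares true.
template <typename Cmp>
bool cbranch_any (const V4 &a, const V4 &b, Cmp cmp)
{
  for (std::size_t i = 0; i < a.size (); ++i)
    if (cmp (a[i], b[i]))
      return true;
  return false;
}

// Illustrative model of vec_cbranch_all: taken only if every lane compares true.
template <typename Cmp>
bool cbranch_all (const V4 &a, const V4 &b, Cmp cmp)
{
  for (std::size_t i = 0; i < a.size (); ++i)
    if (!cmp (a[i], b[i]))
      return false;
  return true;
}

// Integer-only: any (a != b) holds iff some lane of (a ^ b) is non-zero,
// so no EQ + NOT sequence is needed.
bool any_ne_via_xor (const V4 &a, const V4 &b)
{
  for (std::size_t i = 0; i < a.size (); ++i)
    if ((a[i] ^ b[i]) != 0)
      return true;
  return false;
}
```

The XOR shortcut is deliberately restricted to integers here: for IEEE 754 values, +0.0 and -0.0 compare equal despite different bit patterns, and a NaN compares unequal to itself, which is why the FP case needs more care.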
A target expansion hook is also possible, but optabs felt more natural for the situation. With them we can now generate

.L2:
        ldr     q31, [x1, x2]
        add     v29.4s, v29.4s, v25.4s
        add     v28.4s, v28.4s, v26.4s
        add     v31.4s, v31.4s, v30.4s
        str     q31, [x1, x2]
        add     x1, x1, 16
        cmp     x1, 2560
        beq     .L1
.L6:
        ldr     q30, [x3, x1]
        cmpeq   p15.s, p7/z, z30.s, z27.s
        b.none  .L2

and easily prove it correct.

Bootstrapped and regtested on aarch64-none-linux-gnu,
arm-none-linux-gnueabihf, x86_64-pc-linux-gnu -m32, -m64 with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	PR target/118974
	* optabs.cc (prepare_cmp_insn): Refactor to take the optab to check for
	instead of hardcoding cbranch.
	(emit_cmp_and_jump_insns): Try to emit a vec_cbranch if supported.
	* optabs.def (vec_cbranch_any_optab, vec_cbranch_all_optab): New.
	* doc/md.texi (vec_cbranch_any@var{mode}4, vec_cbranch_all@var{mode}4):
	Document them.

---
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index f6314af46923beee0100a1410f089efd34d7566d..7dbfbbe1609a196b0834d458b26f61904eaf5b24 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -7622,6 +7622,24 @@ Operand 0 is a comparison operator.  Operand 1 and operand 2 are the
 first and second operands of the comparison, respectively.  Operand 3
 is the @code{code_label} to jump to.
 
+@cindex @code{vec_cbranch_any@var{mode}4} instruction pattern
+@item @samp{vec_cbranch_any@var{mode}4}
+Conditional branch instruction combined with a compare instruction on vectors
+where it is required that at least one of the elementwise comparisons of the
+two input vectors is true.
+Operand 0 is a comparison operator.  Operand 1 and operand 2 are the
+first and second operands of the comparison, respectively.  Operand 3
+is the @code{code_label} to jump to.
+
+@cindex @code{vec_cbranch_all@var{mode}4} instruction pattern
+@item @samp{vec_cbranch_all@var{mode}4}
+Conditional branch instruction combined with a compare instruction on vectors
+where it is required that all of the elementwise comparisons of the
+two input vectors are true.
+Operand 0 is a comparison operator.  Operand 1 and operand 2 are the
+first and second operands of the comparison, respectively.  Operand 3
+is the @code{code_label} to jump to.
+
 @cindex @code{jump} instruction pattern
 @item @samp{jump}
 A jump inside a function; an unconditional branch.  Operand 0 is the
diff --git a/gcc/optabs.cc b/gcc/optabs.cc
index 0a14b1eef8a5795e6fd24ade6da55841696315b8..77d5e6ee5d26ccda3d391265bc45fd454530cc67 100644
--- a/gcc/optabs.cc
+++ b/gcc/optabs.cc
@@ -4418,6 +4418,9 @@ can_vec_extract_var_idx_p (machine_mode vec_mode, machine_mode extr_mode)
 
    *PMODE is the mode of the inputs (in case they are const_int).
 
+   OPTAB is the optab to check for OPTAB_DIRECT support.  Defaults to
+   cbranch_optab.
+
    This function performs all the setup necessary so that the caller only has
    to emit a single comparison insn.  This setup can involve doing a BLKmode
    comparison or emitting a library call to perform the comparison if no insn
@@ -4429,7 +4432,7 @@ can_vec_extract_var_idx_p (machine_mode vec_mode, machine_mode extr_mode)
 static void
 prepare_cmp_insn (rtx x, rtx y, enum rtx_code comparison, rtx size,
                   int unsignedp, enum optab_methods methods,
-                  rtx *ptest, machine_mode *pmode)
+                  rtx *ptest, machine_mode *pmode, optab optab = cbranch_optab)
 {
   machine_mode mode = *pmode;
   rtx libfunc, test;
@@ -4547,7 +4550,7 @@ prepare_cmp_insn (rtx x, rtx y, enum rtx_code comparison, rtx size,
   FOR_EACH_WIDER_MODE_FROM (cmp_mode, mode)
     {
       enum insn_code icode;
-      icode = optab_handler (cbranch_optab, cmp_mode);
+      icode = optab_handler (optab, cmp_mode);
       if (icode != CODE_FOR_nothing
           && insn_operand_matches (icode, 0, test))
         {
@@ -4580,7 +4583,7 @@ prepare_cmp_insn (rtx x, rtx y, enum rtx_code comparison, rtx size,
       if (comparison == UNORDERED && rtx_equal_p (x, y))
         {
           prepare_cmp_insn (x, y, UNLT, NULL_RTX, unsignedp, OPTAB_WIDEN,
-                            ptest, pmode);
+                            ptest, pmode, optab);
           if (*ptest)
             return;
         }
@@ -4632,7 +4635,7 @@ prepare_cmp_insn (rtx x, rtx y, enum rtx_code comparison, rtx size,
 
       *pmode = ret_mode;
       prepare_cmp_insn (x, y, comparison, NULL_RTX, unsignedp, methods,
-                        ptest, pmode);
+                        ptest, pmode, optab);
     }
 
   return;
@@ -4816,12 +4819,69 @@ emit_cmp_and_jump_insns (rtx x, rtx y, enum rtx_code comparison, rtx size,
      the target supports tbranch.  */
   machine_mode tmode = mode;
   direct_optab optab;
-  if (op1 == CONST0_RTX (GET_MODE (op1))
-      && validate_test_and_branch (val, &test, &tmode,
-                                   &optab) != CODE_FOR_nothing)
+  if (op1 == CONST0_RTX (GET_MODE (op1)))
     {
-      emit_cmp_and_jump_insn_1 (test, tmode, label, optab, prob, true);
-      return;
+      if (validate_test_and_branch (val, &test, &tmode,
+                                    &optab) != CODE_FOR_nothing)
+        {
+          emit_cmp_and_jump_insn_1 (test, tmode, label, optab, prob, true);
+          return;
+        }
+
+      /* If we are comparing for (in)equality with 0, check whether VAL is
+         defined by a vector comparison the target can branch on directly.  */
+      if (val && TREE_CODE (val) == SSA_NAME
+          && VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (val))
+          && (comparison == NE || comparison == EQ))
+        {
+          auto def_stmt = SSA_NAME_DEF_STMT (val);
+          enum insn_code icode;
+          if (is_gimple_assign (def_stmt)
+              && TREE_CODE_CLASS (gimple_assign_rhs_code (def_stmt))
+                   == tcc_comparison)
+            {
+              class expand_operand ops[2];
+              rtx_insn *tmp = NULL;
+              start_sequence ();
+              rtx op0c = expand_normal (gimple_assign_rhs1 (def_stmt));
+              rtx op1c = expand_normal (gimple_assign_rhs2 (def_stmt));
+              machine_mode mode2 = GET_MODE (op0c);
+              create_input_operand (&ops[0], op0c, mode2);
+              create_input_operand (&ops[1], op1c, mode2);
+
+              int unsignedp2
+                = TYPE_UNSIGNED (TREE_TYPE (gimple_assign_rhs1 (def_stmt)));
+              auto inner_code = gimple_assign_rhs_code (def_stmt);
+              rtx test2 = NULL_RTX;
+
+              enum rtx_code comparison2 = get_rtx_code (inner_code, unsignedp2);
+              if (unsignedp2)
+                comparison2 = unsigned_condition (comparison2);
+              if (comparison == NE)
+                optab = vec_cbranch_any_optab;
+              else
+                optab = vec_cbranch_all_optab;
+
+              if ((icode = optab_handler (optab, mode2))
+                    != CODE_FOR_nothing
+                  && maybe_legitimize_operands (icode, 1, 2, ops))
+                {
+                  prepare_cmp_insn (ops[0].value, ops[1].value, comparison2,
+                                    size, unsignedp2, OPTAB_DIRECT, &test2,
+                                    &mode2, optab);
+                  emit_cmp_and_jump_insn_1 (test2, mode2, label,
+                                            optab, prob, false);
+                  tmp = get_insns ();
+                }
+
+              end_sequence ();
+              if (tmp)
+                {
+                  emit_insn (tmp);
+                  return;
+                }
+            }
+        }
     }
 
   emit_cmp_and_jump_insn_1 (test, mode, label, cbranch_optab, prob, false);
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 23f792352388dd5f8de8a1999643179328214abf..a600938fb9e4480ff2d78c9b0be344e4856539bb 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -424,6 +424,8 @@ OPTAB_D (smulhrs_optab, "smulhrs$a3")
 OPTAB_D (umulhs_optab, "umulhs$a3")
 OPTAB_D (umulhrs_optab, "umulhrs$a3")
 OPTAB_D (sdiv_pow2_optab, "sdiv_pow2$a3")
+OPTAB_D (vec_cbranch_any_optab, "vec_cbranch_any$a4")
+OPTAB_D (vec_cbranch_all_optab, "vec_cbranch_all$a4")
 OPTAB_D (vec_pack_sfix_trunc_optab, "vec_pack_sfix_trunc_$a")
 OPTAB_D (vec_pack_ssat_optab, "vec_pack_ssat_$a")
 OPTAB_D (vec_pack_trunc_optab, "vec_pack_trunc_$a")
--
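The branch-optab choice that the emit_cmp_and_jump_insns hunk makes can be summarised as a small decision function. This is a hypothetical, heavily simplified C++ model, not GCC code: each boolean field stands in for one of the checks in that hunk (the field comments name the corresponding call), and the two target-support flags fold together `optab_handler` returning a handler and `maybe_legitimize_operands` succeeding.

```cpp
#include <cassert>

// Hypothetical, simplified model of the branch-optab choice that the patch
// makes in emit_cmp_and_jump_insns.  The names only mirror the patch; none of
// this is GCC code.
enum class Optab { cbranch, tbranch, vec_cbranch_any, vec_cbranch_all };
enum class Cmp { EQ, NE, LT };

struct BranchSite
{
  bool op1_is_zero;            // op1 == CONST0_RTX (GET_MODE (op1))
  bool tbranch_ok;             // validate_test_and_branch succeeded
  bool val_is_vector_bool_ssa; // VAL is an SSA_NAME of vector boolean type
  bool def_is_comparison;      // VAL is defined by a tcc_comparison assignment
  Cmp outer;                   // how VAL is compared against zero
  bool target_has_any;         // vec_cbranch_any handler exists, operands legitimize
  bool target_has_all;         // same, for vec_cbranch_all
};

Optab choose_optab (const BranchSite &s)
{
  if (s.op1_is_zero)
    {
      // Test-and-branch (one of the tbranch optabs) keeps priority.
      if (s.tbranch_ok)
        return Optab::tbranch;

      if (s.val_is_vector_bool_ssa && s.def_is_comparison
          && (s.outer == Cmp::NE || s.outer == Cmp::EQ))
        {
          // "mask != 0" selects _any, "mask == 0" selects _all.
          const bool want_any = s.outer == Cmp::NE;
          if (want_any ? s.target_has_any : s.target_has_all)
            return want_any ? Optab::vec_cbranch_any : Optab::vec_cbranch_all;
        }
    }

  // Otherwise fall back to plain cbranch, which remains mandatory.
  return Optab::cbranch;
}
```

In this model every path that does not match ends at the final `return Optab::cbranch`, so a target implementing neither new pattern keeps the old behaviour, which is the reason for keeping cbranch as the main optab.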