On Mon, 9 Jun 2025, Tamar Christina wrote: > This patch introduces two new vector cbranch optabs vec_cbranch_any and > vec_cbranch_all. > > To explain why we need two new optabs let me explain the current cbranch and > its > limitations and what I'm trying to optimize. So sorry for the long email, but > I > hope it explains why I think we want new optabs. > > Today cbranch can be used for both vector and scalar modes. In both these > cases it's intended to compare boolean values, either scalar or vector. > > The optab documentation does not however state that it can only handle > comparisons against 0. So many targets have added code for the vector variant > that tries to deal with the case where we branch based on two non-zero > registers. > > However this code can't ever be reached because the cbranch expansion only > deals > with comparisons against 0 for vectors. This is because for vectors the rest > of > the compiler has no way to generate a non-zero comparison. e.g. the vectorizer > will always generate a zero comparison, and the C/C++ front-ends won't allow > vectors to be used in a cbranch as it expects a boolean value. ISAs like SVE > work around this by requiring you to use an SVE PTEST intrinsics which results > in a single scalar boolean value that represents the flag values. > > e.g. if (svptest_any (..)) > > The natural question is why do we not at expand time then rewrite the > comparison > to a non-zero comparison if the target supports it. > > The reason is we can't safely do so. For an ANY comparison (e.g. != b) this > is > trivial, but for an ALL comparison (e.g. == b) we would have to flip both > branch > and invert the value being compared. i.e. we have to make it a != b > comparison. > > But in emit_cmp_and_jump_insns we can't flip the branches anymore because they > have already been lowered into a fall through branch (PC) and a label, ready > for > use in an if_then_else RTL expression. > > Additionally as mentioned before, cbranches expect the values to be masks, not > values. This kinda works out if you XOR the values, but for FP vectors you'd > need to know what equality means for the FP format. i.e. it's possible for > IEEE 754 values but It's not immediately obvious if this is true for all > formats. > > Now why does any of this matter? Well there are two optimizations we want to > be > able to do. > > 1. Adv. SIMD does not support a vector !=, as in there's no instruction for > it. > For both Integer and FP vectors we perform the comparisons as EQ and then > invert the resulting mask. Ideally we'd like to replace this with just a > XOR > and the appropriate branch. > > 2. When on an SVE enabled system we would like to use an SVE compare + branch > for the Adv. SIMD sequence which could happen due to cost modelling. > However > we can only do so based on if we know that the values being compared > against > are the boolean masks. This means we can't really use combine to do this > because combine would have to match the entire sequence including the > vector comparisons because at RTL we've lost the information that > VECTOR_BOOLEAN_P would have given us. This sequence would be too long for > combine to match due to it having to match the compare + branch sequence > being generated as well. It also becomes a bit messy to match ANY and ALL > sequences.
I guess I didn't really understand the above, esp. why all of this is not recoverable on RTL, so I'll state what I think cbranch is supposed to do and then ask some question on the proposal. cbranch is supposed to branch on the result of a comparison, being it on vector or scalar operands, the result of the comparison is a scalar, or rather in full generality, a (scalar) condition-code. Given it uses a COMPARE RTX as comparison operator we can currently only handle whole vector equality/inequality which map to all/any. > To handle these two cases I propose the new optabs vec_cbranch_any and > vec_cbranch_all that expect the operands to be values, not masks, and the > comparison operation is the comparison being performed. The current cbranch > optab can't be used here because we need to be able to see both comparison > operators (for the boolean branch and the data compare branch). > > The initial != 0 or == 0 is folded away into the name as _any and _all > allowing > the target to easily do as it wishes. > > I have intentionally chosen to require cbranch_optab to still be the main one. > i.e. you can't have vec_cbranch_any/vec_cbranch_all without cbranch because > these two are an expand time only construct. I've also chosen them to be this > such that there's no change needed to any other passes in the middle-end. > > With these two optabs it's trivial to implement the two optimization I > described > above. A target expansion hook is also possible but optabs felt more natural > for the situation. What's the RTL the optabs expand to for SVE/AdvSIMD? How do you represent the any/all vector compare difference? Looking at 3/3 I see you use UNSPECs. What kind of compares do you expect us to generate? Any besides equality compares? Why would we not allow cbranch to take an operand 0 that specifies the desired reduction? We could have (any:CC (eq:V4BI (reg:V4SI) (reg:V4SI))) or so, or alternatively (more intrusive) have (any_eq:CC (reg:V4SI) (reg:V4SI))? That would also allow you to have a proper RTL representation rather than more and more UNSEPCs. Richard. > I.e. with them we can now generate > > .L2: > ldr q31, [x1, x2] > add v29.4s, v29.4s, v25.4s > add v28.4s, v28.4s, v26.4s > add v31.4s, v31.4s, v30.4s > str q31, [x1, x2] > add x1, x1, 16 > cmp x1, 2560 > beq .L1 > .L6: > ldr q30, [x3, x1] > cmpeq p15.s, p7/z, z30.s, z27.s > b.none .L2 > > and easily prove it correct. > > Bootstrapped Regtested on aarch64-none-linux-gnu, > arm-none-linux-gnueabihf, x86_64-pc-linux-gnu > -m32, -m64 and no issues. > > Ok for master? > > Thanks, > Tamar > > gcc/ChangeLog: > > PR target/118974 > * optabs.cc (prepare_cmp_insn): Refactor to take optab to check for > instead of hardcoded cbranch. > (emit_cmp_and_jump_insns): Try to emit a vec_cbranch if supported. > * optabs.def (vec_cbranch_any_optab, vec_cbranch_all_optab): New. > * doc/md.texi (cbranch_any@var{mode}4, cbranch_any@var{mode}4): Document > them. > > --- > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi > index > f6314af46923beee0100a1410f089efd34d7566d..7dbfbbe1609a196b0834d458b26f61904eaf5b24 > 100644 > --- a/gcc/doc/md.texi > +++ b/gcc/doc/md.texi > @@ -7622,6 +7622,24 @@ Operand 0 is a comparison operator. Operand 1 and > operand 2 are the > first and second operands of the comparison, respectively. Operand 3 > is the @code{code_label} to jump to. > > +@cindex @code{cbranch_any@var{mode}4} instruction pattern > +@item @samp{cbranch_any@var{mode}4} > +Conditional branch instruction combined with a compare instruction on vectors > +where it is required that at least one of the elementwise comparisons of the > +two input vectors is true. > +Operand 0 is a comparison operator. Operand 1 and operand 2 are the > +first and second operands of the comparison, respectively. Operand 3 > +is the @code{code_label} to jump to. > + > +@cindex @code{cbranch_all@var{mode}4} instruction pattern > +@item @samp{cbranch_all@var{mode}4} > +Conditional branch instruction combined with a compare instruction on vectors > +where it is required that at all of the elementwise comparisons of the > +two input vectors are true. > +Operand 0 is a comparison operator. Operand 1 and operand 2 are the > +first and second operands of the comparison, respectively. Operand 3 > +is the @code{code_label} to jump to. > + > @cindex @code{jump} instruction pattern > @item @samp{jump} > A jump inside a function; an unconditional branch. Operand 0 is the > diff --git a/gcc/optabs.cc b/gcc/optabs.cc > index > 0a14b1eef8a5795e6fd24ade6da55841696315b8..77d5e6ee5d26ccda3d391265bc45fd454530cc67 > 100644 > --- a/gcc/optabs.cc > +++ b/gcc/optabs.cc > @@ -4418,6 +4418,9 @@ can_vec_extract_var_idx_p (machine_mode vec_mode, > machine_mode extr_mode) > > *PMODE is the mode of the inputs (in case they are const_int). > > + *OPTAB is the optab to check for OPTAB_DIRECT support. Defaults to > + cbranch_optab. > + > This function performs all the setup necessary so that the caller only has > to emit a single comparison insn. This setup can involve doing a BLKmode > comparison or emitting a library call to perform the comparison if no insn > @@ -4429,7 +4432,7 @@ can_vec_extract_var_idx_p (machine_mode vec_mode, > machine_mode extr_mode) > static void > prepare_cmp_insn (rtx x, rtx y, enum rtx_code comparison, rtx size, > int unsignedp, enum optab_methods methods, > - rtx *ptest, machine_mode *pmode) > + rtx *ptest, machine_mode *pmode, optab optab=cbranch_optab) > { > machine_mode mode = *pmode; > rtx libfunc, test; > @@ -4547,7 +4550,7 @@ prepare_cmp_insn (rtx x, rtx y, enum rtx_code > comparison, rtx size, > FOR_EACH_WIDER_MODE_FROM (cmp_mode, mode) > { > enum insn_code icode; > - icode = optab_handler (cbranch_optab, cmp_mode); > + icode = optab_handler (optab, cmp_mode); > if (icode != CODE_FOR_nothing > && insn_operand_matches (icode, 0, test)) > { > @@ -4580,7 +4583,7 @@ prepare_cmp_insn (rtx x, rtx y, enum rtx_code > comparison, rtx size, > if (comparison == UNORDERED && rtx_equal_p (x, y)) > { > prepare_cmp_insn (x, y, UNLT, NULL_RTX, unsignedp, OPTAB_WIDEN, > - ptest, pmode); > + ptest, pmode, optab); > if (*ptest) > return; > } > @@ -4632,7 +4635,7 @@ prepare_cmp_insn (rtx x, rtx y, enum rtx_code > comparison, rtx size, > > *pmode = ret_mode; > prepare_cmp_insn (x, y, comparison, NULL_RTX, unsignedp, methods, > - ptest, pmode); > + ptest, pmode, optab); > } > > return; > @@ -4816,12 +4819,68 @@ emit_cmp_and_jump_insns (rtx x, rtx y, enum rtx_code > comparison, rtx size, > the target supports tbranch. */ > machine_mode tmode = mode; > direct_optab optab; > - if (op1 == CONST0_RTX (GET_MODE (op1)) > - && validate_test_and_branch (val, &test, &tmode, > - &optab) != CODE_FOR_nothing) > + if (op1 == CONST0_RTX (GET_MODE (op1))) > { > - emit_cmp_and_jump_insn_1 (test, tmode, label, optab, prob, true); > - return; > + if (validate_test_and_branch (val, &test, &tmode, > + &optab) != CODE_FOR_nothing) > + { > + emit_cmp_and_jump_insn_1 (test, tmode, label, optab, prob, true); > + return; > + } > + > + /* If we are comparing equality with 0, check if VAL is another > equality > + comparison and if the target supports it directly. */ > + if (val && TREE_CODE (val) == SSA_NAME > + && VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (val)) > + && (comparison == NE || comparison == EQ)) > + { > + auto def_stmt = SSA_NAME_DEF_STMT (val); > + enum insn_code icode; > + if (is_gimple_assign (def_stmt) > + && TREE_CODE_CLASS (gimple_assign_rhs_code (def_stmt)) > + == tcc_comparison) > + { > + class expand_operand ops[2]; > + rtx_insn *tmp = NULL; > + start_sequence (); > + rtx op0c = expand_normal (gimple_assign_rhs1 (def_stmt)); > + rtx op1c = expand_normal (gimple_assign_rhs2 (def_stmt)); > + machine_mode mode2 = GET_MODE (op0c); > + create_input_operand (&ops[0], op0c, mode2); > + create_input_operand (&ops[1], op1c, mode2); > + > + int unsignedp2 = TYPE_UNSIGNED (TREE_TYPE (val)); > + auto inner_code = gimple_assign_rhs_code (def_stmt); > + rtx test2 = NULL_RTX; > + > + enum rtx_code comparison2 = get_rtx_code (inner_code, unsignedp2); > + if (unsignedp2) > + comparison2 = unsigned_condition (comparison2); > + if (comparison == NE) > + optab = vec_cbranch_any_optab; > + else > + optab = vec_cbranch_all_optab; > + > + if ((icode = optab_handler (optab, mode2)) > + != CODE_FOR_nothing > + && maybe_legitimize_operands (icode, 1, 2, ops)) > + { > + prepare_cmp_insn (ops[0].value, ops[1].value, comparison2, > + size, unsignedp2, OPTAB_DIRECT, &test2, > + &mode2, optab); > + emit_cmp_and_jump_insn_1 (test2, mode2, label, > + optab, prob, false); > + tmp = get_insns (); > + } > + > + end_sequence (); > + if (tmp) > + { > + emit_insn (tmp); > + return; > + } > + } > + } > } > > emit_cmp_and_jump_insn_1 (test, mode, label, cbranch_optab, prob, false); > diff --git a/gcc/optabs.def b/gcc/optabs.def > index > 23f792352388dd5f8de8a1999643179328214abf..a600938fb9e4480ff2d78c9b0be344e4856539bb > 100644 > --- a/gcc/optabs.def > +++ b/gcc/optabs.def > @@ -424,6 +424,8 @@ OPTAB_D (smulhrs_optab, "smulhrs$a3") > OPTAB_D (umulhs_optab, "umulhs$a3") > OPTAB_D (umulhrs_optab, "umulhrs$a3") > OPTAB_D (sdiv_pow2_optab, "sdiv_pow2$a3") > +OPTAB_D (vec_cbranch_any_optab, "vec_cbranch_any$a4") > +OPTAB_D (vec_cbranch_all_optab, "vec_cbranch_all$a4") > OPTAB_D (vec_pack_sfix_trunc_optab, "vec_pack_sfix_trunc_$a") > OPTAB_D (vec_pack_ssat_optab, "vec_pack_ssat_$a") > OPTAB_D (vec_pack_trunc_optab, "vec_pack_trunc_$a") > > > -- Richard Biener <rguent...@suse.de> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)