On Mon, 9 Jun 2025, Tamar Christina wrote:

> This patch introduces two new vector cbranch optabs vec_cbranch_any and
> vec_cbranch_all.
> 
> To explain why we need two new optabs let me explain the current cbranch and 
> its
> limitations and what I'm trying to optimize. So sorry for the long email, but 
> I
> hope it explains why I think we want new optabs.
> 
> Today cbranch can be used for both vector and scalar modes.  In both these
> cases it's intended to compare boolean values, either scalar or vector.
> 
> The optab documentation does not however state that it can only handle
> comparisons against 0.  So many targets have added code for the vector variant
> that tries to deal with the case where we branch based on two non-zero
> registers.
> 
> However this code can't ever be reached because the cbranch expansion only 
> deals
> with comparisons against 0 for vectors.  This is because for vectors the rest 
> of
> the compiler has no way to generate a non-zero comparison. e.g. the vectorizer
> will always generate a zero comparison, and the C/C++ front-ends won't allow
> vectors to be used in a cbranch as it expects a boolean value.  ISAs like SVE
> work around this by requiring you to use an SVE PTEST intrinsics which results
> in a single scalar boolean value that represents the flag values.
> 
> e.g. if (svptest_any (..))
> 
> The natural question is why do we not at expand time then rewrite the 
> comparison
> to a non-zero comparison if the target supports it.
> 
> The reason is we can't safely do so.  For an ANY comparison (e.g. != b) this 
> is
> trivial, but for an ALL comparison (e.g. == b) we would have to flip both 
> branch
> and invert the value being compared.  i.e. we have to make it a != b 
> comparison.
> 
> But in emit_cmp_and_jump_insns we can't flip the branches anymore because they
> have already been lowered into a fall through branch (PC) and a label, ready 
> for
> use in an if_then_else RTL expression.
> 
> Additionally as mentioned before, cbranches expect the values to be masks, not
> values.  This kinda works out if you XOR the values, but for FP vectors you'd
> need to know what equality means for the FP format.  i.e. it's possible for
> IEEE 754 values but It's not immediately obvious if this is true for all
> formats.
> 
> Now why does any of this matter?  Well there are two optimizations we want to 
> be
> able to do.
> 
> 1. Adv. SIMD does not support a vector !=, as in there's no instruction for 
> it.
>    For both Integer and FP vectors we perform the comparisons as EQ and then
>    invert the resulting mask.  Ideally we'd like to replace this with just a 
> XOR
>    and the appropriate branch.
> 
> 2. When on an SVE enabled system we would like to use an SVE compare + branch
>    for the Adv. SIMD sequence which could happen due to cost modelling.  
> However
>    we can only do so based on if we know that the values being compared 
> against
>    are the boolean masks.  This means we can't really use combine to do this
>    because combine would have to match the entire sequence including the
>    vector comparisons because at RTL we've lost the information that
>    VECTOR_BOOLEAN_P would have given us.  This sequence would be too long for
>    combine to match due to it having to match the compare + branch sequence
>    being generated as well.  It also becomes a bit messy to match ANY and ALL
>    sequences.

I guess I didn't really understand the above, esp. why all of this is
not recoverable on RTL, so I'll state what I think
cbranch is supposed to do and then ask some question on the proposal.

cbranch is supposed to branch on the result of a comparison,
being it on vector or scalar operands, the result of the comparison is
a scalar, or rather in full generality, a (scalar) condition-code.
Given it uses a COMPARE RTX as comparison operator we can currently
only handle whole vector equality/inequality which map to all/any.

> To handle these two cases I propose the new optabs vec_cbranch_any and
> vec_cbranch_all that expect the operands to be values, not masks, and the
> comparison operation is the comparison being performed.  The current cbranch
> optab can't be used here because we need to be able to see both comparison
> operators (for the boolean branch and the data compare branch).
> 
> The initial != 0 or == 0 is folded away into the name as _any and _all 
> allowing
> the target to easily do as it wishes.
> 
> I have intentionally chosen to require cbranch_optab to still be the main one.
> i.e. you can't have vec_cbranch_any/vec_cbranch_all without cbranch because
> these two are an expand time only construct.  I've also chosen them to be this
> such that there's no change needed to any other passes in the middle-end.
> 
> With these two optabs it's trivial to implement the two optimization I 
> described
> above.  A target expansion hook is also possible but optabs felt more natural
> for the situation.

What's the RTL the optabs expand to for SVE/AdvSIMD?  How do you
represent the any/all vector compare difference?  Looking at 3/3 I see
you use UNSPECs.

What kind of compares do you expect us to generate?  Any besides
equality compares?  Why would we not allow cbranch to take an
operand 0 that specifies the desired reduction?  We could
have (any:CC (eq:V4BI (reg:V4SI) (reg:V4SI))) or so, or alternatively
(more intrusive) have (any_eq:CC (reg:V4SI) (reg:V4SI))?  That
would also allow you to have a proper RTL representation rather
than more and more UNSEPCs.

Richard.

> I.e. with them we can now generate
> 
> .L2:
>         ldr     q31, [x1, x2]
>         add     v29.4s, v29.4s, v25.4s
>         add     v28.4s, v28.4s, v26.4s
>         add     v31.4s, v31.4s, v30.4s
>         str     q31, [x1, x2]
>         add     x1, x1, 16
>         cmp     x1, 2560
>         beq     .L1
> .L6:
>         ldr     q30, [x3, x1]
>         cmpeq   p15.s, p7/z, z30.s, z27.s
>         b.none  .L2
> 
> and easily prove it correct.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu,
> arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
> -m32, -m64 and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>       PR target/118974
>       * optabs.cc (prepare_cmp_insn): Refactor to take optab to check for
>       instead of hardcoded cbranch.
>       (emit_cmp_and_jump_insns): Try to emit a vec_cbranch if supported.
>       * optabs.def (vec_cbranch_any_optab, vec_cbranch_all_optab): New.
>       * doc/md.texi (cbranch_any@var{mode}4, cbranch_any@var{mode}4): Document
>       them.
> 
> ---
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 
> f6314af46923beee0100a1410f089efd34d7566d..7dbfbbe1609a196b0834d458b26f61904eaf5b24
>  100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -7622,6 +7622,24 @@ Operand 0 is a comparison operator.  Operand 1 and 
> operand 2 are the
>  first and second operands of the comparison, respectively.  Operand 3
>  is the @code{code_label} to jump to.
>  
> +@cindex @code{cbranch_any@var{mode}4} instruction pattern
> +@item @samp{cbranch_any@var{mode}4}
> +Conditional branch instruction combined with a compare instruction on vectors
> +where it is required that at least one of the elementwise comparisons of the
> +two input vectors is true.
> +Operand 0 is a comparison operator.  Operand 1 and operand 2 are the
> +first and second operands of the comparison, respectively.  Operand 3
> +is the @code{code_label} to jump to.
> +
> +@cindex @code{cbranch_all@var{mode}4} instruction pattern
> +@item @samp{cbranch_all@var{mode}4}
> +Conditional branch instruction combined with a compare instruction on vectors
> +where it is required that at all of the elementwise comparisons of the
> +two input vectors are true.
> +Operand 0 is a comparison operator.  Operand 1 and operand 2 are the
> +first and second operands of the comparison, respectively.  Operand 3
> +is the @code{code_label} to jump to.
> +
>  @cindex @code{jump} instruction pattern
>  @item @samp{jump}
>  A jump inside a function; an unconditional branch.  Operand 0 is the
> diff --git a/gcc/optabs.cc b/gcc/optabs.cc
> index 
> 0a14b1eef8a5795e6fd24ade6da55841696315b8..77d5e6ee5d26ccda3d391265bc45fd454530cc67
>  100644
> --- a/gcc/optabs.cc
> +++ b/gcc/optabs.cc
> @@ -4418,6 +4418,9 @@ can_vec_extract_var_idx_p (machine_mode vec_mode, 
> machine_mode extr_mode)
>  
>     *PMODE is the mode of the inputs (in case they are const_int).
>  
> +   *OPTAB is the optab to check for OPTAB_DIRECT support.  Defaults to
> +   cbranch_optab.
> +
>     This function performs all the setup necessary so that the caller only has
>     to emit a single comparison insn.  This setup can involve doing a BLKmode
>     comparison or emitting a library call to perform the comparison if no insn
> @@ -4429,7 +4432,7 @@ can_vec_extract_var_idx_p (machine_mode vec_mode, 
> machine_mode extr_mode)
>  static void
>  prepare_cmp_insn (rtx x, rtx y, enum rtx_code comparison, rtx size,
>                 int unsignedp, enum optab_methods methods,
> -               rtx *ptest, machine_mode *pmode)
> +               rtx *ptest, machine_mode *pmode, optab optab=cbranch_optab)
>  {
>    machine_mode mode = *pmode;
>    rtx libfunc, test;
> @@ -4547,7 +4550,7 @@ prepare_cmp_insn (rtx x, rtx y, enum rtx_code 
> comparison, rtx size,
>    FOR_EACH_WIDER_MODE_FROM (cmp_mode, mode)
>      {
>        enum insn_code icode;
> -      icode = optab_handler (cbranch_optab, cmp_mode);
> +      icode = optab_handler (optab, cmp_mode);
>        if (icode != CODE_FOR_nothing
>         && insn_operand_matches (icode, 0, test))
>       {
> @@ -4580,7 +4583,7 @@ prepare_cmp_insn (rtx x, rtx y, enum rtx_code 
> comparison, rtx size,
>        if (comparison == UNORDERED && rtx_equal_p (x, y))
>       {
>         prepare_cmp_insn (x, y, UNLT, NULL_RTX, unsignedp, OPTAB_WIDEN,
> -                         ptest, pmode);
> +                         ptest, pmode, optab);
>         if (*ptest)
>           return;
>       }
> @@ -4632,7 +4635,7 @@ prepare_cmp_insn (rtx x, rtx y, enum rtx_code 
> comparison, rtx size,
>  
>        *pmode = ret_mode;
>        prepare_cmp_insn (x, y, comparison, NULL_RTX, unsignedp, methods,
> -                     ptest, pmode);
> +                     ptest, pmode, optab);
>      }
>  
>    return;
> @@ -4816,12 +4819,68 @@ emit_cmp_and_jump_insns (rtx x, rtx y, enum rtx_code 
> comparison, rtx size,
>       the target supports tbranch.  */
>    machine_mode tmode = mode;
>    direct_optab optab;
> -  if (op1 == CONST0_RTX (GET_MODE (op1))
> -      && validate_test_and_branch (val, &test, &tmode,
> -                                &optab) != CODE_FOR_nothing)
> +  if (op1 == CONST0_RTX (GET_MODE (op1)))
>      {
> -      emit_cmp_and_jump_insn_1 (test, tmode, label, optab, prob, true);
> -      return;
> +      if (validate_test_and_branch (val, &test, &tmode,
> +                                 &optab) != CODE_FOR_nothing)
> +     {
> +       emit_cmp_and_jump_insn_1 (test, tmode, label, optab, prob, true);
> +       return;
> +     }
> +
> +      /* If we are comparing equality with 0, check if VAL is another 
> equality
> +      comparison and if the target supports it directly.  */
> +      if (val && TREE_CODE (val) == SSA_NAME
> +       && VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (val))
> +       && (comparison == NE || comparison == EQ))
> +     {
> +       auto def_stmt = SSA_NAME_DEF_STMT (val);
> +       enum insn_code icode;
> +       if (is_gimple_assign (def_stmt)
> +           && TREE_CODE_CLASS (gimple_assign_rhs_code (def_stmt))
> +                == tcc_comparison)
> +         {
> +           class expand_operand ops[2];
> +           rtx_insn *tmp = NULL;
> +           start_sequence ();
> +           rtx op0c = expand_normal (gimple_assign_rhs1 (def_stmt));
> +           rtx op1c = expand_normal (gimple_assign_rhs2 (def_stmt));
> +           machine_mode mode2 = GET_MODE (op0c);
> +           create_input_operand (&ops[0], op0c, mode2);
> +           create_input_operand (&ops[1], op1c, mode2);
> +
> +           int unsignedp2 = TYPE_UNSIGNED (TREE_TYPE (val));
> +           auto inner_code = gimple_assign_rhs_code (def_stmt);
> +           rtx test2 = NULL_RTX;
> +
> +           enum rtx_code comparison2 = get_rtx_code (inner_code, unsignedp2);
> +           if (unsignedp2)
> +             comparison2 = unsigned_condition (comparison2);
> +           if (comparison == NE)
> +             optab = vec_cbranch_any_optab;
> +           else
> +             optab = vec_cbranch_all_optab;
> +
> +           if ((icode = optab_handler (optab, mode2))
> +               != CODE_FOR_nothing
> +               && maybe_legitimize_operands (icode, 1, 2, ops))
> +             {
> +               prepare_cmp_insn (ops[0].value, ops[1].value, comparison2,
> +                                 size, unsignedp2, OPTAB_DIRECT, &test2,
> +                                 &mode2, optab);
> +               emit_cmp_and_jump_insn_1 (test2, mode2, label,
> +                                         optab, prob, false);
> +               tmp = get_insns ();
> +             }
> +
> +           end_sequence ();
> +           if (tmp)
> +             {
> +               emit_insn (tmp);
> +               return;
> +             }
> +         }
> +     }
>      }
>  
>    emit_cmp_and_jump_insn_1 (test, mode, label, cbranch_optab, prob, false);
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index 
> 23f792352388dd5f8de8a1999643179328214abf..a600938fb9e4480ff2d78c9b0be344e4856539bb
>  100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -424,6 +424,8 @@ OPTAB_D (smulhrs_optab, "smulhrs$a3")
>  OPTAB_D (umulhs_optab, "umulhs$a3")
>  OPTAB_D (umulhrs_optab, "umulhrs$a3")
>  OPTAB_D (sdiv_pow2_optab, "sdiv_pow2$a3")
> +OPTAB_D (vec_cbranch_any_optab, "vec_cbranch_any$a4")
> +OPTAB_D (vec_cbranch_all_optab, "vec_cbranch_all$a4")
>  OPTAB_D (vec_pack_sfix_trunc_optab, "vec_pack_sfix_trunc_$a")
>  OPTAB_D (vec_pack_ssat_optab, "vec_pack_ssat_$a")
>  OPTAB_D (vec_pack_trunc_optab, "vec_pack_trunc_$a")
> 
> 
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

Reply via email to