[PATCH 1/3]middle-end: support vec_cbranch_any and vec_cbranch_all [PR118974]

Tamar Christina Sun, 08 Jun 2025 23:00:24 -0700

This patch introduces two new vector cbranch optabs vec_cbranch_any and
vec_cbranch_all.


To explain why we need two new optabs let me explain the current cbranch and its
limitations and what I'm trying to optimize. So sorry for the long email, but I
hope it explains why I think we want new optabs.

Today cbranch can be used for both vector and scalar modes.  In both these
cases it's intended to compare boolean values, either scalar or vector.

The optab documentation does not however state that it can only handle
comparisons against 0.  So many targets have added code for the vector variant
that tries to deal with the case where we branch based on two non-zero
registers.

However this code can't ever be reached because the cbranch expansion only deals
with comparisons against 0 for vectors.  This is because for vectors the rest of
the compiler has no way to generate a non-zero comparison. e.g. the vectorizer
will always generate a zero comparison, and the C/C++ front-ends won't allow
vectors to be used in a cbranch as it expects a boolean value.  ISAs like SVE
work around this by requiring you to use an SVE PTEST intrinsics which results
in a single scalar boolean value that represents the flag values.

e.g. if (svptest_any (..))

The natural question is why do we not at expand time then rewrite the comparison
to a non-zero comparison if the target supports it.

The reason is we can't safely do so.  For an ANY comparison (e.g. != b) this is
trivial, but for an ALL comparison (e.g. == b) we would have to flip both branch
and invert the value being compared.  i.e. we have to make it a != b comparison.

But in emit_cmp_and_jump_insns we can't flip the branches anymore because they
have already been lowered into a fall through branch (PC) and a label, ready for
use in an if_then_else RTL expression.

Additionally as mentioned before, cbranches expect the values to be masks, not
values.  This kinda works out if you XOR the values, but for FP vectors you'd
need to know what equality means for the FP format.  i.e. it's possible for
IEEE 754 values but It's not immediately obvious if this is true for all
formats.

Now why does any of this matter?  Well there are two optimizations we want to be
able to do.

1. Adv. SIMD does not support a vector !=, as in there's no instruction for it.
   For both Integer and FP vectors we perform the comparisons as EQ and then
   invert the resulting mask.  Ideally we'd like to replace this with just a XOR
   and the appropriate branch.

2. When on an SVE enabled system we would like to use an SVE compare + branch
   for the Adv. SIMD sequence which could happen due to cost modelling.  However
   we can only do so based on if we know that the values being compared against
   are the boolean masks.  This means we can't really use combine to do this
   because combine would have to match the entire sequence including the
   vector comparisons because at RTL we've lost the information that
   VECTOR_BOOLEAN_P would have given us.  This sequence would be too long for
   combine to match due to it having to match the compare + branch sequence
   being generated as well.  It also becomes a bit messy to match ANY and ALL
   sequences.

To handle these two cases I propose the new optabs vec_cbranch_any and
vec_cbranch_all that expect the operands to be values, not masks, and the
comparison operation is the comparison being performed.  The current cbranch
optab can't be used here because we need to be able to see both comparison
operators (for the boolean branch and the data compare branch).

The initial != 0 or == 0 is folded away into the name as _any and _all allowing
the target to easily do as it wishes.

I have intentionally chosen to require cbranch_optab to still be the main one.
i.e. you can't have vec_cbranch_any/vec_cbranch_all without cbranch because
these two are an expand time only construct.  I've also chosen them to be this
such that there's no change needed to any other passes in the middle-end.

With these two optabs it's trivial to implement the two optimization I described
above.  A target expansion hook is also possible but optabs felt more natural
for the situation.

I.e. with them we can now generate

.L2:
        ldr     q31, [x1, x2]
        add     v29.4s, v29.4s, v25.4s
        add     v28.4s, v28.4s, v26.4s
        add     v31.4s, v31.4s, v30.4s
        str     q31, [x1, x2]
        add     x1, x1, 16
        cmp     x1, 2560
        beq     .L1
.L6:
        ldr     q30, [x3, x1]
        cmpeq   p15.s, p7/z, z30.s, z27.s
        b.none  .L2

and easily prove it correct.

Bootstrapped Regtested on aarch64-none-linux-gnu,
arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
-m32, -m64 and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

        PR target/118974
        * optabs.cc (prepare_cmp_insn): Refactor to take optab to check for
        instead of hardcoded cbranch.
        (emit_cmp_and_jump_insns): Try to emit a vec_cbranch if supported.
        * optabs.def (vec_cbranch_any_optab, vec_cbranch_all_optab): New.
        * doc/md.texi (cbranch_any@var{mode}4, cbranch_any@var{mode}4): Document
        them.

---
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 
f6314af46923beee0100a1410f089efd34d7566d..7dbfbbe1609a196b0834d458b26f61904eaf5b24
 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -7622,6 +7622,24 @@ Operand 0 is a comparison operator.  Operand 1 and 
operand 2 are the
 first and second operands of the comparison, respectively.  Operand 3
 is the @code{code_label} to jump to.
 
+@cindex @code{cbranch_any@var{mode}4} instruction pattern
+@item @samp{cbranch_any@var{mode}4}
+Conditional branch instruction combined with a compare instruction on vectors
+where it is required that at least one of the elementwise comparisons of the
+two input vectors is true.
+Operand 0 is a comparison operator.  Operand 1 and operand 2 are the
+first and second operands of the comparison, respectively.  Operand 3
+is the @code{code_label} to jump to.
+
+@cindex @code{cbranch_all@var{mode}4} instruction pattern
+@item @samp{cbranch_all@var{mode}4}
+Conditional branch instruction combined with a compare instruction on vectors
+where it is required that at all of the elementwise comparisons of the
+two input vectors are true.
+Operand 0 is a comparison operator.  Operand 1 and operand 2 are the
+first and second operands of the comparison, respectively.  Operand 3
+is the @code{code_label} to jump to.
+
 @cindex @code{jump} instruction pattern
 @item @samp{jump}
 A jump inside a function; an unconditional branch.  Operand 0 is the
diff --git a/gcc/optabs.cc b/gcc/optabs.cc
index 
0a14b1eef8a5795e6fd24ade6da55841696315b8..77d5e6ee5d26ccda3d391265bc45fd454530cc67
 100644
--- a/gcc/optabs.cc
+++ b/gcc/optabs.cc
@@ -4418,6 +4418,9 @@ can_vec_extract_var_idx_p (machine_mode vec_mode, 
machine_mode extr_mode)
 
    *PMODE is the mode of the inputs (in case they are const_int).
 
+   *OPTAB is the optab to check for OPTAB_DIRECT support.  Defaults to
+   cbranch_optab.
+
    This function performs all the setup necessary so that the caller only has
    to emit a single comparison insn.  This setup can involve doing a BLKmode
    comparison or emitting a library call to perform the comparison if no insn
@@ -4429,7 +4432,7 @@ can_vec_extract_var_idx_p (machine_mode vec_mode, 
machine_mode extr_mode)
 static void
 prepare_cmp_insn (rtx x, rtx y, enum rtx_code comparison, rtx size,
                  int unsignedp, enum optab_methods methods,
-                 rtx *ptest, machine_mode *pmode)
+                 rtx *ptest, machine_mode *pmode, optab optab=cbranch_optab)
 {
   machine_mode mode = *pmode;
   rtx libfunc, test;
@@ -4547,7 +4550,7 @@ prepare_cmp_insn (rtx x, rtx y, enum rtx_code comparison, 
rtx size,
   FOR_EACH_WIDER_MODE_FROM (cmp_mode, mode)
     {
       enum insn_code icode;
-      icode = optab_handler (cbranch_optab, cmp_mode);
+      icode = optab_handler (optab, cmp_mode);
       if (icode != CODE_FOR_nothing
          && insn_operand_matches (icode, 0, test))
        {
@@ -4580,7 +4583,7 @@ prepare_cmp_insn (rtx x, rtx y, enum rtx_code comparison, 
rtx size,
       if (comparison == UNORDERED && rtx_equal_p (x, y))
        {
          prepare_cmp_insn (x, y, UNLT, NULL_RTX, unsignedp, OPTAB_WIDEN,
-                           ptest, pmode);
+                           ptest, pmode, optab);
          if (*ptest)
            return;
        }
@@ -4632,7 +4635,7 @@ prepare_cmp_insn (rtx x, rtx y, enum rtx_code comparison, 
rtx size,
 
       *pmode = ret_mode;
       prepare_cmp_insn (x, y, comparison, NULL_RTX, unsignedp, methods,
-                       ptest, pmode);
+                       ptest, pmode, optab);
     }
 
   return;
@@ -4816,12 +4819,68 @@ emit_cmp_and_jump_insns (rtx x, rtx y, enum rtx_code 
comparison, rtx size,
      the target supports tbranch.  */
   machine_mode tmode = mode;
   direct_optab optab;
-  if (op1 == CONST0_RTX (GET_MODE (op1))
-      && validate_test_and_branch (val, &test, &tmode,
-                                  &optab) != CODE_FOR_nothing)
+  if (op1 == CONST0_RTX (GET_MODE (op1)))
     {
-      emit_cmp_and_jump_insn_1 (test, tmode, label, optab, prob, true);
-      return;
+      if (validate_test_and_branch (val, &test, &tmode,
+                                   &optab) != CODE_FOR_nothing)
+       {
+         emit_cmp_and_jump_insn_1 (test, tmode, label, optab, prob, true);
+         return;
+       }
+
+      /* If we are comparing equality with 0, check if VAL is another equality
+        comparison and if the target supports it directly.  */
+      if (val && TREE_CODE (val) == SSA_NAME
+         && VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (val))
+         && (comparison == NE || comparison == EQ))
+       {
+         auto def_stmt = SSA_NAME_DEF_STMT (val);
+         enum insn_code icode;
+         if (is_gimple_assign (def_stmt)
+             && TREE_CODE_CLASS (gimple_assign_rhs_code (def_stmt))
+                  == tcc_comparison)
+           {
+             class expand_operand ops[2];
+             rtx_insn *tmp = NULL;
+             start_sequence ();
+             rtx op0c = expand_normal (gimple_assign_rhs1 (def_stmt));
+             rtx op1c = expand_normal (gimple_assign_rhs2 (def_stmt));
+             machine_mode mode2 = GET_MODE (op0c);
+             create_input_operand (&ops[0], op0c, mode2);
+             create_input_operand (&ops[1], op1c, mode2);
+
+             int unsignedp2 = TYPE_UNSIGNED (TREE_TYPE (val));
+             auto inner_code = gimple_assign_rhs_code (def_stmt);
+             rtx test2 = NULL_RTX;
+
+             enum rtx_code comparison2 = get_rtx_code (inner_code, unsignedp2);
+             if (unsignedp2)
+               comparison2 = unsigned_condition (comparison2);
+             if (comparison == NE)
+               optab = vec_cbranch_any_optab;
+             else
+               optab = vec_cbranch_all_optab;
+
+             if ((icode = optab_handler (optab, mode2))
+                 != CODE_FOR_nothing
+                 && maybe_legitimize_operands (icode, 1, 2, ops))
+               {
+                 prepare_cmp_insn (ops[0].value, ops[1].value, comparison2,
+                                   size, unsignedp2, OPTAB_DIRECT, &test2,
+                                   &mode2, optab);
+                 emit_cmp_and_jump_insn_1 (test2, mode2, label,
+                                           optab, prob, false);
+                 tmp = get_insns ();
+               }
+
+             end_sequence ();
+             if (tmp)
+               {
+                 emit_insn (tmp);
+                 return;
+               }
+           }
+       }
     }
 
   emit_cmp_and_jump_insn_1 (test, mode, label, cbranch_optab, prob, false);
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 
23f792352388dd5f8de8a1999643179328214abf..a600938fb9e4480ff2d78c9b0be344e4856539bb
 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -424,6 +424,8 @@ OPTAB_D (smulhrs_optab, "smulhrs$a3")
 OPTAB_D (umulhs_optab, "umulhs$a3")
 OPTAB_D (umulhrs_optab, "umulhrs$a3")
 OPTAB_D (sdiv_pow2_optab, "sdiv_pow2$a3")
+OPTAB_D (vec_cbranch_any_optab, "vec_cbranch_any$a4")
+OPTAB_D (vec_cbranch_all_optab, "vec_cbranch_all$a4")
 OPTAB_D (vec_pack_sfix_trunc_optab, "vec_pack_sfix_trunc_$a")
 OPTAB_D (vec_pack_ssat_optab, "vec_pack_ssat_$a")
 OPTAB_D (vec_pack_trunc_optab, "vec_pack_trunc_$a")


--

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index f6314af46923beee0100a1410f089efd34d7566d..7dbfbbe1609a196b0834d458b26f61904eaf5b24 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -7622,6 +7622,24 @@ Operand 0 is a comparison operator.  Operand 1 and operand 2 are the
 first and second operands of the comparison, respectively.  Operand 3
 is the @code{code_label} to jump to.
 
+@cindex @code{cbranch_any@var{mode}4} instruction pattern
+@item @samp{cbranch_any@var{mode}4}
+Conditional branch instruction combined with a compare instruction on vectors
+where it is required that at least one of the elementwise comparisons of the
+two input vectors is true.
+Operand 0 is a comparison operator.  Operand 1 and operand 2 are the
+first and second operands of the comparison, respectively.  Operand 3
+is the @code{code_label} to jump to.
+
+@cindex @code{cbranch_all@var{mode}4} instruction pattern
+@item @samp{cbranch_all@var{mode}4}
+Conditional branch instruction combined with a compare instruction on vectors
+where it is required that at all of the elementwise comparisons of the
+two input vectors are true.
+Operand 0 is a comparison operator.  Operand 1 and operand 2 are the
+first and second operands of the comparison, respectively.  Operand 3
+is the @code{code_label} to jump to.
+
 @cindex @code{jump} instruction pattern
 @item @samp{jump}
 A jump inside a function; an unconditional branch.  Operand 0 is the
diff --git a/gcc/optabs.cc b/gcc/optabs.cc
index 0a14b1eef8a5795e6fd24ade6da55841696315b8..77d5e6ee5d26ccda3d391265bc45fd454530cc67 100644
--- a/gcc/optabs.cc
+++ b/gcc/optabs.cc
@@ -4418,6 +4418,9 @@ can_vec_extract_var_idx_p (machine_mode vec_mode, machine_mode extr_mode)
 
    *PMODE is the mode of the inputs (in case they are const_int).
 
+   *OPTAB is the optab to check for OPTAB_DIRECT support.  Defaults to
+   cbranch_optab.
+
    This function performs all the setup necessary so that the caller only has
    to emit a single comparison insn.  This setup can involve doing a BLKmode
    comparison or emitting a library call to perform the comparison if no insn
@@ -4429,7 +4432,7 @@ can_vec_extract_var_idx_p (machine_mode vec_mode, machine_mode extr_mode)
 static void
 prepare_cmp_insn (rtx x, rtx y, enum rtx_code comparison, rtx size,
 		  int unsignedp, enum optab_methods methods,
-		  rtx *ptest, machine_mode *pmode)
+		  rtx *ptest, machine_mode *pmode, optab optab=cbranch_optab)
 {
   machine_mode mode = *pmode;
   rtx libfunc, test;
@@ -4547,7 +4550,7 @@ prepare_cmp_insn (rtx x, rtx y, enum rtx_code comparison, rtx size,
   FOR_EACH_WIDER_MODE_FROM (cmp_mode, mode)
     {
       enum insn_code icode;
-      icode = optab_handler (cbranch_optab, cmp_mode);
+      icode = optab_handler (optab, cmp_mode);
       if (icode != CODE_FOR_nothing
 	  && insn_operand_matches (icode, 0, test))
 	{
@@ -4580,7 +4583,7 @@ prepare_cmp_insn (rtx x, rtx y, enum rtx_code comparison, rtx size,
       if (comparison == UNORDERED && rtx_equal_p (x, y))
 	{
 	  prepare_cmp_insn (x, y, UNLT, NULL_RTX, unsignedp, OPTAB_WIDEN,
-			    ptest, pmode);
+			    ptest, pmode, optab);
 	  if (*ptest)
 	    return;
 	}
@@ -4632,7 +4635,7 @@ prepare_cmp_insn (rtx x, rtx y, enum rtx_code comparison, rtx size,
 
       *pmode = ret_mode;
       prepare_cmp_insn (x, y, comparison, NULL_RTX, unsignedp, methods,
-			ptest, pmode);
+			ptest, pmode, optab);
     }
 
   return;
@@ -4816,12 +4819,68 @@ emit_cmp_and_jump_insns (rtx x, rtx y, enum rtx_code comparison, rtx size,
      the target supports tbranch.  */
   machine_mode tmode = mode;
   direct_optab optab;
-  if (op1 == CONST0_RTX (GET_MODE (op1))
-      && validate_test_and_branch (val, &test, &tmode,
-				   &optab) != CODE_FOR_nothing)
+  if (op1 == CONST0_RTX (GET_MODE (op1)))
     {
-      emit_cmp_and_jump_insn_1 (test, tmode, label, optab, prob, true);
-      return;
+      if (validate_test_and_branch (val, &test, &tmode,
+				    &optab) != CODE_FOR_nothing)
+	{
+	  emit_cmp_and_jump_insn_1 (test, tmode, label, optab, prob, true);
+	  return;
+	}
+
+      /* If we are comparing equality with 0, check if VAL is another equality
+	 comparison and if the target supports it directly.  */
+      if (val && TREE_CODE (val) == SSA_NAME
+	  && VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (val))
+	  && (comparison == NE || comparison == EQ))
+	{
+	  auto def_stmt = SSA_NAME_DEF_STMT (val);
+	  enum insn_code icode;
+	  if (is_gimple_assign (def_stmt)
+	      && TREE_CODE_CLASS (gimple_assign_rhs_code (def_stmt))
+		   == tcc_comparison)
+	    {
+	      class expand_operand ops[2];
+	      rtx_insn *tmp = NULL;
+	      start_sequence ();
+	      rtx op0c = expand_normal (gimple_assign_rhs1 (def_stmt));
+	      rtx op1c = expand_normal (gimple_assign_rhs2 (def_stmt));
+	      machine_mode mode2 = GET_MODE (op0c);
+	      create_input_operand (&ops[0], op0c, mode2);
+	      create_input_operand (&ops[1], op1c, mode2);
+
+	      int unsignedp2 = TYPE_UNSIGNED (TREE_TYPE (val));
+	      auto inner_code = gimple_assign_rhs_code (def_stmt);
+	      rtx test2 = NULL_RTX;
+
+	      enum rtx_code comparison2 = get_rtx_code (inner_code, unsignedp2);
+	      if (unsignedp2)
+		comparison2 = unsigned_condition (comparison2);
+	      if (comparison == NE)
+		optab = vec_cbranch_any_optab;
+	      else
+		optab = vec_cbranch_all_optab;
+
+	      if ((icode = optab_handler (optab, mode2))
+		  != CODE_FOR_nothing
+		  && maybe_legitimize_operands (icode, 1, 2, ops))
+		{
+		  prepare_cmp_insn (ops[0].value, ops[1].value, comparison2,
+				    size, unsignedp2, OPTAB_DIRECT, &test2,
+				    &mode2, optab);
+		  emit_cmp_and_jump_insn_1 (test2, mode2, label,
+					    optab, prob, false);
+		  tmp = get_insns ();
+		}
+
+	      end_sequence ();
+	      if (tmp)
+		{
+		  emit_insn (tmp);
+		  return;
+		}
+	    }
+	}
     }
 
   emit_cmp_and_jump_insn_1 (test, mode, label, cbranch_optab, prob, false);
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 23f792352388dd5f8de8a1999643179328214abf..a600938fb9e4480ff2d78c9b0be344e4856539bb 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -424,6 +424,8 @@ OPTAB_D (smulhrs_optab, "smulhrs$a3")
 OPTAB_D (umulhs_optab, "umulhs$a3")
 OPTAB_D (umulhrs_optab, "umulhrs$a3")
 OPTAB_D (sdiv_pow2_optab, "sdiv_pow2$a3")
+OPTAB_D (vec_cbranch_any_optab, "vec_cbranch_any$a4")
+OPTAB_D (vec_cbranch_all_optab, "vec_cbranch_all$a4")
 OPTAB_D (vec_pack_sfix_trunc_optab, "vec_pack_sfix_trunc_$a")
 OPTAB_D (vec_pack_ssat_optab, "vec_pack_ssat_$a")
 OPTAB_D (vec_pack_trunc_optab, "vec_pack_trunc_$a")

[PATCH 1/3]middle-end: support vec_cbranch_any and vec_cbranch_all [PR118974]

Reply via email to