RE: [PATCH 1/3]middle-end: support vec_cbranch_any and vec_cbranch_all [PR118974]

Tamar Christina Mon, 30 Jun 2025 00:26:42 -0700

ping


> -----Original Message-----
> From: Tamar Christina
> Sent: Monday, June 23, 2025 7:01 AM
> To: Tamar Christina <tamar.christ...@arm.com>; Richard Biener
> <rguent...@suse.de>
> Cc: gcc-patches@gcc.gnu.org; Richard Sandiford <richard.sandif...@arm.com>;
> nd <n...@arm.com>
> Subject: RE: [PATCH 1/3]middle-end: support vec_cbranch_any and
> vec_cbranch_all [PR118974]
> 
> ping
> 
> > -----Original Message-----
> > From: Tamar Christina <tamar.christ...@arm.com>
> > Sent: Tuesday, June 10, 2025 4:19 PM
> > To: Richard Biener <rguent...@suse.de>
> > Cc: gcc-patches@gcc.gnu.org; Richard Sandiford <richard.sandif...@arm.com>;
> > nd <n...@arm.com>
> > Subject: RE: [PATCH 1/3]middle-end: support vec_cbranch_any and
> > vec_cbranch_all [PR118974]
> >
> > > >>>> We could
> > > >>>> have (any:CC (eq:V4BI (reg:V4SI) (reg:V4SI))) or so, or alternatively
> > > >>>> (more intrusive) have (any_eq:CC (reg:V4SI) (reg:V4SI))?  That
> > > >>>> would also allow you to have a proper RTL representation rather
> > > >>>> than more and more UNSPECs.
> > > >>>
> > > >>> We do have proper RTL representation though..
> > > >>
> > > >> You have
> > > >>
> > > >>> (insn 31 30 32 5 (parallel [
> > > >>>            (set (reg:VNx4BI 133 [ mask_patt_13.20_41 ])
> > > >>>                (unspec:VNx4BI [
> > > >>>                        (reg:VNx4BI 134)
> > > >>>                        (const_int 1 [0x1])
> > > >>>                        (gtu:VNx4BI (reg:VNx4QI 112 [ vect__1.16 ])
> > > >>>                            (reg:VNx4QI 128))
> > > >>>                    ] UNSPEC_PRED_Z))
> > > >>>            (clobber (reg:CC_NZC 66 cc))
> > > >>
> > > >> that's not
> > > >>
> > > >>  (set (reg:CC_NZC 66 cc) (any_gtu:CC_NZC (reg:VNx4QI 112 [ vect__1.16 ])
> > > >>                            ((reg:VNx4QI 128)))
> > > >>
> > > >> so nothing combine or simplify-rtx can work with (because of the UNSPEC).
> > > >>
> > > >
> > > > Sure, I don’t think any_gtu provides any value here though.  This
> > > > information, "any" or "all", is a property of the use of the result, not
> > > > of the comparison in RTL.
> > >
> > > But that was the gist of the question: so the compare does _not_ set a CC
> > > that you can directly branch on, but there’s a use insn of <whatever> that
> > > either all- or any-reduces <whatever> to a CC?  I’m now confused…
> > >
> >
> > Let's clear up the confusion.
> >
> > Yes 😊 For SVE, integer vector compares set, among others, the Zero flag.
> > And we have the branch instructions b.none (which checks Z==1) and b.any
> > (which checks Z==0).
> >
> > Floating-point SVE vector compares do not set any flags and require an
> > additional insn for the flags to be set.
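
To make the flag behaviour described above concrete, here is a minimal C++
sketch that models it. It is an illustration only, not GCC or SVE code: the
names kLanes, cmpeq, zero_flag, branch_none_taken and branch_any_taken are
hypothetical, and the lane count of 4 is an arbitrary assumption.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kLanes = 4;  // hypothetical lane count, illustration only
using Vec = std::array<std::uint32_t, kLanes>;
using Pred = std::array<bool, kLanes>;

// Models a predicated element-wise "equal" compare: lanes that are inactive
// in the governing predicate always produce false.
Pred cmpeq(const Pred& governing, const Vec& a, const Vec& b) {
  Pred out{};
  for (std::size_t i = 0; i < kLanes; ++i)
    out[i] = governing[i] && (a[i] == b[i]);
  return out;
}

// Models the Z flag as described in the text: set when no lane is true.
bool zero_flag(const Pred& result) {
  for (bool lane : result)
    if (lane) return false;
  return true;
}

// b.none is taken when Z == 1; b.any is taken when Z == 0.
bool branch_none_taken(const Pred& result) { return zero_flag(result); }
bool branch_any_taken(const Pred& result) { return !zero_flag(result); }
```

With all lanes active, comparing {1,2,3,4} against {9,2,9,9} takes the b.any
arm (one lane matches), while comparing against {5,6,7,8} takes the b.none arm.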
> >
> > Regards,
> > Tamar
> >
> > > > So the "none" is here
> > > >
> > > > (jump_insn 35 34 36 5 (set (pc)
> > > >         (if_then_else (eq (reg:CC_NZC 66 cc)
> > > >                 (const_int 0 [0]))
> > > >             (label_ref 41)
> > > >             (pc))) "/app/example.cpp":29:7 -1
> > > >      (int_list:REG_BR_PROB 1014686028 (nil))
> > > >
> > > > So I don’t think that the ANY or ALL was ever a property of the compare,
> > > > But rather what you do with it.
> > > > Which is why today in gimple we have the != 0 or == 0 in the if no?
> > > >
> > > >> That said, I wanted to know whether SVE/NEON can do {any,all}_<compare>
> > > >> ops on vectors, setting the CC flag register you can then do a
> > > >> conditional jump on.  And that seems to be the case and what you
> > > >> are after.
> > > >>
> > > >
> > > > Yes, SVE can indeed for integer comparisons. Floating point needs the
> > > > ptest.
> > > >
> > > > Also other operations can do this too, for instance the MATCH instruction
> > > > being worked on now, etc.
> > > >
> > > >> But that's exactly what the cbranch optab is supposed to facilitate.
> > > >> What we're lacking is communication of the all vs. any.  Your choice
> > > >> is to do that by using different optabs, 'vec_cbranch_any' vs.
> > > >> 'vec_cbranch_all'.  I see how that's "easy", you just have to adjust
> > > >> RTL expansion.
> > > >>
> > > >
> > > > Indeed, and the benefit of this is that we can expand directly to the
> > > > optimal sequence, rather than relying on later passes to eliminate
> > > > the unneeded predicate operations.
> > > >
> > > >> On GIMPLE you could argue we have everything, we can support
> > > >>
> > > >> if (_1 ==/!= {-1, -1, ... })
> > > >>
> > > >> to match vec_cbranch_all/any for an inverted compare?
> > > >>
> > > >
> > > > Yes, so my patch at expand time just checks if _1 is a compare
> > > > And if so, reads the arguments of _1 and based on ==/!= it
> > > > chooses vec_cbranch_all/any and passes the argument of the
> > > > compare to the optab.   It can't handle {-1,-1,-1,...} today because
> > > > the expansion happens quite low during expand, so we can no
> > > > longer flip the branches to make them a normal != 0 compare.
> > > >
> > > > I hadn't worried about this since it would just fall back to cbranch
> > > > and we no longer generate this representation.
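
The selection step described above can be sketched in isolation. This is a
simplified model, not the code from the patch: OuterCode, VecCbranch and
pick_vec_cbranch are hypothetical names, and the mapping (!= picks "any",
== picks "all") simply follows the description in this thread.

```cpp
// Hypothetical stand-ins for the outer comparison code and the chosen optab;
// these are not GCC types.
enum class OuterCode { EQ, NE, Other };
enum class VecCbranch { None, Any, All };

// Only an outer ==/!= whose operand is itself defined by a comparison is a
// candidate.  != selects "any", == selects "all".  Everything else falls back
// to the plain cbranch path, modelled here as None.
VecCbranch pick_vec_cbranch(OuterCode outer, bool operand_is_comparison) {
  if (!operand_is_comparison)
    return VecCbranch::None;
  switch (outer) {
    case OuterCode::NE: return VecCbranch::Any;
    case OuterCode::EQ: return VecCbranch::All;
    default:            return VecCbranch::None;
  }
}
```

Returning None for the unhandled cases models the fallback to cbranch
mentioned above.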
> > > >
> > > >> I see one of your issues is that the AND does not set flags, right?
> > > >> You do need the ptest once you have that?
> > > >>
> > > >
> > > > The AND is just because the middle-end can only generate an unpredicated
> > > > compare. And so we rely on combine to push the predicate inside eventually.
> > > >
> > > > So we don't need the AND itself to set flags, because for early exits all
> > > > we care about is CC_Z, and the AND can have an effect on N and C but not Z.
> > > >
> > > > So for pure SVE I can eliminate it (see patch 2/3), but the case where we
> > > > have to generate an SVE compare for Adv. SIMD (as it does the flag setting)
> > > > means that we have to match everything from the compare to the
> > > > if_then_else.  You need the compare since you have to replace it with the
> > > > SVE variant, and you need the if_then_else to realize if you have any or
> > > > all.
> > > >
> > > > The optab allows us to avoid this by just emitting the SVE compare
> > > > directly, and completely avoid the AND by generating a predicated compare.
> > > >
> > > > So I am trying to fix some of the optimization in RTL, but for some cases
> > > > expanding correctly is the most reliable way.
> > > >
> > > > I would have thought that the two optabs would be preferable to 12 new
> > > > comparison operators, but could make those work too 😊
> > > >
> > > > Thanks,
> > > > Tamar
> > > >
> > > >>
> > > >> +@cindex @code{cbranch_any@var{mode}4} instruction pattern
> > > >> +@item @samp{cbranch_any@var{mode}4}
> > > >> +Conditional branch instruction combined with a compare instruction on vectors
> > > >> +where it is required that at least one of the elementwise comparisons of the
> > > >> +two input vectors is true.
> > > >>
> > > >> That sentence is a bit odd.  Operand 0 is a comparison operator that
> > > >> is applied to all vector elements.  When at least one comparison evaluates
> > > >> to true the branch is taken.
> > > >>
> > > >> +Operand 0 is a comparison operator.  Operand 1 and operand 2 are the
> > > >> +first and second operands of the comparison, respectively.  Operand 3
> > > >> +is the @code{code_label} to jump to.
> > > >>
> > > >> On Jun 9, 2025, 8:03 AM, Tamar Christina wrote
> > > >> (to gcc-patches, nd, rguenther):
> > > >> This patch introduces two new vector cbranch optabs vec_cbranch_any and
> > > >> vec_cbranch_all.
> > > >>
> > > >> To explain why we need two new optabs let me explain the current cbranch
> > > >> and its limitations and what I'm trying to optimize. So sorry for the
> > > >> long email, but I hope it explains why I think we want new optabs.
> > > >>
> > > >> Today cbranch can be used for both vector and scalar modes.  In both these
> > > >> cases it's intended to compare boolean values, either scalar or vector.
> > > >>
> > > >> The optab documentation does not however state that it can only handle
> > > >> comparisons against 0.  So many targets have added code for the vector
> > > >> variant that tries to deal with the case where we branch based on two
> > > >> non-zero registers.
> > > >>
> > > >> However this code can't ever be reached because the cbranch expansion only
> > > >> deals with comparisons against 0 for vectors.  This is because for vectors
> > > >> the rest of the compiler has no way to generate a non-zero comparison.
> > > >> e.g. the vectorizer will always generate a zero comparison, and the C/C++
> > > >> front-ends won't allow vectors to be used in a cbranch as it expects a
> > > >> boolean value.  ISAs like SVE work around this by requiring you to use an
> > > >> SVE PTEST intrinsic which results in a single scalar boolean value that
> > > >> represents the flag values.
> > > >>
> > > >> e.g. if (svptest_any (..))
> > > >>
> > > >> The natural question is why do we not at expand time then rewrite the
> > > >> comparison to a non-zero comparison if the target supports it.
> > > >>
> > > >> The reason is we can't safely do so.  For an ANY comparison (e.g. != b)
> > > >> this is trivial, but for an ALL comparison (e.g. == b) we would have to
> > > >> both flip the branch and invert the value being compared.  i.e. we have to
> > > >> make it a != b comparison.
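
The rewrite described above relies on the identity all(a == b) == !any(a != b).
A minimal C++ sketch of that equivalence follows. It is only a scalar model
under stated assumptions: the 4-lane int vector and the names all_eq, any_ne,
Arm, branch_if_all_eq and branch_if_any_ne_flipped are hypothetical and do not
appear in the patch.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t kLanes = 4;  // arbitrary lane count, illustration only
using Vec = std::array<int, kLanes>;

// ALL reduction of an element-wise == compare.
bool all_eq(const Vec& a, const Vec& b) {
  for (std::size_t i = 0; i < kLanes; ++i)
    if (a[i] != b[i]) return false;
  return true;
}

// ANY reduction of an element-wise != compare.
bool any_ne(const Vec& a, const Vec& b) {
  for (std::size_t i = 0; i < kLanes; ++i)
    if (a[i] != b[i]) return true;
  return false;
}

// The two destinations of a conditional branch.
enum class Arm { Label, FallThrough };

// Branch written directly as an ALL comparison.
Arm branch_if_all_eq(const Vec& a, const Vec& b) {
  return all_eq(a, b) ? Arm::Label : Arm::FallThrough;
}

// Same control flow using the inverted comparison: the compare changed from
// == to != and the two branch arms are swapped.
Arm branch_if_any_ne_flipped(const Vec& a, const Vec& b) {
  return any_ne(a, b) ? Arm::FallThrough : Arm::Label;
}
```

Both functions pick the same arm for every input, which is why inverting the
compare is only valid when the branch arms can be swapped as well.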
> > > >>
> > > >> But in emit_cmp_and_jump_insns we can't flip the branches anymore because
> > > >> they have already been lowered into a fall through branch (PC) and a
> > > >> label, ready for use in an if_then_else RTL expression.
> > > >>
> > > >> Additionally as mentioned before, cbranches expect the values to be masks,
> > > >> not values.  This kinda works out if you XOR the values, but for FP
> > > >> vectors you'd need to know what equality means for the FP format.  i.e.
> > > >> it's possible for IEEE 754 values but it's not immediately obvious if this
> > > >> is true for all formats.
> > > >>
> > > >> Now why does any of this matter?  Well there are two optimizations we want
> > > >> to be able to do.
> > > >>
> > > >> 1. Adv. SIMD does not support a vector !=, as in there's no instruction
> > > >>   for it.  For both Integer and FP vectors we perform the comparisons as
> > > >>   EQ and then invert the resulting mask.  Ideally we'd like to replace
> > > >>   this with just a XOR and the appropriate branch.
> > > >>
> > > >> 2. When on an SVE enabled system we would like to use an SVE compare +
> > > >>   branch for the Adv. SIMD sequence which could happen due to cost
> > > >>   modelling.  However we can only do so based on if we know that the
> > > >>   values being compared against are the boolean masks.  This means we
> > > >>   can't really use combine to do this because combine would have to match
> > > >>   the entire sequence including the vector comparisons because at RTL
> > > >>   we've lost the information that VECTOR_BOOLEAN_P would have given us.
> > > >>   This sequence would be too long for combine to match due to it having
> > > >>   to match the compare + branch sequence being generated as well.  It
> > > >>   also becomes a bit messy to match ANY and ALL sequences.
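
For the first optimization in the list above, the idea that an "EQ then invert"
sequence can be replaced by a XOR can be checked on integer lanes with a small
C++ model. This is a sketch under assumptions, not Adv. SIMD code: the 4-lane
uint32_t vector and the function names are hypothetical, and it deliberately
covers integers only, since FP equality needs the extra care discussed earlier.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kLanes = 4;  // arbitrary lane count, illustration only
using Vec = std::array<std::uint32_t, kLanes>;

// Models the current sequence: an EQ compare yields an all-ones or all-zeros
// lane mask, the mask is inverted, and the branch tests for any set lane.
bool any_ne_via_inverted_eq(const Vec& a, const Vec& b) {
  for (std::size_t i = 0; i < kLanes; ++i) {
    std::uint32_t eq_mask = (a[i] == b[i]) ? 0xFFFFFFFFu : 0u;
    std::uint32_t ne_mask = ~eq_mask;
    if (ne_mask != 0) return true;
  }
  return false;
}

// Models the desired sequence: a lane of a ^ b is non-zero exactly when the
// two integer lanes differ, so no separate compare and invert is needed.
bool any_ne_via_xor(const Vec& a, const Vec& b) {
  for (std::size_t i = 0; i < kLanes; ++i)
    if ((a[i] ^ b[i]) != 0) return true;
  return false;
}
```

The two functions agree for all integer inputs, which is the property the XOR
replacement depends on.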
> > > >>
> > > >> To handle these two cases I propose the new optabs vec_cbranch_any and
> > > >> vec_cbranch_all that expect the operands to be values, not masks, and the
> > > >> comparison operation is the comparison being performed.  The current
> > > >> cbranch optab can't be used here because we need to be able to see both
> > > >> comparison operators (for the boolean branch and the data compare branch).
> > > >>
> > > >> The initial != 0 or == 0 is folded away into the name as _any and _all
> > > >> allowing the target to easily do as it wishes.
> > > >>
> > > >> I have intentionally chosen to require cbranch_optab to still be the main
> > > >> one.  i.e. you can't have vec_cbranch_any/vec_cbranch_all without cbranch
> > > >> because these two are an expand time only construct.  I've also chosen
> > > >> them to be this such that there's no change needed to any other passes in
> > > >> the middle-end.
> > > >>
> > > >> With these two optabs it's trivial to implement the two optimizations I
> > > >> described above.  A target expansion hook is also possible but optabs felt
> > > >> more natural for the situation.
> > > >>
> > > >> I.e. with them we can now generate
> > > >>
> > > >> .L2:
> > > >>        ldr     q31, [x1, x2]
> > > >>        add     v29.4s, v29.4s, v25.4s
> > > >>        add     v28.4s, v28.4s, v26.4s
> > > >>        add     v31.4s, v31.4s, v30.4s
> > > >>        str     q31, [x1, x2]
> > > >>        add     x1, x1, 16
> > > >>        cmp     x1, 2560
> > > >>        beq     .L1
> > > >> .L6:
> > > >>        ldr     q30, [x3, x1]
> > > >>        cmpeq   p15.s, p7/z, z30.s, z27.s
> > > >>        b.none  .L2
> > > >>
> > > >> and easily prove it correct.
> > > >>
> > > >> Bootstrapped Regtested on aarch64-none-linux-gnu,
> > > >> arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
> > > >> -m32, -m64 and no issues.
> > > >>
> > > >> Ok for master?
> > > >>
> > > >> Thanks,
> > > >> Tamar
> > > >>
> > > >> gcc/ChangeLog:
> > > >>
> > > >>        PR target/118974
> > > >>        * optabs.cc (prepare_cmp_insn): Refactor to take optab to check
> > > >> for
> > > >>        instead of hardcoded cbranch.
> > > >>        (emit_cmp_and_jump_insns): Try to emit a vec_cbranch if 
> > > >> supported.
> > > >>        * optabs.def (vec_cbranch_any_optab, vec_cbranch_all_optab): 
> > > >> New.
> > > >>        * doc/md.texi (cbranch_any@var{mode}4, cbranch_any@var{mode}4):
> > > >> Document
> > > >>        them.
> > > >>
> > > >> ---
> > > >> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> > > >> index
> > > >>
> > >
> >
> f6314af46923beee0100a1410f089efd34d7566d..7dbfbbe1609a196b0834d458
> > > >> b26f61904eaf5b24
> > > >> 100644
> > > >> --- a/gcc/doc/md.texi
> > > >> +++ b/gcc/doc/md.texi
> > > >> @@ -7622,6 +7622,24 @@ Operand 0 is a comparison operator.  Operand
> 1
> > > and
> > > >> operand 2 are the
> > > >> first and second operands of the comparison, respectively.  Operand 3
> > > >> is the @code{code_label} to jump to.
> > > >>
> > > >> +@cindex @code{cbranch_any@var{mode}4} instruction pattern
> > > >> +@item @samp{cbranch_any@var{mode}4}
> > > >> +Conditional branch instruction combined with a compare instruction on
> > > >> vectors
> > > >> +where it is required that at least one of the elementwise comparisons 
> > > >> of
> > > >> the
> > > >> +two input vectors is true.
> > > >> +Operand 0 is a comparison operator.  Operand 1 and operand 2 are the
> > > >> +first and second operands of the comparison, respectively.  Operand 3
> > > >> +is the @code{code_label} to jump to.
> > > >> +
> > > >> +@cindex @code{cbranch_all@var{mode}4} instruction pattern
> > > >> +@item @samp{cbranch_all@var{mode}4}
> > > >> +Conditional branch instruction combined with a compare instruction on
> > > >> vectors
> > > >> +where it is required that at all of the elementwise comparisons of the
> > > >> +two input vectors are true.
> > > >> +Operand 0 is a comparison operator.  Operand 1 and operand 2 are the
> > > >> +first and second operands of the comparison, respectively.  Operand 3
> > > >> +is the @code{code_label} to jump to.
> > > >> +
> > > >> @cindex @code{jump} instruction pattern
> > > >> @item @samp{jump}
> > > >> A jump inside a function; an unconditional branch.  Operand 0 is the
> > > >> diff --git a/gcc/optabs.cc b/gcc/optabs.cc
> > > >> index
> > > >>
> > >
> >
> 0a14b1eef8a5795e6fd24ade6da55841696315b8..77d5e6ee5d26ccda3d39126
> > > >> 5bc45fd454530cc67
> > > >> 100644
> > > >> --- a/gcc/optabs.cc
> > > >> +++ b/gcc/optabs.cc
> > > >> @@ -4418,6 +4418,9 @@ can_vec_extract_var_idx_p (machine_mode
> > > vec_mode,
> > > >> machine_mode extr_mode)
> > > >>
> > > >>    *PMODE is the mode of the inputs (in case they are const_int).
> > > >>
> > > >> +   *OPTAB is the optab to check for OPTAB_DIRECT support.  Defaults to
> > > >> +   cbranch_optab.
> > > >> +
> > > >>    This function performs all the setup necessary so that the caller 
> > > >> only
> > > >> has
> > > >>    to emit a single comparison insn.  This setup can involve doing a
> > > >> BLKmode
> > > >>    comparison or emitting a library call to perform the comparison if 
> > > >> no
> > > >> insn
> > > >> @@ -4429,7 +4432,7 @@ can_vec_extract_var_idx_p (machine_mode
> > > vec_mode,
> > > >> machine_mode extr_mode)
> > > >> static void
> > > >> prepare_cmp_insn (rtx x, rtx y, enum rtx_code comparison, rtx size,
> > > >>                  int unsignedp, enum optab_methods methods,
> > > >> -                 rtx *ptest, machine_mode *pmode)
> > > >> +                 rtx *ptest, machine_mode *pmode, optab
> > > >> optab=cbranch_optab)
> > > >> {
> > > >>   machine_mode mode = *pmode;
> > > >>   rtx libfunc, test;
> > > >> @@ -4547,7 +4550,7 @@ prepare_cmp_insn (rtx x, rtx y, enum rtx_code
> > > >> comparison, rtx size,
> > > >>   FOR_EACH_WIDER_MODE_FROM (cmp_mode, mode)
> > > >>     {
> > > >>       enum insn_code icode;
> > > >> -      icode = optab_handler (cbranch_optab, cmp_mode);
> > > >> +      icode = optab_handler (optab, cmp_mode);
> > > >>       if (icode != CODE_FOR_nothing
> > > >>          && insn_operand_matches (icode, 0, test))
> > > >>        {
> > > >> @@ -4580,7 +4583,7 @@ prepare_cmp_insn (rtx x, rtx y, enum rtx_code
> > > >> comparison, rtx size,
> > > >>       if (comparison == UNORDERED && rtx_equal_p (x, y))
> > > >>        {
> > > >>          prepare_cmp_insn (x, y, UNLT, NULL_RTX, unsignedp, 
> > > >> OPTAB_WIDEN,
> > > >> -                           ptest, pmode);
> > > >> +                           ptest, pmode, optab);
> > > >>          if (*ptest)
> > > >>            return;
> > > >>        }
> > > >> @@ -4632,7 +4635,7 @@ prepare_cmp_insn (rtx x, rtx y, enum rtx_code
> > > >> comparison, rtx size,
> > > >>
> > > >>       *pmode = ret_mode;
> > > >>       prepare_cmp_insn (x, y, comparison, NULL_RTX, unsignedp, methods,
> > > >> -                       ptest, pmode);
> > > >> +                       ptest, pmode, optab);
> > > >>     }
> > > >>
> > > >>   return;
> > > >> @@ -4816,12 +4819,68 @@ emit_cmp_and_jump_insns (rtx x, rtx y, enum
> > > >> rtx_code comparison, rtx size,
> > > >>      the target supports tbranch.  */
> > > >>   machine_mode tmode = mode;
> > > >>   direct_optab optab;
> > > >> -  if (op1 == CONST0_RTX (GET_MODE (op1))
> > > >> -      && validate_test_and_branch (val, &test, &tmode,
> > > >> -                                  &optab) != CODE_FOR_nothing)
> > > >> +  if (op1 == CONST0_RTX (GET_MODE (op1)))
> > > >>     {
> > > >> -      emit_cmp_and_jump_insn_1 (test, tmode, label, optab, prob, 
> > > >> true);
> > > >> -      return;
> > > >> +      if (validate_test_and_branch (val, &test, &tmode,
> > > >> +                                   &optab) != CODE_FOR_nothing)
> > > >> +       {
> > > >> +         emit_cmp_and_jump_insn_1 (test, tmode, label, optab, prob,
> > > >> true);
> > > >> +         return;
> > > >> +       }
> > > >> +
> > > >> +      /* If we are comparing equality with 0, check if VAL is another
> > > >> equality
> > > >> +        comparison and if the target supports it directly.  */
> > > >> +      if (val && TREE_CODE (val) == SSA_NAME
> > > >> +         && VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (val))
> > > >> +         && (comparison == NE || comparison == EQ))
> > > >> +       {
> > > >> +         auto def_stmt = SSA_NAME_DEF_STMT (val);
> > > >>
> > > >> I think you may only look at the definition via get_def_for_expr (aka
> > > >> TER), and the result may be NULL.
> > > >>
> > > >> +             if ((icode = optab_handler (optab, mode2))
> > > >> +                 != CODE_FOR_nothing
> > > >>
> > > >> can we perform this check before expanding the operands please?
> > > >> Can we use vector_compare_rtx ().
> > > >>
> > > >> Somehow I feel this belongs in a separate function like
> > > >> validate_test_and_branch.
> > > >>
> > > >> Richard.
> > > >>
> > > >>> I think in the patch you've missed that the RTL from the expand is
> > > >>> overwritten.
> > > >>>
> > > >>> Basically the new optabs are doing essentially the same thing that you
> > > >>> are suggesting, but without needing the dummy second
> > > >>> comparison_operator.
> > > >>> Notice how in the patch the comparison operator is the general
> > > >>>
> > > >>> aarch64_comparison_operator
> > > >>>
> > > >>> and not the stricter aarch64_equality_operator.
> > > >>>
> > > >>> Thanks,
> > > >>> Tamar
> > > >>>>
> > > >>>> Richard.
> > > >>>>
> > > >>>>> I.e. with them we can now generate
> > > >>>>>
> > > >>>>> .L2:
> > > >>>>>        ldr     q31, [x1, x2]
> > > >>>>>        add     v29.4s, v29.4s, v25.4s
> > > >>>>>        add     v28.4s, v28.4s, v26.4s
> > > >>>>>        add     v31.4s, v31.4s, v30.4s
> > > >>>>>        str     q31, [x1, x2]
> > > >>>>>        add     x1, x1, 16
> > > >>>>>        cmp     x1, 2560
> > > >>>>>        beq     .L1
> > > >>>>> .L6:
> > > >>>>>        ldr     q30, [x3, x1]
> > > >>>>>        cmpeq   p15.s, p7/z, z30.s, z27.s
> > > >>>>>        b.none  .L2
> > > >>>>>
> > > >>>>> and easily prove it correct.
> > > >>>>>
> > > >>>>> Bootstrapped Regtested on aarch64-none-linux-gnu,
> > > >>>>> arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
> > > >>>>> -m32, -m64 and no issues.
> > > >>>>>
> > > >>>>> Ok for master?
> > > >>>>>
> > > >>>>> Thanks,
> > > >>>>> Tamar
> > > >>>>>
> > > >>>>> gcc/ChangeLog:
> > > >>>>>
> > > >>>>>    PR target/118974
> > > >>>>>    * optabs.cc (prepare_cmp_insn): Refactor to take optab to check 
> > > >>>>> for
> > > >>>>>    instead of hardcoded cbranch.
> > > >>>>>    (emit_cmp_and_jump_insns): Try to emit a vec_cbranch if 
> > > >>>>> supported.
> > > >>>>>    * optabs.def (vec_cbranch_any_optab, vec_cbranch_all_optab): New.
> > > >>>>>    * doc/md.texi (cbranch_any@var{mode}4,
> cbranch_any@var{mode}4):
> > > >>>> Document
> > > >>>>>    them.
> > > >>>>>
> > > >>>>> ---
> > > >>>>> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> > > >>>>> index
> > > >>>>
> > > >>
> > >
> >
> f6314af46923beee0100a1410f089efd34d7566d..7dbfbbe1609a196b0834d458
> > > >>>> b26f61904eaf5b24 100644
> > > >>>>> --- a/gcc/doc/md.texi
> > > >>>>> +++ b/gcc/doc/md.texi
> > > >>>>> @@ -7622,6 +7622,24 @@ Operand 0 is a comparison operator.
> > Operand
> > > 1
> > > >>>> and operand 2 are the
> > > >>>>> first and second operands of the comparison, respectively.  Operand 
> > > >>>>> 3
> > > >>>>> is the @code{code_label} to jump to.
> > > >>>>>
> > > >>>>> +@cindex @code{cbranch_any@var{mode}4} instruction pattern
> > > >>>>> +@item @samp{cbranch_any@var{mode}4}
> > > >>>>> +Conditional branch instruction combined with a compare instruction 
> > > >>>>> on
> > > >> vectors
> > > >>>>> +where it is required that at least one of the elementwise 
> > > >>>>> comparisons of
> > > the
> > > >>>>> +two input vectors is true.
> > > >>>>> +Operand 0 is a comparison operator.  Operand 1 and operand 2 are 
> > > >>>>> the
> > > >>>>> +first and second operands of the comparison, respectively.  
> > > >>>>> Operand 3
> > > >>>>> +is the @code{code_label} to jump to.
> > > >>>>> +
> > > >>>>> +@cindex @code{cbranch_all@var{mode}4} instruction pattern
> > > >>>>> +@item @samp{cbranch_all@var{mode}4}
> > > >>>>> +Conditional branch instruction combined with a compare instruction 
> > > >>>>> on
> > > >> vectors
> > > >>>>> +where it is required that at all of the elementwise comparisons of 
> > > >>>>> the
> > > >>>>> +two input vectors are true.
> > > >>>>> +Operand 0 is a comparison operator.  Operand 1 and operand 2 are 
> > > >>>>> the
> > > >>>>> +first and second operands of the comparison, respectively.  
> > > >>>>> Operand 3
> > > >>>>> +is the @code{code_label} to jump to.
> > > >>>>> +
> > > >>>>> @cindex @code{jump} instruction pattern
> > > >>>>> @item @samp{jump}
> > > >>>>> A jump inside a function; an unconditional branch.  Operand 0 is the
> > > >>>>> diff --git a/gcc/optabs.cc b/gcc/optabs.cc
> > > >>>>> index
> > > >>>>
> > > >>
> > >
> >
> 0a14b1eef8a5795e6fd24ade6da55841696315b8..77d5e6ee5d26ccda3d39126
> > > >>>> 5bc45fd454530cc67 100644
> > > >>>>> --- a/gcc/optabs.cc
> > > >>>>> +++ b/gcc/optabs.cc
> > > >>>>> @@ -4418,6 +4418,9 @@ can_vec_extract_var_idx_p (machine_mode
> > > >>>> vec_mode, machine_mode extr_mode)
> > > >>>>>
> > > >>>>>    *PMODE is the mode of the inputs (in case they are const_int).
> > > >>>>>
> > > >>>>> +   *OPTAB is the optab to check for OPTAB_DIRECT support.  
> > > >>>>> Defaults to
> > > >>>>> +   cbranch_optab.
> > > >>>>> +
> > > >>>>>    This function performs all the setup necessary so that the 
> > > >>>>> caller only has
> > > >>>>>    to emit a single comparison insn.  This setup can involve doing a
> > BLKmode
> > > >>>>>    comparison or emitting a library call to perform the comparison 
> > > >>>>> if no
> insn
> > > >>>>> @@ -4429,7 +4432,7 @@ can_vec_extract_var_idx_p (machine_mode
> > > >>>> vec_mode, machine_mode extr_mode)
> > > >>>>> static void
> > > >>>>> prepare_cmp_insn (rtx x, rtx y, enum rtx_code comparison, rtx size,
> > > >>>>>          int unsignedp, enum optab_methods methods,
> > > >>>>> -          rtx *ptest, machine_mode *pmode)
> > > >>>>> +          rtx *ptest, machine_mode *pmode, optab
> > > >>>> optab=cbranch_optab)
> > > >>>>> {
> > > >>>>>   machine_mode mode = *pmode;
> > > >>>>>   rtx libfunc, test;
> > > >>>>> @@ -4547,7 +4550,7 @@ prepare_cmp_insn (rtx x, rtx y, enum rtx_code
> > > >>>> comparison, rtx size,
> > > >>>>>   FOR_EACH_WIDER_MODE_FROM (cmp_mode, mode)
> > > >>>>>     {
> > > >>>>>       enum insn_code icode;
> > > >>>>> -      icode = optab_handler (cbranch_optab, cmp_mode);
> > > >>>>> +      icode = optab_handler (optab, cmp_mode);
> > > >>>>>       if (icode != CODE_FOR_nothing
> > > >>>>>      && insn_operand_matches (icode, 0, test))
> > > >>>>>    {
> > > >>>>> @@ -4580,7 +4583,7 @@ prepare_cmp_insn (rtx x, rtx y, enum rtx_code
> > > >>>> comparison, rtx size,
> > > >>>>>       if (comparison == UNORDERED && rtx_equal_p (x, y))
> > > >>>>>    {
> > > >>>>>      prepare_cmp_insn (x, y, UNLT, NULL_RTX, unsignedp, OPTAB_WIDEN,
> > > >>>>> -                ptest, pmode);
> > > >>>>> +                ptest, pmode, optab);
> > > >>>>>      if (*ptest)
> > > >>>>>        return;
> > > >>>>>    }
> > > >>>>> @@ -4632,7 +4635,7 @@ prepare_cmp_insn (rtx x, rtx y, enum rtx_code
> > > >>>> comparison, rtx size,
> > > >>>>>
> > > >>>>>       *pmode = ret_mode;
> > > >>>>>       prepare_cmp_insn (x, y, comparison, NULL_RTX, unsignedp,
> methods,
> > > >>>>> -            ptest, pmode);
> > > >>>>> +            ptest, pmode, optab);
> > > >>>>>     }
> > > >>>>>
> > > >>>>>   return;
> > > >>>>> @@ -4816,12 +4819,68 @@ emit_cmp_and_jump_insns (rtx x, rtx y,
> > enum
> > > >>>> rtx_code comparison, rtx size,
> > > >>>>>      the target supports tbranch.  */
> > > >>>>>   machine_mode tmode = mode;
> > > >>>>>   direct_optab optab;
> > > >>>>> -  if (op1 == CONST0_RTX (GET_MODE (op1))
> > > >>>>> -      && validate_test_and_branch (val, &test, &tmode,
> > > >>>>> -                   &optab) != CODE_FOR_nothing)
> > > >>>>> +  if (op1 == CONST0_RTX (GET_MODE (op1)))
> > > >>>>>     {
> > > >>>>> -      emit_cmp_and_jump_insn_1 (test, tmode, label, optab, prob, 
> > > >>>>> true);
> > > >>>>> -      return;
> > > >>>>> +      if (validate_test_and_branch (val, &test, &tmode,
> > > >>>>> +                    &optab) != CODE_FOR_nothing)
> > > >>>>> +    {
> > > >>>>> +      emit_cmp_and_jump_insn_1 (test, tmode, label, optab, prob, 
> > > >>>>> true);
> > > >>>>> +      return;
> > > >>>>> +    }
> > > >>>>> +
> > > >>>>> +      /* If we are comparing equality with 0, check if VAL is 
> > > >>>>> another
> equality
> > > >>>>> +     comparison and if the target supports it directly.  */
> > > >>>>> +      if (val && TREE_CODE (val) == SSA_NAME
> > > >>>>> +      && VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (val))
> > > >>>>> +      && (comparison == NE || comparison == EQ))
> > > >>>>> +    {
> > > >>>>> +      auto def_stmt = SSA_NAME_DEF_STMT (val);
> > > >>>>> +      enum insn_code icode;
> > > >>>>> +      if (is_gimple_assign (def_stmt)
> > > >>>>> +          && TREE_CODE_CLASS (gimple_assign_rhs_code (def_stmt))
> > > >>>>> +           == tcc_comparison)
> > > >>>>> +        {
> > > >>>>> +          class expand_operand ops[2];
> > > >>>>> +          rtx_insn *tmp = NULL;
> > > >>>>> +          start_sequence ();
> > > >>>>> +          rtx op0c = expand_normal (gimple_assign_rhs1 (def_stmt));
> > > >>>>> +          rtx op1c = expand_normal (gimple_assign_rhs2 (def_stmt));
> > > >>>>> +          machine_mode mode2 = GET_MODE (op0c);
> > > >>>>> +          create_input_operand (&ops[0], op0c, mode2);
> > > >>>>> +          create_input_operand (&ops[1], op1c, mode2);
> > > >>>>> +
> > > >>>>> +          int unsignedp2 = TYPE_UNSIGNED (TREE_TYPE (val));
> > > >>>>> +          auto inner_code = gimple_assign_rhs_code (def_stmt);
> > > >>>>> +          rtx test2 = NULL_RTX;
> > > >>>>> +
> > > >>>>> +          enum rtx_code comparison2 = get_rtx_code (inner_code,
> > > unsignedp2);
> > > >>>>> +          if (unsignedp2)
> > > >>>>> +        comparison2 = unsigned_condition (comparison2);
> > > >>>>> +          if (comparison == NE)
> > > >>>>> +        optab = vec_cbranch_any_optab;
> > > >>>>> +          else
> > > >>>>> +        optab = vec_cbranch_all_optab;
> > > >>>>> +
> > > >>>>> +          if ((icode = optab_handler (optab, mode2))
> > > >>>>> +          != CODE_FOR_nothing
> > > >>>>> +          && maybe_legitimize_operands (icode, 1, 2, ops))
> > > >>>>> +        {
> > > >>>>> +          prepare_cmp_insn (ops[0].value, ops[1].value, 
> > > >>>>> comparison2,
> > > >>>>> +                    size, unsignedp2, OPTAB_DIRECT, &test2,
> > > >>>>> +                    &mode2, optab);
> > > >>>>> +          emit_cmp_and_jump_insn_1 (test2, mode2, label,
> > > >>>>> +                        optab, prob, false);
> > > >>>>> +          tmp = get_insns ();
> > > >>>>> +        }
> > > >>>>> +
> > > >>>>> +          end_sequence ();
> > > >>>>> +          if (tmp)
> > > >>>>> +        {
> > > >>>>> +          emit_insn (tmp);
> > > >>>>> +          return;
> > > >>>>> +        }
> > > >>>>> +        }
> > > >>>>> +    }
> > > >>>>>     }
> > > >>>>>
> > > >>>>>   emit_cmp_and_jump_insn_1 (test, mode, label, cbranch_optab, prob,
> > > >> false);
> > > >>>>> diff --git a/gcc/optabs.def b/gcc/optabs.def
> > > >>>>> index
> > > >>>>
> > > >>
> > >
> >
> 23f792352388dd5f8de8a1999643179328214abf..a600938fb9e4480ff2d78c9b
> > > >>>> 0be344e4856539bb 100644
> > > >>>>> --- a/gcc/optabs.def
> > > >>>>> +++ b/gcc/optabs.def
> > > >>>>> @@ -424,6 +424,8 @@ OPTAB_D (smulhrs_optab, "smulhrs$a3")
> > > >>>>> OPTAB_D (umulhs_optab, "umulhs$a3")
> > > >>>>> OPTAB_D (umulhrs_optab, "umulhrs$a3")
> > > >>>>> OPTAB_D (sdiv_pow2_optab, "sdiv_pow2$a3")
> > > >>>>> +OPTAB_D (vec_cbranch_any_optab, "vec_cbranch_any$a4")
> > > >>>>> +OPTAB_D (vec_cbranch_all_optab, "vec_cbranch_all$a4")
> > > >>>>> OPTAB_D (vec_pack_sfix_trunc_optab, "vec_pack_sfix_trunc_$a")
> > > >>>>> OPTAB_D (vec_pack_ssat_optab, "vec_pack_ssat_$a")
> > > >>>>> OPTAB_D (vec_pack_trunc_optab, "vec_pack_trunc_$a")
> > > >>>>>
> > > >>>>>
> > > >>>>>
> > > >>>>
> > > >>>> --
> > > >>>> Richard Biener <rguent...@suse.de>
> > > >>>> SUSE Software Solutions Germany GmbH,
> > > >>>> Frankenstrasse 146, 90461 Nuernberg, Germany;
> > > >>>> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> > > >> Nuernberg)
> > > >>>
> > > >>
> > > >> --
> > > >> Richard Biener <rguent...@suse.de>
> > > >> SUSE Software Solutions Germany GmbH,
> > > >> Frankenstrasse 146, 90461 Nuernberg, Germany;
> > > >> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> > > Nuernberg)
