On Mon, 9 Jun 2025, Tamar Christina wrote:

> > -----Original Message-----
> > From: Richard Biener <rguent...@suse.de>
> > Sent: Monday, June 9, 2025 10:30 AM
> > To: Tamar Christina <tamar.christ...@arm.com>
> > Cc: gcc-patches@gcc.gnu.org; Richard Sandiford <richard.sandif...@arm.com>;
> > nd <n...@arm.com>
> > Subject: Re: [PATCH 1/3]middle-end: support vec_cbranch_any and
> > vec_cbranch_all [PR118974]
> > 
> > On Mon, 9 Jun 2025, Tamar Christina wrote:
> > 
> > > This patch introduces two new vector cbranch optabs vec_cbranch_any and
> > > vec_cbranch_all.
> > >
> > > To explain why we need two new optabs let me explain the current cbranch
> > > and its limitations and what I'm trying to optimize.  So sorry for the
> > > long email, but I hope it explains why I think we want new optabs.
> > >
> > > Today cbranch can be used for both vector and scalar modes.  In both these
> > > cases it's intended to compare boolean values, either scalar or vector.
> > >
> > > The optab documentation does not however state that it can only handle
> > > comparisons against 0.  So many targets have added code for the vector
> > > variant that tries to deal with the case where we branch based on two
> > > non-zero registers.
> > >
> > > However this code can't ever be reached because the cbranch expansion
> > > only deals with comparisons against 0 for vectors.  This is because for
> > > vectors the rest of the compiler has no way to generate a non-zero
> > > comparison, e.g. the vectorizer will always generate a zero comparison,
> > > and the C/C++ front-ends won't allow vectors to be used in a cbranch as
> > > they expect a boolean value.  ISAs like SVE work around this by requiring
> > > you to use an SVE PTEST intrinsic which results in a single scalar
> > > boolean value that represents the flag values.
> > >
> > > e.g. if (svptest_any (..))
> > >
> > > The natural question is why we do not then, at expand time, rewrite the
> > > comparison to a non-zero comparison if the target supports it.
> > >
> > > The reason is we can't safely do so.  For an ANY comparison (e.g. a != b)
> > > this is trivial, but for an ALL comparison (e.g. a == b) we would have to
> > > both flip the branch and invert the value being compared, i.e. we have to
> > > make it an a != b comparison.
> > >
> > > But in emit_cmp_and_jump_insns we can't flip the branches anymore because
> > > they have already been lowered into a fall-through branch (PC) and a
> > > label, ready for use in an if_then_else RTL expression.
> > >
> > > Additionally, as mentioned before, cbranch expects its operands to be
> > > masks, not data values.  This kinda works out if you XOR the values, but
> > > for FP vectors you'd need to know what equality means for the FP format,
> > > i.e. it's possible for IEEE 754 values but it's not immediately obvious
> > > whether this is true for all formats.
> > >
> > > Now why does any of this matter?  Well, there are two optimizations we
> > > want to be able to do.
> > >
> > > 1. Adv. SIMD does not support a vector !=, as in there's no instruction
> > >    for it.  For both integer and FP vectors we perform the comparison as
> > >    EQ and then invert the resulting mask.  Ideally we'd like to replace
> > >    this with just an XOR and the appropriate branch.
> > >
> > > 2. On an SVE enabled system we would like to use an SVE compare + branch
> > >    for the Adv. SIMD sequence, which could happen due to cost modelling.
> > >    However we can only do so if we know that the values being compared
> > >    are boolean masks.  This means we can't really use combine to do this
> > >    because combine would have to match the entire sequence including the
> > >    vector comparisons, because at RTL we've lost the information that
> > >    VECTOR_BOOLEAN_P would have given us.  This sequence would be too long
> > >    for combine to match due to it having to match the compare + branch
> > >    sequence being generated as well.  It also becomes a bit messy to
> > >    match ANY and ALL sequences.
> > 
> > I guess I didn't really understand the above, esp. why all of this is
> > not recoverable on RTL, so I'll state what I think
> > cbranch is supposed to do and then ask some questions on the proposal.
> > 
> 
> Quite simply the chain is too long.  SVE has flag-setting comparisons, so in
> a single instruction we can perform the final comparison (which doesn't need
> to be restricted to EQ or NE) and then use the appropriate condition code to
> decide if we have an ANY or ALL comparison.  So the sequence to match is
> very long.
> 
> That is to say, in gimple we have
> 
> a = b > c
> if (a != 0)
>   ...
> and SVE doesn't need the indirection.  Conceptually this makes it
> 
> if (any (b > c))
>  ...
> 
> For SVE the cbranch expands to a predicate test, which also sets the flags
> again.  This duplication in flag setting is later removed by various patterns
> in the backend.  This is all fine, but where it gets difficult is trying to
> replace the Adv. SIMD compare + branch with an SVE one.  To do this it would
> have to match the entirety of
> 
> a = b > c
> if (a != 0)
> 
> which in RTL is
> 
> (insn 66 65 67 4 (set (reg:V16QI 185 [ mask_patt_9.20_138 ])
>         (neg:V16QI (gtu:V16QI (reg:V16QI 183)
>                 (reg:V16QI 184)))) "/app/example.cpp":29:7 -1
>      (nil))
> (insn 67 66 68 4 (set (reg:V4SI 186)
>         (unspec:V4SI [
>                 (subreg:V4SI (reg:V16QI 185 [ mask_patt_9.20_138 ]) 0)
>                 (subreg:V4SI (reg:V16QI 185 [ mask_patt_9.20_138 ]) 0)
>             ] UNSPEC_UMAXV)) "/app/example.cpp":29:7 -1
>      (nil))
> (insn 68 67 69 4 (set (reg:V16QI 185 [ mask_patt_9.20_138 ])
>         (subreg:V16QI (reg:V4SI 186) 0)) "/app/example.cpp":29:7 -1
>      (nil))
> (insn 69 68 70 4 (set (reg:DI 187)
>         (subreg:DI (reg:V16QI 185 [ mask_patt_9.20_138 ]) 0)) 
> "/app/example.cpp":29:7 -1
>      (nil))
> (insn 70 69 71 4 (set (reg:CC 66 cc)
>         (compare:CC (reg:DI 187)
>             (const_int 0 [0]))) "/app/example.cpp":29:7 -1
>      (nil))
> (jump_insn 71 70 72 4 (set (pc)
>         (if_then_else (ne (reg:CC 66 cc)
>                 (const_int 0 [0]))
>             (label_ref:DI 397)
>             (pc))) "/app/example.cpp":29:7 19 {condjump}
> 
> And any intermediate "submatch" isn't really valid.  The sequence would need
> to be matched as a whole or not at all.
> 
> The second problem is that Adv. SIMD lacks a vector != comparison.
> 
> This means there's an additional "NOT" in inverting the mask.  This
> operation can be optimized in three ways:
> 
> 1.  It can be converted from a MAX reduction into a MIN reduction and then
>     the NOT can be removed.  The branch would have to be inverted.
> 2.  The data vectors can be XOR'ed together, and inequality means a non-zero
>     bit anywhere in the result.
> 3.  Use an SVE comparison, which does have != on vectors.
> 
> This chain is again quite long to match, and we can't incrementally match it
> since it's only valid if we know we're using the results in a zero or
> non-zero comparison.
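> 
> As a rough scalar sketch of option 2 (purely illustrative, integer vectors
> only, written with GCC's generic vector extension):
> 
>   typedef unsigned int v4si __attribute__ ((vector_size (16)));
> 
>   /* ANY (a != b): a ^ b has a non-zero bit somewhere iff some element
>      differs, so a single "is the whole register zero" test is enough.  */
>   int
>   any_ne (v4si a, v4si b)
>   {
>     v4si x = a ^ b;
>     return (x[0] | x[1] | x[2] | x[3]) != 0;
>   }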
> 
> > cbranch is supposed to branch on the result of a comparison,
> > be it on vector or scalar operands, the result of the comparison is
> > a scalar, or rather in full generality, a (scalar) condition-code.
> > Given it uses a COMPARE RTX as comparison operator we can currently
> > only handle whole vector equality/inequality which map to all/any.
> 
> Correct, and normally this is fine, but for SVE we have support for what's
> essentially a "vector" condition code.  This is why I also named the new
> optabs vec_cbranch.
> 
> The biggest difference is that instead of requiring the reduction to a scalar
> it allows branching on vector element comparisons *directly*, which makes it
> much easier to do the two fixes above.
> 
> > 
> > > To handle these two cases I propose the new optabs vec_cbranch_any and
> > > vec_cbranch_all that expect the operands to be values, not masks, and
> > > where the comparison operation is the comparison being performed.  The
> > > current cbranch optab can't be used here because we need to be able to
> > > see both comparison operators (for the boolean branch and the data
> > > compare branch).
> > >
> > > The initial != 0 or == 0 is folded away into the name as _any and _all,
> > > allowing the target to easily do as it wishes.
> > >
> > > I have intentionally chosen to require cbranch_optab to still be the main
> > > one, i.e. you can't have vec_cbranch_any/vec_cbranch_all without cbranch
> > > because these two are an expand-time-only construct.  I've also designed
> > > them this way so that there's no change needed to any other passes in the
> > > middle-end.
> > >
> > > With these two optabs it's trivial to implement the two optimizations I
> > > described above.  A target expansion hook is also possible but optabs
> > > felt more natural for the situation.
> > 
> > What's the RTL the optabs expand to for SVE/AdvSIMD?  How do you
> > represent the any/all vector compare difference?  Looking at 3/3 I see
> > you use UNSPECs.
> > 
> 
> You mean in the expand RTL? That's just so I can make the iterators.  The 
> expansion is:
> 
> (insn 31 30 32 5 (parallel [
>             (set (reg:VNx4BI 133 [ mask_patt_13.20_41 ])
>                 (unspec:VNx4BI [
>                         (reg:VNx4BI 134)
>                         (const_int 1 [0x1])
>                         (gtu:VNx4BI (reg:VNx4QI 112 [ vect__1.16 ])
>                             (reg:VNx4QI 128))
>                     ] UNSPEC_PRED_Z))
>             (clobber (reg:CC_NZC 66 cc))
>         ]) "/app/example.cpp":29:7 -1
>      (nil))
> (insn 32 31 33 5 (set (reg:VNx4BI 136 [ vec_mask_and_42 ])
>         (and:VNx4BI (reg:VNx4BI 133 [ mask_patt_13.20_41 ])
>             (reg:VNx4BI 111 [ loop_mask_36 ]))) "/app/example.cpp":29:7 -1
>      (nil))
> (insn 33 32 34 5 (set (reg:VNx16BI 137)
>         (const_vector:VNx16BI repeat [
>                 (const_int 1 [0x1])
>                 (const_int 0 [0]) repeated x3
>             ])) "/app/example.cpp":29:7 -1
>      (nil))
> (insn 34 33 35 5 (set (reg:CC_NZC 66 cc)
>         (unspec:CC_NZC [
>                 (reg:VNx16BI 137)
>                 (subreg:VNx4BI (reg:VNx16BI 137) 0)
>                 (const_int 1 [0x1])
>                 (reg:VNx4BI 136 [ vec_mask_and_42 ])
>             ] UNSPEC_PTEST)) "/app/example.cpp":29:7 -1
>      (nil))
> (jump_insn 35 34 36 5 (set (pc)
>         (if_then_else (eq (reg:CC_NZC 66 cc)
>                 (const_int 0 [0]))
>             (label_ref 41)
>             (pc))) "/app/example.cpp":29:7 -1
>      (int_list:REG_BR_PROB 1014686028 (nil))
>  -> 41)
> 
> Notice how insn 31 which is the vector compare sets CC_NZC.

That's a "scalar" condition code?

> Insn 32 is the masking of the compare, it'll be moved into insn 31 by
> combine replacing the all true predicate.

So the (reg:VNx4BI 133) is essentially unused, but it's required to ...

> Insn 34 is a flag reshaping operation.  In case the predicates don't
> have the same element width the CC reg value needs to be
> reshaped (in case you're doing e.g. a b.first (branch if the first element
> matches) etc.).

... do this, which produces the actual CC_NZC we jump on.

> However because insn 35 only cares about 0 or !0 the shape doesn't matter.
> So insn 34 will be eliminated by combine.
> 
> This leaves insns 31 and 35: essentially only the compare + branch,
> directly on the vector elements, which is what vec_cbranch is now
> representing.
> 
> The optab allows us to generate the paradoxical subregs to convert
> Adv. SIMD registers into SVE ones, and the predicate mask to make
> the operation safe.
> 
> Without this, we have to rely on combine which (even if it could match
> such long sequences) would have to not just replace the Adv. SIMD compare +
> branch with an SVE compare + branch, but also do all the predicate
> optimization in the one single pattern.
> 
> > What kind of compares do you expect us to generate?  Any besides
> > equality compares?
> 
> Any compare is possible.  SVE vector comparisons take all 6
> comparison operators, and there are also floating-point versions and
> unordered variants.  Basically in the ISA vectors are first-class
> control-flow elements.
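> 
> For instance (an illustrative ACLE sketch, not taken from the patch):
> 
>   #include <arm_sve.h>
> 
>   /* Branch condition: any lane compares unordered, i.e. either input
>      is a NaN.  */
>   int
>   any_unordered (svfloat32_t a, svfloat32_t b)
>   {
>     svbool_t pg = svptrue_b32 ();
>     return svptest_any (pg, svcmpuo_f32 (pg, a, b));
>   }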
> 
> > Why would we not allow cbranch to take an
> > operand 0 that specifies the desired reduction?
> 
> That's certainly possible, but that's just going to make cbranch itself
> more complicated to understand.  I chose to go with _any and _all
> since, as you say, for the branch comparison itself only != 0 and == 0
> make sense.  So cbranch would need to get two operators in this case:
> one for the branch comparison and one for the data comparison.
> 
> i.e. if expanding
> 
> a = b > c
> if (a != 0)
> 
> cbranch would get
>   (set (reg a)
>     (gt (reg b) (reg c)))
>   (clobber cc_nz)
>   (ne (reg a) 0)
>   (if_then_else ...)
> 
> I basically opted for a simpler change: since you can only have != 0 and
> == 0, this is just encoded in the names "any" and "all" so we don't have to
> send the dummy operators along.

Well, if we'd introduce ANY_LT_EXPR, ANY_NE_EXPR, ... ALL_LT_EXPR, 
ALL_NE_EXPR ... then we can have on GIMPLE

 if (a ANY_GT c)

and no need for a separate compare and compare-and-branch.

> > We could
> > have (any:CC (eq:V4BI (reg:V4SI) (reg:V4SI))) or so, or alternatively
> > (more intrusive) have (any_eq:CC (reg:V4SI) (reg:V4SI))?  That
> > would also allow you to have a proper RTL representation rather
> > than more and more UNSPECs.
> 
> We do have proper RTL representation though..

You have

> (insn 31 30 32 5 (parallel [
>             (set (reg:VNx4BI 133 [ mask_patt_13.20_41 ])
>                 (unspec:VNx4BI [
>                         (reg:VNx4BI 134)
>                         (const_int 1 [0x1])
>                         (gtu:VNx4BI (reg:VNx4QI 112 [ vect__1.16 ])
>                             (reg:VNx4QI 128))
>                     ] UNSPEC_PRED_Z))
>             (clobber (reg:CC_NZC 66 cc))

that's not

  (set (reg:CC_NZC 66 cc) (any_gtu:CC_NZC (reg:VNx4QI 112 [ vect__1.16 ])
                            (reg:VNx4QI 128)))

so nothing combine or simplify-rtx can work with (because of the UNSPEC).

That said, I wanted to know whether SVE/NEON can do {any,all}_<compare>
ops on vectors, setting the CC flag register you can then do a
conditional jump on.  And that seems to be the case and what you
are after.

But that's exactly what the cbranch optab is supposed to facilitate.
What we're lacking is communication of the all vs. any.  Your choice
is to do that by using different optabs, 'vec_cbranch_any' vs.
'vec_cbranch_all'.  I see how that's "easy", you just have to adjust
RTL expansion.

On GIMPLE you could argue we have everything, we can support

 if (_1 ==/!= {-1, -1, ... })

to match vec_cbranch_all/any for an inverted compare?

I see one of your issues is that the AND does not set flags, right?
You do need the ptest once you have that?


+@cindex @code{cbranch_any@var{mode}4} instruction pattern
+@item @samp{cbranch_any@var{mode}4}
+Conditional branch instruction combined with a compare instruction on vectors
+where it is required that at least one of the elementwise comparisons of the
+two input vectors is true.

That sentence is a bit odd.  Operand 0 is a comparison operator that
is applied to all vector elements.  When at least one comparison evaluates
to true the branch is taken.

+Operand 0 is a comparison operator.  Operand 1 and operand 2 are the
+first and second operands of the comparison, respectively.  Operand 3
+is the @code{code_label} to jump to.

This patch introduces two new vector cbranch optabs vec_cbranch_any and
vec_cbranch_all.

To explain why we need two new optabs let me explain the current cbranch
and its limitations and what I'm trying to optimize.  So sorry for the long
email, but I hope it explains why I think we want new optabs.

Today cbranch can be used for both vector and scalar modes.  In both these
cases it's intended to compare boolean values, either scalar or vector.

The optab documentation does not however state that it can only handle
comparisons against 0.  So many targets have added code for the vector
variant that tries to deal with the case where we branch based on two
non-zero registers.

However this code can't ever be reached because the cbranch expansion only
deals with comparisons against 0 for vectors.  This is because for vectors
the rest of the compiler has no way to generate a non-zero comparison,
e.g. the vectorizer will always generate a zero comparison, and the C/C++
front-ends won't allow vectors to be used in a cbranch as they expect a
boolean value.  ISAs like SVE work around this by requiring you to use an
SVE PTEST intrinsic which results in a single scalar boolean value that
represents the flag values.

e.g. if (svptest_any (..))
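
Spelled out a little more with the ACLE intrinsics (an illustrative sketch,
not taken from the patch or its testsuite):

  #include <arm_sve.h>

  /* Branch condition: any active element of A is greater than the
     corresponding element of B.  */
  int
  any_gt (svint32_t a, svint32_t b)
  {
    svbool_t pg = svptrue_b32 ();
    return svptest_any (pg, svcmpgt_s32 (pg, a, b));
  }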

The natural question is why we do not then, at expand time, rewrite the
comparison to a non-zero comparison if the target supports it.

The reason is we can't safely do so.  For an ANY comparison (e.g. a != b)
this is trivial, but for an ALL comparison (e.g. a == b) we would have to
both flip the branch and invert the value being compared, i.e. we have to
make it an a != b comparison.

But in emit_cmp_and_jump_insns we can't flip the branches anymore because
they have already been lowered into a fall-through branch (PC) and a label,
ready for use in an if_then_else RTL expression.

Additionally, as mentioned before, cbranch expects its operands to be masks,
not data values.  This kinda works out if you XOR the values, but for FP
vectors you'd need to know what equality means for the FP format, i.e. it's
possible for IEEE 754 values but it's not immediately obvious whether this
is true for all formats.

Now why does any of this matter?  Well, there are two optimizations we want
to be able to do.

1. Adv. SIMD does not support a vector !=, as in there's no instruction for
   it.  For both integer and FP vectors we perform the comparison as EQ and
   then invert the resulting mask.  Ideally we'd like to replace this with
   just an XOR and the appropriate branch.

2. On an SVE enabled system we would like to use an SVE compare + branch for
   the Adv. SIMD sequence, which could happen due to cost modelling.  However
   we can only do so if we know that the values being compared are boolean
   masks.  This means we can't really use combine to do this because combine
   would have to match the entire sequence including the vector comparisons,
   because at RTL we've lost the information that VECTOR_BOOLEAN_P would have
   given us.  This sequence would be too long for combine to match due to it
   having to match the compare + branch sequence being generated as well.  It
   also becomes a bit messy to match ANY and ALL sequences.

To handle these two cases I propose the new optabs vec_cbranch_any and
vec_cbranch_all that expect the operands to be values, not masks, and where
the comparison operation is the comparison being performed.  The current
cbranch optab can't be used here because we need to be able to see both
comparison operators (for the boolean branch and the data compare branch).

The initial != 0 or == 0 is folded away into the name as _any and _all,
allowing the target to easily do as it wishes.

I have intentionally chosen to require cbranch_optab to still be the main
one, i.e. you can't have vec_cbranch_any/vec_cbranch_all without cbranch
because these two are an expand-time-only construct.  I've also designed them
this way so that there's no change needed to any other passes in the
middle-end.

With these two optabs it's trivial to implement the two optimizations I
described above.  A target expansion hook is also possible but optabs felt
more natural for the situation.

I.e. with them we can now generate

.L2:
        ldr     q31, [x1, x2]
        add     v29.4s, v29.4s, v25.4s
        add     v28.4s, v28.4s, v26.4s
        add     v31.4s, v31.4s, v30.4s
        str     q31, [x1, x2]
        add     x1, x1, 16
        cmp     x1, 2560
        beq     .L1
.L6:
        ldr     q30, [x3, x1]
        cmpeq   p15.s, p7/z, z30.s, z27.s
        b.none  .L2

and easily prove it correct.
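
For reference, the kind of loop involved is roughly the following (a
reconstruction for illustration only, not the exact testcase):

  int a[640], b[640];

  void
  f (int x)
  {
    for (int i = 0; i < 640; i++)
      {
        b[i] += a[i];
        if (a[i] == x)
          break;
      }
  }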

Bootstrapped Regtested on aarch64-none-linux-gnu,
arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
-m32, -m64 and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

        PR target/118974
        * optabs.cc (prepare_cmp_insn): Refactor to take optab to check for
        instead of hardcoded cbranch.
        (emit_cmp_and_jump_insns): Try to emit a vec_cbranch if supported.
        * optabs.def (vec_cbranch_any_optab, vec_cbranch_all_optab): New.
        * doc/md.texi (cbranch_any@var{mode}4, cbranch_all@var{mode}4):
        Document them.

---
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index f6314af46923beee0100a1410f089efd34d7566d..7dbfbbe1609a196b0834d458b26f61904eaf5b24 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -7622,6 +7622,24 @@ Operand 0 is a comparison operator.  Operand 1 and operand 2 are the
 first and second operands of the comparison, respectively.  Operand 3
 is the @code{code_label} to jump to.

+@cindex @code{cbranch_any@var{mode}4} instruction pattern
+@item @samp{cbranch_any@var{mode}4}
+Conditional branch instruction combined with a compare instruction on vectors
+where it is required that at least one of the elementwise comparisons of the
+two input vectors is true.
+Operand 0 is a comparison operator.  Operand 1 and operand 2 are the
+first and second operands of the comparison, respectively.  Operand 3
+is the @code{code_label} to jump to.
+
+@cindex @code{cbranch_all@var{mode}4} instruction pattern
+@item @samp{cbranch_all@var{mode}4}
+Conditional branch instruction combined with a compare instruction on vectors
+where it is required that all of the elementwise comparisons of the
+two input vectors are true.
+Operand 0 is a comparison operator.  Operand 1 and operand 2 are the
+first and second operands of the comparison, respectively.  Operand 3
+is the @code{code_label} to jump to.
+
 @cindex @code{jump} instruction pattern
 @item @samp{jump}
 A jump inside a function; an unconditional branch.  Operand 0 is the
diff --git a/gcc/optabs.cc b/gcc/optabs.cc
index 0a14b1eef8a5795e6fd24ade6da55841696315b8..77d5e6ee5d26ccda3d391265bc45fd454530cc67 100644
--- a/gcc/optabs.cc
+++ b/gcc/optabs.cc
@@ -4418,6 +4418,9 @@ can_vec_extract_var_idx_p (machine_mode vec_mode, machine_mode extr_mode)

    *PMODE is the mode of the inputs (in case they are const_int).

+   *OPTAB is the optab to check for OPTAB_DIRECT support.  Defaults to
+   cbranch_optab.
+
    This function performs all the setup necessary so that the caller only has
    to emit a single comparison insn.  This setup can involve doing a BLKmode
    comparison or emitting a library call to perform the comparison if no insn
@@ -4429,7 +4432,7 @@ can_vec_extract_var_idx_p (machine_mode vec_mode, machine_mode extr_mode)
 static void
 prepare_cmp_insn (rtx x, rtx y, enum rtx_code comparison, rtx size,
                  int unsignedp, enum optab_methods methods,
-                 rtx *ptest, machine_mode *pmode)
+                 rtx *ptest, machine_mode *pmode, optab optab=cbranch_optab)
 {
   machine_mode mode = *pmode;
   rtx libfunc, test;
@@ -4547,7 +4550,7 @@ prepare_cmp_insn (rtx x, rtx y, enum rtx_code comparison, rtx size,
   FOR_EACH_WIDER_MODE_FROM (cmp_mode, mode)
     {
       enum insn_code icode;
-      icode = optab_handler (cbranch_optab, cmp_mode);
+      icode = optab_handler (optab, cmp_mode);
       if (icode != CODE_FOR_nothing
          && insn_operand_matches (icode, 0, test))
        {
@@ -4580,7 +4583,7 @@ prepare_cmp_insn (rtx x, rtx y, enum rtx_code comparison, rtx size,
       if (comparison == UNORDERED && rtx_equal_p (x, y))
        {
          prepare_cmp_insn (x, y, UNLT, NULL_RTX, unsignedp, OPTAB_WIDEN,
-                           ptest, pmode);
+                           ptest, pmode, optab);
          if (*ptest)
            return;
        }
@@ -4632,7 +4635,7 @@ prepare_cmp_insn (rtx x, rtx y, enum rtx_code comparison, rtx size,

       *pmode = ret_mode;
       prepare_cmp_insn (x, y, comparison, NULL_RTX, unsignedp, methods,
-                       ptest, pmode);
+                       ptest, pmode, optab);
     }

   return;
@@ -4816,12 +4819,68 @@ emit_cmp_and_jump_insns (rtx x, rtx y, enum rtx_code comparison, rtx size,
      the target supports tbranch.  */
   machine_mode tmode = mode;
   direct_optab optab;
-  if (op1 == CONST0_RTX (GET_MODE (op1))
-      && validate_test_and_branch (val, &test, &tmode,
-                                  &optab) != CODE_FOR_nothing)
+  if (op1 == CONST0_RTX (GET_MODE (op1)))
     {
-      emit_cmp_and_jump_insn_1 (test, tmode, label, optab, prob, true);
-      return;
+      if (validate_test_and_branch (val, &test, &tmode,
+                                   &optab) != CODE_FOR_nothing)
+       {
+         emit_cmp_and_jump_insn_1 (test, tmode, label, optab, prob, true);
+         return;
+       }
+
+      /* If we are comparing equality with 0, check if VAL is another equality
+        comparison and if the target supports it directly.  */
+      if (val && TREE_CODE (val) == SSA_NAME
+         && VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (val))
+         && (comparison == NE || comparison == EQ))
+       {
+         auto def_stmt = SSA_NAME_DEF_STMT (val);

I think you may only look at the definition via get_def_for_expr (aka 
TER), and the result may be NULL.

+             if ((icode = optab_handler (optab, mode2))
+                 != CODE_FOR_nothing

Can we perform this check before expanding the operands please?
Can we use vector_compare_rtx ()?

Somehow I feel this belongs in a separate function like
validate_test_and_branch.

Richard.

> I think in the patch you've missed that the RTL from the expand is 
> overwritten.
> 
> Basically the new optabs are doing essentially the same thing that you
> are suggesting, but without needing the dummy second comparison_operator.
> 
> Notice how in the patch the comparison operator is the general
> 
> aarch64_comparison_operator
> 
> and not the stricter aarch64_equality_operator.
> 
> Thanks,
> Tamar
> > 
> > Richard.
> > 
> > > I.e. with them we can now generate
> > >
> > > .L2:
> > >         ldr     q31, [x1, x2]
> > >         add     v29.4s, v29.4s, v25.4s
> > >         add     v28.4s, v28.4s, v26.4s
> > >         add     v31.4s, v31.4s, v30.4s
> > >         str     q31, [x1, x2]
> > >         add     x1, x1, 16
> > >         cmp     x1, 2560
> > >         beq     .L1
> > > .L6:
> > >         ldr     q30, [x3, x1]
> > >         cmpeq   p15.s, p7/z, z30.s, z27.s
> > >         b.none  .L2
> > >
> > > and easily prove it correct.
> > >
> > > Bootstrapped Regtested on aarch64-none-linux-gnu,
> > > arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
> > > -m32, -m64 and no issues.
> > >
> > > Ok for master?
> > >
> > > Thanks,
> > > Tamar
> > >
> > > gcc/ChangeLog:
> > >
> > >   PR target/118974
> > >   * optabs.cc (prepare_cmp_insn): Refactor to take optab to check for
> > >   instead of hardcoded cbranch.
> > >   (emit_cmp_and_jump_insns): Try to emit a vec_cbranch if supported.
> > >   * optabs.def (vec_cbranch_any_optab, vec_cbranch_all_optab): New.
> > >   * doc/md.texi (cbranch_any@var{mode}4, cbranch_all@var{mode}4):
> > >   Document them.
> > >
> > > ---
> > > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> > > index
> > f6314af46923beee0100a1410f089efd34d7566d..7dbfbbe1609a196b0834d458
> > b26f61904eaf5b24 100644
> > > --- a/gcc/doc/md.texi
> > > +++ b/gcc/doc/md.texi
> > > @@ -7622,6 +7622,24 @@ Operand 0 is a comparison operator.  Operand 1
> > and operand 2 are the
> > >  first and second operands of the comparison, respectively.  Operand 3
> > >  is the @code{code_label} to jump to.
> > >
> > > +@cindex @code{cbranch_any@var{mode}4} instruction pattern
> > > +@item @samp{cbranch_any@var{mode}4}
> > > +Conditional branch instruction combined with a compare instruction on 
> > > vectors
> > > +where it is required that at least one of the elementwise comparisons of 
> > > the
> > > +two input vectors is true.
> > > +Operand 0 is a comparison operator.  Operand 1 and operand 2 are the
> > > +first and second operands of the comparison, respectively.  Operand 3
> > > +is the @code{code_label} to jump to.
> > > +
> > > +@cindex @code{cbranch_all@var{mode}4} instruction pattern
> > > +@item @samp{cbranch_all@var{mode}4}
> > > +Conditional branch instruction combined with a compare instruction on 
> > > vectors
> > > +where it is required that all of the elementwise comparisons of the
> > > +two input vectors are true.
> > > +Operand 0 is a comparison operator.  Operand 1 and operand 2 are the
> > > +first and second operands of the comparison, respectively.  Operand 3
> > > +is the @code{code_label} to jump to.
> > > +
> > >  @cindex @code{jump} instruction pattern
> > >  @item @samp{jump}
> > >  A jump inside a function; an unconditional branch.  Operand 0 is the
> > > diff --git a/gcc/optabs.cc b/gcc/optabs.cc
> > > index
> > 0a14b1eef8a5795e6fd24ade6da55841696315b8..77d5e6ee5d26ccda3d39126
> > 5bc45fd454530cc67 100644
> > > --- a/gcc/optabs.cc
> > > +++ b/gcc/optabs.cc
> > > @@ -4418,6 +4418,9 @@ can_vec_extract_var_idx_p (machine_mode
> > vec_mode, machine_mode extr_mode)
> > >
> > >     *PMODE is the mode of the inputs (in case they are const_int).
> > >
> > > +   *OPTAB is the optab to check for OPTAB_DIRECT support.  Defaults to
> > > +   cbranch_optab.
> > > +
> > >     This function performs all the setup necessary so that the caller 
> > > only has
> > >     to emit a single comparison insn.  This setup can involve doing a 
> > > BLKmode
> > >     comparison or emitting a library call to perform the comparison if no 
> > > insn
> > > @@ -4429,7 +4432,7 @@ can_vec_extract_var_idx_p (machine_mode
> > vec_mode, machine_mode extr_mode)
> > >  static void
> > >  prepare_cmp_insn (rtx x, rtx y, enum rtx_code comparison, rtx size,
> > >             int unsignedp, enum optab_methods methods,
> > > -           rtx *ptest, machine_mode *pmode)
> > > +           rtx *ptest, machine_mode *pmode, optab
> > optab=cbranch_optab)
> > >  {
> > >    machine_mode mode = *pmode;
> > >    rtx libfunc, test;
> > > @@ -4547,7 +4550,7 @@ prepare_cmp_insn (rtx x, rtx y, enum rtx_code
> > comparison, rtx size,
> > >    FOR_EACH_WIDER_MODE_FROM (cmp_mode, mode)
> > >      {
> > >        enum insn_code icode;
> > > -      icode = optab_handler (cbranch_optab, cmp_mode);
> > > +      icode = optab_handler (optab, cmp_mode);
> > >        if (icode != CODE_FOR_nothing
> > >     && insn_operand_matches (icode, 0, test))
> > >   {
> > > @@ -4580,7 +4583,7 @@ prepare_cmp_insn (rtx x, rtx y, enum rtx_code
> > comparison, rtx size,
> > >        if (comparison == UNORDERED && rtx_equal_p (x, y))
> > >   {
> > >     prepare_cmp_insn (x, y, UNLT, NULL_RTX, unsignedp, OPTAB_WIDEN,
> > > -                     ptest, pmode);
> > > +                     ptest, pmode, optab);
> > >     if (*ptest)
> > >       return;
> > >   }
> > > @@ -4632,7 +4635,7 @@ prepare_cmp_insn (rtx x, rtx y, enum rtx_code
> > comparison, rtx size,
> > >
> > >        *pmode = ret_mode;
> > >        prepare_cmp_insn (x, y, comparison, NULL_RTX, unsignedp, methods,
> > > -                 ptest, pmode);
> > > +                 ptest, pmode, optab);
> > >      }
> > >
> > >    return;
> > > @@ -4816,12 +4819,68 @@ emit_cmp_and_jump_insns (rtx x, rtx y, enum
> > rtx_code comparison, rtx size,
> > >       the target supports tbranch.  */
> > >    machine_mode tmode = mode;
> > >    direct_optab optab;
> > > -  if (op1 == CONST0_RTX (GET_MODE (op1))
> > > -      && validate_test_and_branch (val, &test, &tmode,
> > > -                            &optab) != CODE_FOR_nothing)
> > > +  if (op1 == CONST0_RTX (GET_MODE (op1)))
> > >      {
> > > -      emit_cmp_and_jump_insn_1 (test, tmode, label, optab, prob, true);
> > > -      return;
> > > +      if (validate_test_and_branch (val, &test, &tmode,
> > > +                             &optab) != CODE_FOR_nothing)
> > > + {
> > > +   emit_cmp_and_jump_insn_1 (test, tmode, label, optab, prob, true);
> > > +   return;
> > > + }
> > > +
> > > +      /* If we are comparing equality with 0, check if VAL is another 
> > > equality
> > > +  comparison and if the target supports it directly.  */
> > > +      if (val && TREE_CODE (val) == SSA_NAME
> > > +   && VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (val))
> > > +   && (comparison == NE || comparison == EQ))
> > > + {
> > > +   auto def_stmt = SSA_NAME_DEF_STMT (val);
> > > +   enum insn_code icode;
> > > +   if (is_gimple_assign (def_stmt)
> > > +       && TREE_CODE_CLASS (gimple_assign_rhs_code (def_stmt))
> > > +            == tcc_comparison)
> > > +     {
> > > +       class expand_operand ops[2];
> > > +       rtx_insn *tmp = NULL;
> > > +       start_sequence ();
> > > +       rtx op0c = expand_normal (gimple_assign_rhs1 (def_stmt));
> > > +       rtx op1c = expand_normal (gimple_assign_rhs2 (def_stmt));
> > > +       machine_mode mode2 = GET_MODE (op0c);
> > > +       create_input_operand (&ops[0], op0c, mode2);
> > > +       create_input_operand (&ops[1], op1c, mode2);
> > > +
> > > +       int unsignedp2 = TYPE_UNSIGNED (TREE_TYPE (val));
> > > +       auto inner_code = gimple_assign_rhs_code (def_stmt);
> > > +       rtx test2 = NULL_RTX;
> > > +
> > > +       enum rtx_code comparison2 = get_rtx_code (inner_code, unsignedp2);
> > > +       if (unsignedp2)
> > > +         comparison2 = unsigned_condition (comparison2);
> > > +       if (comparison == NE)
> > > +         optab = vec_cbranch_any_optab;
> > > +       else
> > > +         optab = vec_cbranch_all_optab;
> > > +
> > > +       if ((icode = optab_handler (optab, mode2))
> > > +           != CODE_FOR_nothing
> > > +           && maybe_legitimize_operands (icode, 1, 2, ops))
> > > +         {
> > > +           prepare_cmp_insn (ops[0].value, ops[1].value, comparison2,
> > > +                             size, unsignedp2, OPTAB_DIRECT, &test2,
> > > +                             &mode2, optab);
> > > +           emit_cmp_and_jump_insn_1 (test2, mode2, label,
> > > +                                     optab, prob, false);
> > > +           tmp = get_insns ();
> > > +         }
> > > +
> > > +       end_sequence ();
> > > +       if (tmp)
> > > +         {
> > > +           emit_insn (tmp);
> > > +           return;
> > > +         }
> > > +     }
> > > + }
> > >      }
> > >
> > >    emit_cmp_and_jump_insn_1 (test, mode, label, cbranch_optab, prob, 
> > > false);
> > > diff --git a/gcc/optabs.def b/gcc/optabs.def
> > > index
> > 23f792352388dd5f8de8a1999643179328214abf..a600938fb9e4480ff2d78c9b
> > 0be344e4856539bb 100644
> > > --- a/gcc/optabs.def
> > > +++ b/gcc/optabs.def
> > > @@ -424,6 +424,8 @@ OPTAB_D (smulhrs_optab, "smulhrs$a3")
> > >  OPTAB_D (umulhs_optab, "umulhs$a3")
> > >  OPTAB_D (umulhrs_optab, "umulhrs$a3")
> > >  OPTAB_D (sdiv_pow2_optab, "sdiv_pow2$a3")
> > > +OPTAB_D (vec_cbranch_any_optab, "vec_cbranch_any$a4")
> > > +OPTAB_D (vec_cbranch_all_optab, "vec_cbranch_all$a4")
> > >  OPTAB_D (vec_pack_sfix_trunc_optab, "vec_pack_sfix_trunc_$a")
> > >  OPTAB_D (vec_pack_ssat_optab, "vec_pack_ssat_$a")
> > >  OPTAB_D (vec_pack_trunc_optab, "vec_pack_trunc_$a")
> > >
> > >
> > >
> > 
> > --
> > Richard Biener <rguent...@suse.de>
> > SUSE Software Solutions Germany GmbH,
> > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
