> -----Original Message----- > From: Richard Biener <rguent...@suse.de> > Sent: Tuesday, June 10, 2025 2:12 PM > To: Tamar Christina <tamar.christ...@arm.com> > Cc: gcc-patches@gcc.gnu.org; Richard Sandiford <richard.sandif...@arm.com>; > nd <n...@arm.com> > Subject: RE: [PATCH 1/3]middle-end: support vec_cbranch_any and > vec_cbranch_all [PR118974] > > On Mon, 9 Jun 2025, Tamar Christina wrote: > > > > -----Original Message----- > > > From: Richard Biener <rguent...@suse.de> > > > Sent: Monday, June 9, 2025 10:30 AM > > > To: Tamar Christina <tamar.christ...@arm.com> > > > Cc: gcc-patches@gcc.gnu.org; Richard Sandiford > <richard.sandif...@arm.com>; > > > nd <n...@arm.com> > > > Subject: Re: [PATCH 1/3]middle-end: support vec_cbranch_any and > > > vec_cbranch_all [PR118974] > > > > > > On Mon, 9 Jun 2025, Tamar Christina wrote: > > > > > > > This patch introduces two new vector cbranch optabs vec_cbranch_any and > > > > vec_cbranch_all. > > > > > > > > To explain why we need two new optabs let me explain the current cbranch > and > > > its > > > > limitations and what I'm trying to optimize. So sorry for the long > > > > email, but I > > > > hope it explains why I think we want new optabs. > > > > > > > > Today cbranch can be used for both vector and scalar modes. In both > > > > these > > > > cases it's intended to compare boolean values, either scalar or vector. > > > > > > > > The optab documentation does not however state that it can only handle > > > > comparisons against 0. So many targets have added code for the vector > variant > > > > that tries to deal with the case where we branch based on two non-zero > > > > registers. > > > > > > > > However this code can't ever be reached because the cbranch expansion > > > > only > > > deals > > > > with comparisons against 0 for vectors. This is because for vectors > > > > the rest of > > > > the compiler has no way to generate a non-zero comparison. e.g. the > vectorizer > > > > will always generate a zero comparison, and the C/C++ front-ends won't > allow > > > > vectors to be used in a cbranch as it expects a boolean value. ISAs > > > > like SVE > > > > work around this by requiring you to use an SVE PTEST intrinsics which > > > > results > > > > in a single scalar boolean value that represents the flag values. > > > > > > > > e.g. if (svptest_any (..)) > > > > > > > > The natural question is why do we not at expand time then rewrite the > > > comparison > > > > to a non-zero comparison if the target supports it. > > > > > > > > The reason is we can't safely do so. For an ANY comparison (e.g. != b) > > > > this is > > > > trivial, but for an ALL comparison (e.g. == b) we would have to flip > > > > both > branch > > > > and invert the value being compared. i.e. we have to make it a != b > comparison. > > > > > > > > But in emit_cmp_and_jump_insns we can't flip the branches anymore > because > > > they > > > > have already been lowered into a fall through branch (PC) and a label, > > > > ready > for > > > > use in an if_then_else RTL expression. > > > > > > > > Additionally as mentioned before, cbranches expect the values to be > > > > masks, > not > > > > values. This kinda works out if you XOR the values, but for FP vectors > > > > you'd > > > > need to know what equality means for the FP format. i.e. it's possible > > > > for > > > > IEEE 754 values but It's not immediately obvious if this is true for all > > > > formats. > > > > > > > > Now why does any of this matter? 
Well there are two optimizations we > > > > want > to > > > be > > > > able to do. > > > > > > > > 1. Adv. SIMD does not support a vector !=, as in there's no instruction > > > > for it. > > > > For both Integer and FP vectors we perform the comparisons as EQ and > then > > > > invert the resulting mask. Ideally we'd like to replace this with > > > > just a XOR > > > > and the appropriate branch. > > > > > > > > 2. When on an SVE enabled system we would like to use an SVE compare + > > > branch > > > > for the Adv. SIMD sequence which could happen due to cost modelling. > > > However > > > > we can only do so based on if we know that the values being compared > against > > > > are the boolean masks. This means we can't really use combine to do > > > > this > > > > because combine would have to match the entire sequence including the > > > > vector comparisons because at RTL we've lost the information that > > > > VECTOR_BOOLEAN_P would have given us. This sequence would be too > long > > > for > > > > combine to match due to it having to match the compare + branch > > > > sequence > > > > being generated as well. It also becomes a bit messy to match ANY > > > > and ALL > > > > sequences. > > > > > > I guess I didn't really understand the above, esp. why all of this is > > > not recoverable on RTL, so I'll state what I think > > > cbranch is supposed to do and then ask some question on the proposal. > > > > > > > Quite simply the log chain is too long. SVE has flag setting comparisons, > > so in a > > single instruction we can perform the final comparison. (which doesn't need > > to > > be restricted to EQ or NE) and then using the appropriate condition code > > decide > > if we have an ANY or ALL comparison. So the sequence to match is very long. > > > > That is to say, in gimple we have > > > > a = b > c > > if (a != 0) > > ... > > and SVE doesn't need the indirection. Conceptually this makes it > > > > if (any (b > c)) > > ... > > > > For SVE the cbranch expands to a predicate test, which also sets the flag > > again. > > this duplication in flag setting is later then removed by various patterns > > in > > the backend. This is all fine, but where it gets difficult is trying to > > replace the > > Adv. SIMD Compare + branch with an SVE one. To do this it would have to > > match > > the entirety of > > > > a = b > c > > if (a != 0) > > > > which in RTL is > > > > (insn 66 65 67 4 (set (reg:V16QI 185 [ mask_patt_9.20_138 ]) > > (neg:V16QI (gtu:V16QI (reg:V16QI 183) > > (reg:V16QI 184)))) "/app/example.cpp":29:7 -1 > > (nil)) > > (insn 67 66 68 4 (set (reg:V4SI 186) > > (unspec:V4SI [ > > (subreg:V4SI (reg:V16QI 185 [ mask_patt_9.20_138 ]) 0) > > (subreg:V4SI (reg:V16QI 185 [ mask_patt_9.20_138 ]) 0) > > ] UNSPEC_UMAXV)) "/app/example.cpp":29:7 -1 > > (nil)) > > (insn 68 67 69 4 (set (reg:V16QI 185 [ mask_patt_9.20_138 ]) > > (subreg:V16QI (reg:V4SI 186) 0)) "/app/example.cpp":29:7 -1 > > (nil)) > > (insn 69 68 70 4 (set (reg:DI 187) > > (subreg:DI (reg:V16QI 185 [ mask_patt_9.20_138 ]) 0)) > "/app/example.cpp":29:7 -1 > > (nil)) > > (insn 70 69 71 4 (set (reg:CC 66 cc) > > (compare:CC (reg:DI 187) > > (const_int 0 [0]))) "/app/example.cpp":29:7 -1 > > (nil)) > > (jump_insn 71 70 72 4 (set (pc) > > (if_then_else (ne (reg:CC 66 cc) > > (const_int 0 [0])) > > (label_ref:DI 397) > > (pc))) "/app/example.cpp":29:7 19 {condjump} > > > > And any intermediate "submatch" isn't really valid. The sequence would need > > to > be matched > > as a whole or not at all. 
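> > (Illustrative sketch only, not output from the patch: with a vec_cbranch_any expansion the six-insn sequence above would ideally collapse to a single flag-setting SVE compare plus a branch, roughly > > cmphi p15.b, p7/z, z0.b, z1.b > > b.any .L397 > > where cmphi is the SVE unsigned > compare matching the gtu above, and the predicate, register numbers and label are invented for this sketch; the real code generated by the patch is shown in the example further down this mail.)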
> > > > The second problem is that Adv. SIMD lacks a vector != > > comparison. > > > > This means there's an additional "NOT" in inverting the mask. This > > operation can > be optimized > > in three ways. > > > > 1. It can be converted from a MAX reduction into a MIN reduction and then > > the > NOT can be removed. > > The branch would have to be inverted. > > 2. The data vectors can be XOR'ed together, and inequality means a > > non-zero bit > anywhere in the result. > > 3. Use an SVE comparison, which does have != on vectors. > > > > This chain is again quite long to match, and we can't incrementally match > > it since > it's only valid if we know > > we're using the results in a zero or non-zero comparison. > > > > > cbranch is supposed to branch on the result of a comparison, > > > be it on vector or scalar operands, the result of the comparison is > > > a scalar, or rather in full generality, a (scalar) condition-code. > > > Given it uses a COMPARE RTX as comparison operator we can currently > > > only handle whole vector equality/inequality which map to all/any. > > > > Correct, and normally this is fine, but for SVE we have support for what's > > essentially a "vector" condition code. This is why I also named the new > > optabs > > vec_cbranch. > > > > The biggest difference is that instead of requiring the reduction to a > > scalar, > > it allows branching on the vector element comparisons *directly*, which > > makes > > it much easier to do the two fixes above. > > > > > > > > > To handle these two cases I propose the new optabs vec_cbranch_any and > > > > vec_cbranch_all that expect the operands to be values, not masks, and > > > > the > > > > comparison operation is the comparison being performed. The current > cbranch > > > > optab can't be used here because we need to be able to see both > > > > comparison > > > > operators (for the boolean branch and the data compare branch). > > > > > > > > The initial != 0 or == 0 is folded away into the name as _any and _all > > > > allowing > > > > the target to easily do as it wishes. > > > > > > > > I have intentionally chosen to require cbranch_optab to still be the > > > > main one. > > > > i.e. you can't have vec_cbranch_any/vec_cbranch_all without cbranch > because > > > > these two are an expand time only construct. I've also chosen them to > > > > be this > > > > such that there's no change needed to any other passes in the > > > > middle-end. > > > > > > > > With these two optabs it's trivial to implement the two optimization I > described > > > > above. A target expansion hook is also possible but optabs felt more > > > > natural > > > > for the situation. > > > > > > What's the RTL the optabs expand to for SVE/AdvSIMD? How do you > > > represent the any/all vector compare difference? Looking at 3/3 I see > > > you use UNSPECs. > > > > > > > You mean in the expand RTL? That's just so I can make the iterators. 
The > expansion is: > > > > (insn 31 30 32 5 (parallel [ > > (set (reg:VNx4BI 133 [ mask_patt_13.20_41 ]) > > (unspec:VNx4BI [ > > (reg:VNx4BI 134) > > (const_int 1 [0x1]) > > (gtu:VNx4BI (reg:VNx4QI 112 [ vect__1.16 ]) > > (reg:VNx4QI 128)) > > ] UNSPEC_PRED_Z)) > > (clobber (reg:CC_NZC 66 cc)) > > ]) "/app/example.cpp":29:7 -1 > > (nil)) > > (insn 32 31 33 5 (set (reg:VNx4BI 136 [ vec_mask_and_42 ]) > > (and:VNx4BI (reg:VNx4BI 133 [ mask_patt_13.20_41 ]) > > (reg:VNx4BI 111 [ loop_mask_36 ]))) "/app/example.cpp":29:7 -1 > > (nil)) > > (insn 33 32 34 5 (set (reg:VNx16BI 137) > > (const_vector:VNx16BI repeat [ > > (const_int 1 [0x1]) > > (const_int 0 [0]) repeated x3 > > ])) "/app/example.cpp":29:7 -1 > > (nil)) > > (insn 34 33 35 5 (set (reg:CC_NZC 66 cc) > > (unspec:CC_NZC [ > > (reg:VNx16BI 137) > > (subreg:VNx4BI (reg:VNx16BI 137) 0) > > (const_int 1 [0x1]) > > (reg:VNx4BI 136 [ vec_mask_and_42 ]) > > ] UNSPEC_PTEST)) "/app/example.cpp":29:7 -1 > > (nil)) > > (jump_insn 35 34 36 5 (set (pc) > > (if_then_else (eq (reg:CC_NZC 66 cc) > > (const_int 0 [0])) > > (label_ref 41) > > (pc))) "/app/example.cpp":29:7 -1 > > (int_list:REG_BR_PROB 1014686028 (nil)) > > -> 41) > > > > Notice how insn 31 which is the vector compare sets CC_NZC. > > That's a "scalar" condition code? >
Sure, depends on how you view it. It's set by a vector instruction. SVE doesn't need the reduction to scalar. As in, the predicate result is unused and marked as dead. I think that's the core distinction here and one you pointed out below. This is about not needing the reduction for branching. Or rather, the result of the compare isn't needed for the branch itself. > > Insn 32 is the masking of the compare; it'll be moved into insn 31 by > > combine, replacing the all-true predicate. > > So the (reg:VNx4BI 133) is essentially unused, but it's required to ... > > > Insn 34 is a flag reshaping operation. In case the predicates don't > > have the same element width the CC reg value needs to be > > reshaped (in case you're doing e.g. a b.first (branch if first element > > matches) > etc.). > > ... do this, which produces the actual CC_NZC we jump on. No, it's only needed because the codegen above is generated from the current trunk which has a different instruction doing the data comparison. 133 is not needed at all, and is only there because insn 34 is what is mapped to a cbranch. So this is the code we have today because there's no way for us to not emit the ptest initially ☹ And our current ptest also emits CC_NZC instead of CC_Z; while ptest does set these, for cbranch we don't need them. So I'm changing this to CC_Z already. > > > However because insn 35 only cares about 0 or !0 the shape doesn't matter. > > So insn 34 will be eliminated by combine. > > > > This leaves insns 31 and 35. Essentially only the compare + branch, > > directly on the vector elements, which is what vec_cbranch is now > > representing. > > > > The optab allows us to generate the paradoxical subregs to convert > > Adv. SIMD registers into SVE ones, and the predicate mask to make > > the operation safe. > > > > Without this, we have to rely on combine which (even if it could match > > such long sequences) would have to not just replace the adv. simd + b > > with an SVE + b, but also do all the predicate optimization in the one > > single pattern. > > > > > What kind of compares do you expect us to generate? Any besides > > > equality compares? > > > > Any compare is possible. Since SVE vector comparisons take all 6 > > comparison operators, there are also floating point versions and > > unordered variants. Basically in the ISA vectors are first class > > control flow elements. > > > > > Why would we not allow cbranch to take an > > > operand 0 that specifies the desired reduction? > > > > That's certainly possible, but that's just going to make cbranch itself > > more complicated to understand. I chose to go with _any and _all > > since, as you say, for the branch comparison itself only != 0 and == 0 > > makes sense. So cbranch would need to get two operators in this case: > > one for the branch comparison and one for the data comparison. > > > > i.e. if expanding > > > > a = b > c > > if (a != 0) > > > > branch would get > > (set (reg a) > > (gt (reg b) (reg c)) > > (clobber cc_nz) > > (ne (reg a) 0) > > (if_then_else ...) > > > > I basically opted for a simpler change, since you can only have != 0 and == > > 0, > > this is just encoded in the names "any" and "all" so we don't have to send > > the > > dummy operators along. > > Well, if we'd introduce ANY_LT_EXPR, ANY_NE_EXPR, ... ALL_LT_EXPR, > ALL_NE_EXPR ... then we can have on GIMPLE > > if (a ANY_GT c) > > and no need for a separate compare and compare-and-branch. Yes, new comparison codes would work here too. But then I'll need 12 of them.. 
But I could do so if you prefer 😊 But you're right, conceptually my new optabs are introducing an expand-time-only representation of these 12 operators. > > > > We could > > > have (any:CC (eq:V4BI (reg:V4SI) (reg:V4SI))) or so, or alternatively > > > (more intrusive) have (any_eq:CC (reg:V4SI) (reg:V4SI))? That > > > would also allow you to have a proper RTL representation rather > > > than more and more UNSPECs. > > > > We do have proper RTL representation though.. > > You have > > > (insn 31 30 32 5 (parallel [ > > (set (reg:VNx4BI 133 [ mask_patt_13.20_41 ]) > > (unspec:VNx4BI [ > > (reg:VNx4BI 134) > > (const_int 1 [0x1]) > > (gtu:VNx4BI (reg:VNx4QI 112 [ vect__1.16 ]) > > (reg:VNx4QI 128)) > > ] UNSPEC_PRED_Z)) > > (clobber (reg:CC_NZC 66 cc)) > > that's not > > (set (reg:CC_NZC 66 cc) (any_gtu:CC_NZC (reg:VNx4QI 112 [ vect__1.16 ]) > ((reg:VNx4QI 128))) > > so nothing combine or simplify-rtx can work with (because of the UNSPEC). > Sure, I don’t think any_gtu provides any value here though. This information, "any" or "all", is a property of the use of the result, not of the comparison, in RTL. So the "none" is here: (jump_insn 35 34 36 5 (set (pc) (if_then_else (eq (reg:CC_NZC 66 cc) (const_int 0 [0])) (label_ref 41) (pc))) "/app/example.cpp":29:7 -1 (int_list:REG_BR_PROB 1014686028 (nil)) So I don’t think that the ANY or ALL was ever a property of the compare, but rather of what you do with it. Which is why today in gimple we have the != 0 or == 0 in the if, no? > That said, I wanted to know whether SVE/NEON can do {any,all}_<compare> > ops on vectors, setting the CC flag register you can then do a > conditional jump on. And that seems to be the case and what you > are after. > Yes, SVE can indeed do this for integer comparisons. Floating point needs the ptest. Also other operations can do this too, for instance the MATCH instruction being worked on now, etc. > But that's exactly what the cbranch optab is supposed to facilitate. > What we're lacking is communication of the all vs. any. Your choice > is to do that by using different optabs, 'vec_cbranch_any' vs. > 'vec_cbranch_all'. I see how that's "easy", you just have to adjust > RTL expansion. > Indeed, and the benefit of this is that we can expand directly to the optimal sequence, rather than relying on later passes to eliminate the unneeded predicate operations. > On GIMPLE you could argue we have everything, we can support > > if (_1 ==/!= {-1, -1, ... }) > > to match vec_cbranch_all/any for an inverted compare? > Yes, so my patch at expand time just checks if _1 is a compare and, if so, reads the arguments of _1 and, based on ==/!=, chooses vec_cbranch_all/any and passes the arguments of the compare to the optab. It can't handle {-1,-1,-1,...} today because the expansion happens quite low during expand, so we can no longer flip the branches to make them a normal != 0 compare. I hadn't worried about this since it would just fall back to cbranch and we no longer generate this representation. > I see one of your issues is that the AND does not set flags, right? > You do need the ptest once you have that? > The AND is just because the middle-end can only generate an unpredicated compare. And so we rely on combine to push the predicate inside eventually. So we don't need the AND itself to set flags, because for early exits all we care about is CC_Z, and the AND can have an effect on N and C but not Z. So for pure SVE I can eliminate it (see patch 2/3) but for the case where we have to generate an SVE compare for Adv. 
SIMD (as it does the flag setting) means that we have to match everything from the compare to the if_then_else. You need the compare since you have to replace it with the SVE variant, and you need the if_then_else to realize if you have any or all. The optab allows us to avoid this by just emitting the SVE compare directly, and completely avoiding the AND by generating a predicated compare. So I am trying to fix some of the optimizations in RTL, but for some cases expanding correctly is the most reliable way. I would have thought that the two optabs would be preferable to 12 new comparison operators, but I could make those work too 😊 Thanks, Tamar > > +@cindex @code{cbranch_any@var{mode}4} instruction pattern > +@item @samp{cbranch_any@var{mode}4} > +Conditional branch instruction combined with a compare instruction on > vectors > +where it is required that at least one of the elementwise comparisons of > the > +two input vectors is true. > > That sentence is a bit odd. Operand 0 is a comparison operator that > is applied to all vector elements. When at least one comparison evaluates > to true the branch is taken. > > +Operand 0 is a comparison operator. Operand 1 and operand 2 are the > +first and second operands of the comparison, respectively. Operand 3 > +is the @code{code_label} to jump to. > > Tamar Christina > > This patch introduces two new vector cbranch optabs vec_cbranch_any and > vec_cbranch_all. > > To explain why we need two new optabs let me explain the current cbranch > and its > limitations and what I'm trying to optimize. So sorry for the long email, > but I > hope it explains why I think we want new optabs. > > Today cbranch can be used for both vector and scalar modes. In both these > cases it's intended to compare boolean values, either scalar or vector. > > The optab documentation does not however state that it can only handle > comparisons against 0. So many targets have added code for the vector > variant > that tries to deal with the case where we branch based on two non-zero > registers. > > However this code can't ever be reached because the cbranch expansion only > deals > with comparisons against 0 for vectors. This is because for vectors the > rest of > the compiler has no way to generate a non-zero comparison. e.g. the > vectorizer > will always generate a zero comparison, and the C/C++ front-ends won't > allow > vectors to be used in a cbranch as it expects a boolean value. ISAs like > SVE > work around this by requiring you to use an SVE PTEST intrinsics which > results > in a single scalar boolean value that represents the flag values. > > e.g. if (svptest_any (..)) > > The natural question is why do we not at expand time then rewrite the > comparison > to a non-zero comparison if the target supports it. > > The reason is we can't safely do so. For an ANY comparison (e.g. != b) > this is > trivial, but for an ALL comparison (e.g. == b) we would have to flip both > branch > and invert the value being compared. i.e. we have to make it a != b > comparison. > > But in emit_cmp_and_jump_insns we can't flip the branches anymore because > they > have already been lowered into a fall through branch (PC) and a label, > ready for > use in an if_then_else RTL expression. > > Additionally as mentioned before, cbranches expect the values to be masks, > not > values. This kinda works out if you XOR the values, but for FP vectors > you'd > need to know what equality means for the FP format. i.e. 
it's possible > for > IEEE 754 values but It's not immediately obvious if this is true for all > formats. > > Now why does any of this matter? Well there are two optimizations we want > to be > able to do. > > 1. Adv. SIMD does not support a vector !=, as in there's no instruction > for it. > For both Integer and FP vectors we perform the comparisons as EQ and > then > invert the resulting mask. Ideally we'd like to replace this with just > a XOR > and the appropriate branch. > > 2. When on an SVE enabled system we would like to use an SVE compare + > branch > for the Adv. SIMD sequence which could happen due to cost modelling. > However > we can only do so based on if we know that the values being compared > against > are the boolean masks. This means we can't really use combine to do > this > because combine would have to match the entire sequence including the > vector comparisons because at RTL we've lost the information that > VECTOR_BOOLEAN_P would have given us. This sequence would be too long > for > combine to match due to it having to match the compare + branch > sequence > being generated as well. It also becomes a bit messy to match ANY and > ALL > sequences. > > To handle these two cases I propose the new optabs vec_cbranch_any and > vec_cbranch_all that expect the operands to be values, not masks, and the > comparison operation is the comparison being performed. The current > cbranch > optab can't be used here because we need to be able to see both comparison > operators (for the boolean branch and the data compare branch). > > The initial != 0 or == 0 is folded away into the name as _any and _all > allowing > the target to easily do as it wishes. > > I have intentionally chosen to require cbranch_optab to still be the main > one. > i.e. you can't have vec_cbranch_any/vec_cbranch_all without cbranch > because > these two are an expand time only construct. I've also chosen them to be > this > such that there's no change needed to any other passes in the middle-end. > > With these two optabs it's trivial to implement the two optimization I > described > above. A target expansion hook is also possible but optabs felt more > natural > for the situation. > > I.e. with them we can now generate > > .L2: > ldr q31, [x1, x2] > add v29.4s, v29.4s, v25.4s > add v28.4s, v28.4s, v26.4s > add v31.4s, v31.4s, v30.4s > str q31, [x1, x2] > add x1, x1, 16 > cmp x1, 2560 > beq .L1 > .L6: > ldr q30, [x3, x1] > cmpeq p15.s, p7/z, z30.s, z27.s > b.none .L2 > > and easily prove it correct. > > Bootstrapped Regtested on aarch64-none-linux-gnu, > arm-none-linux-gnueabihf, x86_64-pc-linux-gnu > -m32, -m64 and no issues. > > Ok for master? > > Thanks, > Tamar > > gcc/ChangeLog: > > PR target/118974 > * optabs.cc (prepare_cmp_insn): Refactor to take optab to check > for > instead of hardcoded cbranch. > (emit_cmp_and_jump_insns): Try to emit a vec_cbranch if supported. > * optabs.def (vec_cbranch_any_optab, vec_cbranch_all_optab): New. > * doc/md.texi (cbranch_any@var{mode}4, cbranch_any@var{mode}4): > Document > them. > > --- > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi > index > f6314af46923beee0100a1410f089efd34d7566d..7dbfbbe1609a196b0834d458 > b26f61904eaf5b24 > 100644 > --- a/gcc/doc/md.texi > +++ b/gcc/doc/md.texi > @@ -7622,6 +7622,24 @@ Operand 0 is a comparison operator. Operand 1 and > operand 2 are the > first and second operands of the comparison, respectively. Operand 3 > is the @code{code_label} to jump to. 
> > +@cindex @code{cbranch_any@var{mode}4} instruction pattern > +@item @samp{cbranch_any@var{mode}4} > +Conditional branch instruction combined with a compare instruction on > vectors > +where it is required that at least one of the elementwise comparisons of > the > +two input vectors is true. > +Operand 0 is a comparison operator. Operand 1 and operand 2 are the > +first and second operands of the comparison, respectively. Operand 3 > +is the @code{code_label} to jump to. > + > +@cindex @code{cbranch_all@var{mode}4} instruction pattern > +@item @samp{cbranch_all@var{mode}4} > +Conditional branch instruction combined with a compare instruction on > vectors > +where it is required that at all of the elementwise comparisons of the > +two input vectors are true. > +Operand 0 is a comparison operator. Operand 1 and operand 2 are the > +first and second operands of the comparison, respectively. Operand 3 > +is the @code{code_label} to jump to. > + > @cindex @code{jump} instruction pattern > @item @samp{jump} > A jump inside a function; an unconditional branch. Operand 0 is the > diff --git a/gcc/optabs.cc b/gcc/optabs.cc > index > 0a14b1eef8a5795e6fd24ade6da55841696315b8..77d5e6ee5d26ccda3d39126 > 5bc45fd454530cc67 > 100644 > --- a/gcc/optabs.cc > +++ b/gcc/optabs.cc > @@ -4418,6 +4418,9 @@ can_vec_extract_var_idx_p (machine_mode vec_mode, > machine_mode extr_mode) > > *PMODE is the mode of the inputs (in case they are const_int). > > + *OPTAB is the optab to check for OPTAB_DIRECT support. Defaults to > + cbranch_optab. > + > This function performs all the setup necessary so that the caller only > has > to emit a single comparison insn. This setup can involve doing a > BLKmode > comparison or emitting a library call to perform the comparison if no > insn > @@ -4429,7 +4432,7 @@ can_vec_extract_var_idx_p (machine_mode vec_mode, > machine_mode extr_mode) > static void > prepare_cmp_insn (rtx x, rtx y, enum rtx_code comparison, rtx size, > int unsignedp, enum optab_methods methods, > - rtx *ptest, machine_mode *pmode) > + rtx *ptest, machine_mode *pmode, optab > optab=cbranch_optab) > { > machine_mode mode = *pmode; > rtx libfunc, test; > @@ -4547,7 +4550,7 @@ prepare_cmp_insn (rtx x, rtx y, enum rtx_code > comparison, rtx size, > FOR_EACH_WIDER_MODE_FROM (cmp_mode, mode) > { > enum insn_code icode; > - icode = optab_handler (cbranch_optab, cmp_mode); > + icode = optab_handler (optab, cmp_mode); > if (icode != CODE_FOR_nothing > && insn_operand_matches (icode, 0, test)) > { > @@ -4580,7 +4583,7 @@ prepare_cmp_insn (rtx x, rtx y, enum rtx_code > comparison, rtx size, > if (comparison == UNORDERED && rtx_equal_p (x, y)) > { > prepare_cmp_insn (x, y, UNLT, NULL_RTX, unsignedp, OPTAB_WIDEN, > - ptest, pmode); > + ptest, pmode, optab); > if (*ptest) > return; > } > @@ -4632,7 +4635,7 @@ prepare_cmp_insn (rtx x, rtx y, enum rtx_code > comparison, rtx size, > > *pmode = ret_mode; > prepare_cmp_insn (x, y, comparison, NULL_RTX, unsignedp, methods, > - ptest, pmode); > + ptest, pmode, optab); > } > > return; > @@ -4816,12 +4819,68 @@ emit_cmp_and_jump_insns (rtx x, rtx y, enum > rtx_code comparison, rtx size, > the target supports tbranch. 
*/ > machine_mode tmode = mode; > direct_optab optab; > - if (op1 == CONST0_RTX (GET_MODE (op1)) > - && validate_test_and_branch (val, &test, &tmode, > - &optab) != CODE_FOR_nothing) > + if (op1 == CONST0_RTX (GET_MODE (op1))) > { > - emit_cmp_and_jump_insn_1 (test, tmode, label, optab, prob, true); > - return; > + if (validate_test_and_branch (val, &test, &tmode, > + &optab) != CODE_FOR_nothing) > + { > + emit_cmp_and_jump_insn_1 (test, tmode, label, optab, prob, > true); > + return; > + } > + > + /* If we are comparing equality with 0, check if VAL is another > equality > + comparison and if the target supports it directly. */ > + if (val && TREE_CODE (val) == SSA_NAME > + && VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (val)) > + && (comparison == NE || comparison == EQ)) > + { > + auto def_stmt = SSA_NAME_DEF_STMT (val); > > I think you may only look at the definition via get_def_for_expr (aka > TER), and the result may be NULL. > > + if ((icode = optab_handler (optab, mode2)) > + != CODE_FOR_nothing > > can we perform this check before expanding the operands please? > Can we use vector_compare_rtx (). > > Somehow I feel this belongs in a separate function like > validate_test_and_branch. > > Richard. > > > I think in the patch you've missed that the RTL from the expand is > > overwritten. > > > > Basically the new optabs are doing essentially the same thing that you > > are suggesting, but without needing the dummy second comparison_operator. > > > > Notice how in the patch the comparison operator is the general > > > > aarch64_comparison_operator > > > > and not the stricter aarch64_equality_operator. > > > > Thanks, > > Tamar > > > > > > Richard. > > > > > > > I.e. with them we can now generate > > > > > > > > .L2: > > > > ldr q31, [x1, x2] > > > > add v29.4s, v29.4s, v25.4s > > > > add v28.4s, v28.4s, v26.4s > > > > add v31.4s, v31.4s, v30.4s > > > > str q31, [x1, x2] > > > > add x1, x1, 16 > > > > cmp x1, 2560 > > > > beq .L1 > > > > .L6: > > > > ldr q30, [x3, x1] > > > > cmpeq p15.s, p7/z, z30.s, z27.s > > > > b.none .L2 > > > > > > > > and easily prove it correct. > > > > > > > > Bootstrapped Regtested on aarch64-none-linux-gnu, > > > > arm-none-linux-gnueabihf, x86_64-pc-linux-gnu > > > > -m32, -m64 and no issues. > > > > > > > > Ok for master? > > > > > > > > Thanks, > > > > Tamar > > > > > > > > gcc/ChangeLog: > > > > > > > > PR target/118974 > > > > * optabs.cc (prepare_cmp_insn): Refactor to take optab to check > > > > for > > > > instead of hardcoded cbranch. > > > > (emit_cmp_and_jump_insns): Try to emit a vec_cbranch if > > > > supported. > > > > * optabs.def (vec_cbranch_any_optab, vec_cbranch_all_optab): > > > > New. > > > > * doc/md.texi (cbranch_any@var{mode}4, cbranch_any@var{mode}4): > > > Document > > > > them. > > > > > > > > --- > > > > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi > > > > index > > > > f6314af46923beee0100a1410f089efd34d7566d..7dbfbbe1609a196b0834d458 > > > b26f61904eaf5b24 100644 > > > > --- a/gcc/doc/md.texi > > > > +++ b/gcc/doc/md.texi > > > > @@ -7622,6 +7622,24 @@ Operand 0 is a comparison operator. Operand 1 > > > and operand 2 are the > > > > first and second operands of the comparison, respectively. Operand 3 > > > > is the @code{code_label} to jump to. 
> > > > > > > > +@cindex @code{cbranch_any@var{mode}4} instruction pattern > > > > +@item @samp{cbranch_any@var{mode}4} > > > > +Conditional branch instruction combined with a compare instruction on > vectors > > > > +where it is required that at least one of the elementwise comparisons > > > > of the > > > > +two input vectors is true. > > > > +Operand 0 is a comparison operator. Operand 1 and operand 2 are the > > > > +first and second operands of the comparison, respectively. Operand 3 > > > > +is the @code{code_label} to jump to. > > > > + > > > > +@cindex @code{cbranch_all@var{mode}4} instruction pattern > > > > +@item @samp{cbranch_all@var{mode}4} > > > > +Conditional branch instruction combined with a compare instruction on > vectors > > > > +where it is required that at all of the elementwise comparisons of the > > > > +two input vectors are true. > > > > +Operand 0 is a comparison operator. Operand 1 and operand 2 are the > > > > +first and second operands of the comparison, respectively. Operand 3 > > > > +is the @code{code_label} to jump to. > > > > + > > > > @cindex @code{jump} instruction pattern > > > > @item @samp{jump} > > > > A jump inside a function; an unconditional branch. Operand 0 is the > > > > diff --git a/gcc/optabs.cc b/gcc/optabs.cc > > > > index > > > > 0a14b1eef8a5795e6fd24ade6da55841696315b8..77d5e6ee5d26ccda3d39126 > > > 5bc45fd454530cc67 100644 > > > > --- a/gcc/optabs.cc > > > > +++ b/gcc/optabs.cc > > > > @@ -4418,6 +4418,9 @@ can_vec_extract_var_idx_p (machine_mode > > > vec_mode, machine_mode extr_mode) > > > > > > > > *PMODE is the mode of the inputs (in case they are const_int). > > > > > > > > + *OPTAB is the optab to check for OPTAB_DIRECT support. Defaults to > > > > + cbranch_optab. > > > > + > > > > This function performs all the setup necessary so that the caller > > > > only has > > > > to emit a single comparison insn. 
This setup can involve doing a > > > > BLKmode > > > > comparison or emitting a library call to perform the comparison if > > > > no insn > > > > @@ -4429,7 +4432,7 @@ can_vec_extract_var_idx_p (machine_mode > > > vec_mode, machine_mode extr_mode) > > > > static void > > > > prepare_cmp_insn (rtx x, rtx y, enum rtx_code comparison, rtx size, > > > > int unsignedp, enum optab_methods methods, > > > > - rtx *ptest, machine_mode *pmode) > > > > + rtx *ptest, machine_mode *pmode, optab > > > optab=cbranch_optab) > > > > { > > > > machine_mode mode = *pmode; > > > > rtx libfunc, test; > > > > @@ -4547,7 +4550,7 @@ prepare_cmp_insn (rtx x, rtx y, enum rtx_code > > > comparison, rtx size, > > > > FOR_EACH_WIDER_MODE_FROM (cmp_mode, mode) > > > > { > > > > enum insn_code icode; > > > > - icode = optab_handler (cbranch_optab, cmp_mode); > > > > + icode = optab_handler (optab, cmp_mode); > > > > if (icode != CODE_FOR_nothing > > > > && insn_operand_matches (icode, 0, test)) > > > > { > > > > @@ -4580,7 +4583,7 @@ prepare_cmp_insn (rtx x, rtx y, enum rtx_code > > > comparison, rtx size, > > > > if (comparison == UNORDERED && rtx_equal_p (x, y)) > > > > { > > > > prepare_cmp_insn (x, y, UNLT, NULL_RTX, unsignedp, > > > > OPTAB_WIDEN, > > > > - ptest, pmode); > > > > + ptest, pmode, optab); > > > > if (*ptest) > > > > return; > > > > } > > > > @@ -4632,7 +4635,7 @@ prepare_cmp_insn (rtx x, rtx y, enum rtx_code > > > comparison, rtx size, > > > > > > > > *pmode = ret_mode; > > > > prepare_cmp_insn (x, y, comparison, NULL_RTX, unsignedp, methods, > > > > - ptest, pmode); > > > > + ptest, pmode, optab); > > > > } > > > > > > > > return; > > > > @@ -4816,12 +4819,68 @@ emit_cmp_and_jump_insns (rtx x, rtx y, enum > > > rtx_code comparison, rtx size, > > > > the target supports tbranch. */ > > > > machine_mode tmode = mode; > > > > direct_optab optab; > > > > - if (op1 == CONST0_RTX (GET_MODE (op1)) > > > > - && validate_test_and_branch (val, &test, &tmode, > > > > - &optab) != CODE_FOR_nothing) > > > > + if (op1 == CONST0_RTX (GET_MODE (op1))) > > > > { > > > > - emit_cmp_and_jump_insn_1 (test, tmode, label, optab, prob, true); > > > > - return; > > > > + if (validate_test_and_branch (val, &test, &tmode, > > > > + &optab) != CODE_FOR_nothing) > > > > + { > > > > + emit_cmp_and_jump_insn_1 (test, tmode, label, optab, prob, > > > > true); > > > > + return; > > > > + } > > > > + > > > > + /* If we are comparing equality with 0, check if VAL is another > > > > equality > > > > + comparison and if the target supports it directly. 
*/ > > > > + if (val && TREE_CODE (val) == SSA_NAME > > > > + && VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (val)) > > > > + && (comparison == NE || comparison == EQ)) > > > > + { > > > > + auto def_stmt = SSA_NAME_DEF_STMT (val); > > > > + enum insn_code icode; > > > > + if (is_gimple_assign (def_stmt) > > > > + && TREE_CODE_CLASS (gimple_assign_rhs_code (def_stmt)) > > > > + == tcc_comparison) > > > > + { > > > > + class expand_operand ops[2]; > > > > + rtx_insn *tmp = NULL; > > > > + start_sequence (); > > > > + rtx op0c = expand_normal (gimple_assign_rhs1 (def_stmt)); > > > > + rtx op1c = expand_normal (gimple_assign_rhs2 (def_stmt)); > > > > + machine_mode mode2 = GET_MODE (op0c); > > > > + create_input_operand (&ops[0], op0c, mode2); > > > > + create_input_operand (&ops[1], op1c, mode2); > > > > + > > > > + int unsignedp2 = TYPE_UNSIGNED (TREE_TYPE (val)); > > > > + auto inner_code = gimple_assign_rhs_code (def_stmt); > > > > + rtx test2 = NULL_RTX; > > > > + > > > > + enum rtx_code comparison2 = get_rtx_code (inner_code, > > > > unsignedp2); > > > > + if (unsignedp2) > > > > + comparison2 = unsigned_condition (comparison2); > > > > + if (comparison == NE) > > > > + optab = vec_cbranch_any_optab; > > > > + else > > > > + optab = vec_cbranch_all_optab; > > > > + > > > > + if ((icode = optab_handler (optab, mode2)) > > > > + != CODE_FOR_nothing > > > > + && maybe_legitimize_operands (icode, 1, 2, ops)) > > > > + { > > > > + prepare_cmp_insn (ops[0].value, ops[1].value, > > > > comparison2, > > > > + size, unsignedp2, OPTAB_DIRECT, > > > > &test2, > > > > + &mode2, optab); > > > > + emit_cmp_and_jump_insn_1 (test2, mode2, label, > > > > + optab, prob, false); > > > > + tmp = get_insns (); > > > > + } > > > > + > > > > + end_sequence (); > > > > + if (tmp) > > > > + { > > > > + emit_insn (tmp); > > > > + return; > > > > + } > > > > + } > > > > + } > > > > } > > > > > > > > emit_cmp_and_jump_insn_1 (test, mode, label, cbranch_optab, prob, > false); > > > > diff --git a/gcc/optabs.def b/gcc/optabs.def > > > > index > > > > 23f792352388dd5f8de8a1999643179328214abf..a600938fb9e4480ff2d78c9b > > > 0be344e4856539bb 100644 > > > > --- a/gcc/optabs.def > > > > +++ b/gcc/optabs.def > > > > @@ -424,6 +424,8 @@ OPTAB_D (smulhrs_optab, "smulhrs$a3") > > > > OPTAB_D (umulhs_optab, "umulhs$a3") > > > > OPTAB_D (umulhrs_optab, "umulhrs$a3") > > > > OPTAB_D (sdiv_pow2_optab, "sdiv_pow2$a3") > > > > +OPTAB_D (vec_cbranch_any_optab, "vec_cbranch_any$a4") > > > > +OPTAB_D (vec_cbranch_all_optab, "vec_cbranch_all$a4") > > > > OPTAB_D (vec_pack_sfix_trunc_optab, "vec_pack_sfix_trunc_$a") > > > > OPTAB_D (vec_pack_ssat_optab, "vec_pack_ssat_$a") > > > > OPTAB_D (vec_pack_trunc_optab, "vec_pack_trunc_$a") > > > > > > > > > > > > > > > > > > -- > > > Richard Biener <rguent...@suse.de> > > > SUSE Software Solutions Germany GmbH, > > > Frankenstrasse 146, 90461 Nuernberg, Germany; > > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG > Nuernberg) > > > > -- > Richard Biener <rguent...@suse.de> > SUSE Software Solutions Germany GmbH, > Frankenstrasse 146, 90461 Nuernberg, Germany; > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)