RE: [PATCH] vect: Support early break with gswitch statements

Tamar Christina Mon, 22 Sep 2025 08:25:09 -0700

> -----Original Message-----
> From: Richard Biener <[email protected]>
> Sent: Monday, September 22, 2025 2:45 PM
> To: Tamar Christina <[email protected]>
> Cc: Pengfei Li <[email protected]>; [email protected]
> Subject: RE: [PATCH] vect: Support early break with gswitch statements
> 
> On Mon, 22 Sep 2025, Tamar Christina wrote:
> 
> > > -----Original Message-----
> > > From: Richard Biener <[email protected]>
> > > Sent: 22 September 2025 12:34
> > > To: Pengfei Li <[email protected]>
> > > Cc: [email protected]; [email protected]; Tamar Christina
> > > <[email protected]>
> > > Subject: Re: [PATCH] vect: Support early break with gswitch statements
> > >
> > > On Thu, Sep 18, 2025 at 2:34 PM Pengfei Li <[email protected]> wrote:
> > > >
> > > > This patch adds vectorization support to early-break loops with gswitch
> > > > statements. Such gswitches may come from original switch-case constructs
> > > > in the source or the iftoswitch pass which rewrites if conditions with a
> > > > chain of comparisons, like below, to gswitch statements.
> > > >
> > > >         if (a[i] == c1 || a[i] == c2 || a[i] == c3)
> > > >
> > > > In this patch, the loop exit condition type is generalized from gcond to
> > > > gimple, so that gswitch statements can be treated as valid loop exits. A
> > > > new vectorizer pattern recognization function is introduced to check if
> > > > all non-default labels of the gswitch have the same destination.
> > >
> > > So the labels should have the same label and the default label should
> > > be the fallthru?  Like the only handled case is
> > >
> > > for (;;)
> > > {
> > > ...
> > >  switch (x)
> > >  {
> > >  case A:
> > >  case B:
> > >  case C:
> > >      /* any code here?  single BB?  */
> > >      goto exit;
> > >  default:;
> > > }
> > > ...
> > > }
> > > exit:
> > >
> > > ?
> > >
> > > > If yes,
> > > > it lowers the gswitch to an equivalent gcond.
> > >
> > > If so I think it might be simpler to implement this in if-conversion?
> > > In fact I think
> > > that already does this?
> >
> > Switch lowering in icvt is really limited.  In fact, so limited it's not 
> > worth extending
> I think.
> > There's a couple of reason this approach was taken:
> >
> > 1. Today switch lowering if icvt can only handle cases where the return is
> invariant. i.e. it can
> > only handle simple data dependencies:
> >
> > int find (int *a, int n)
> > {
> >   for (int i = 0; i < n; i++)
> >   {
> >     if (a[i] == 1 || a[i] == 3 || a[i] == 4 || a[i] == 5)
> >       return 1;
> >   }
> >   return -1;
> > }
> >
> > As soon as you want to know the index of the element you found it doesn't 
> > work
> >
> > int find (int *a, int n)
> > {
> >   for (int i = 0; i < n; i++)
> >   {
> >     if (a[i] == 1 || a[i] == 3 || a[i] == 4 || a[i] == 5)
> >       return i;
> >   }
> >   return -1;
> > }
> >
> > Because this loop now has multiple exits where the value of I requires a
> reduction,
> > and ifcvt does not handle this:
> >
> > "Can not ifcvt due to multiple exits"
> 
> Yeah, but that's another thing that needs fixing anyway - IIRC we
> currently do not handle regular if-conversion in multi-exit loops
> because of this.


That seems to be logically inconsistent with 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121383#c2
though.  Because if ifcvt could handle multiple exits the above would also be 
handled.

But I agree with you that the way to handle this isn't in ifcvt because ifcvt 
forces a data dependency.

For instance since SVE supports short circuiting compares we don't want to have 
your typical conversion
where both branches are executed and you VEC_COND_EXPR select the lanes at the 
end.  For SVE we
would want to conditionally execute the basic block, and masking will combine 
the results for us.

Ifcvt removes this choice where-as in the vectorizer we can pick based on the 
mode being costed.

The same applies here. We don't want ifcvt for instance to collapse nested 
switches.

> 
> > While perhaps this could be extended this would require ifcvt to select a 
> > main
> exit.
> > This would also limit us in future expansion.
> 
> Why does ifcvt need to select a main edge?

I've not looked too deep into it, but ifcvt seems to want to know which edge to
use conditionalize the statement.

So for

d + c
if (a > b)
  foo

the mask for d+c needs to come from the main exit.  I admit I haven't though 
about
what would happen in the PEELED case. but I have a feeling it would need 
something
different.

> 
> > For instance in this case
> >
> > int find2 (int * restrict a, int *b, int n)
> > {
> >   for (int i = 0; i < n; i++)
> >   {
> >     switch (a[i])
> >       {
> >         case 1:
> >         case 2:
> >         case 5:
> >             switch (b[i])
> >             {
> >                 case 1:
> >                 case 2:
> >                 case 5:
> >                   return 1;
> >                 default:
> >                 break;
> >             }
> >         default:
> >           break;
> >       }
> >   }
> >   return -1;
> > }
> >
> > Is vectorizable, but ifcvt doesn't handle it.   But once support for && 
> > blocks are
> > added to the vectorizer, we should be able to handle it fine.
> 
> Sure.
> 
> > The reason we support only one destination today is that as you say, the 
> > switch
> is
> > equivalent to a gcond.  This limitation is there because we can't introduce 
> > new
> > control flow to the vectorizer yet. But this is something we should be able 
> > to do
> in
> > the future.
> 
> Sure.
> 
> > I suggested doing this in the vectorizer for two other reason:
> >
> > 1. You've been adamant in the past about not introducing control 
> > dependencies
> into icvt.
> >      Which will limit the usefulness of ifcvt going forward for control 
> > flow.
> > 2. I believe the plan is still to move ifcvt into the vectorizer? If so it 
> > didn't seem
> worth while
> >     Extending the very limited switch lowering in ifcvt.  It just makes 
> > more work to
> more.
> 
> I think it belongs in the vectorizer, yes.  But I don't see this as a
> reason to stop improving it - in fact I think being able to drop it in
> full requires to first code generate vector code without prior copying
> of the scalar (if-converted) loop.
> 

I have a patch stack to do just this.  Hence the email sent a week ago about 
the optabs.
It's one of the big topics at my AArch64 Performance BoF and my goal is to 
start sending
them out in pieces as soon as cauldron is over and I've freed up more time.

But as I'll show at cauldron I'm quite far along with simple to follow 
incremental patches (I hope).
and one reason It took me long is I wanted to make sure the changes would work 
together.

For instance the loops in this patch series won't require a scalar epilogue at 
all.

> > Lastly, in cases as the above where we do an NxM compare, AArch64 has
> instructions for this.
> > Historically ifcvt has only done conditionalization of expression, but not
> instruction selection.
> >
> > Ifcvt would also have a hard time handling cases such as:
> >
> > int find2 (int * restrict a, int *b, int n)
> > {
> >   for (int i = 0; i < n; i++)
> >   {
> >     switch (a[i])
> >       {
> >         case 1:
> >         case 2:
> >         case 5:
> >           a[i] = i * b[i];
> >           return i;
> >         default:
> >           break;
> >       }
> >   }
> >   return -1;
> > }
> >
> > Which we should be handle in the future.
> 
> I think give we have a switch lowering pass (aka switch-to-if), it
> should be reasonably "easy" to perform switch-to-if during ifcvt
> (on the loop copy for the vectorizer).
> 
> But then first handling multi-exit loops in ifcvt at all would be nice
> (before trying to handle exit-from-switch).
> 

It really does seem like this would be doing work double though.
The improvement to support ifcvt handling of multiple exits with
an invariant return was done in GCC 15, and even then it was immediately
clear the approach was limited.  Given that it never actually solved the issue 
filed.

In fact, with this code, I think we can pretty much drop the code added in
Icvt and still handle more cases in less code.

Teaching ifcvt about multiple exit is teaching it about control flow the 
vectorizer
can in principle handle on its own, but whereas the vectorizer can choose how to
handle it, ifcvt must pick one way early.

If none of this is entirely convincing, maybe lets come back to it after the 
presentations
where I can show what codegen we can/will/should be able to do soon which would
make it somewhat clearer (hopefully) that the vectorizer's masking support is 
more
capable than ifcvt :)

Thanks,
Tamar

> > I agree that it's slightly not nice needing the change in 
> > vectorizable_early_exit and
> we can
> > Hopefully get rid of it in the future. But I believe having this inside the 
> > vectorizer
> will allow
> > us to handle more cases and have a more general solution.
> 
> Note I didn't look into the patch details but tried to understand
> what exactly the proposed limitation is and compare that to the
> existing switch handling in ifcvt.
> 
> Richard.
> 
> >
> > Thanks,
> > Tamar
> >
> > >
> > > Richard.
> > >
> > > > To avoid excessive compile
> > > > time, a new parameter vect-max-compare-in-switch-lowering is introduced
> > > > to limit the maximum number of comparisons in the gswitch lowering. In
> > > > addition, the CRC loop optimization is adjusted a bit to make sure CRC
> > > > loops still have gcond exits.
> > > >
> > > > This patch is bootstrapped and regression-tested on x86_64-linux-gnu,
> > > > arm-linux-gnueabihf and aarch64-linux-gnu.
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > >         * doc/invoke.texi: Document vect-max-compare-in-switch-lowering.
> > > >         * gimple-crc-optimization.cc
> > > (crc_optimization::loop_may_calculate_crc):
> > > >         Ensure CRC loop exits are still gconds.
> > > >         (crc_optimization::optimize_crc_loop): Cast CRC loop exits.
> > > >         * params.opt: Add option vect-max-compare-in-switch-lowering.
> > > >         * tree-scalar-evolution.cc (scev_dfs::follow_ssa_edge_expr):
> > > >         Generalize loop exit conditions type from gcond to gimple.
> > > >         (get_loop_exit_condition): Likewise.
> > > >         * tree-scalar-evolution.h (get_loop_exit_condition): Likewise.
> > > >         * tree-vect-loop-manip.cc (vect_set_loop_condition_normal):
> > > >         Likewise.
> > > >         (vect_set_loop_condition): Likewise.
> > > >         (slpeel_can_duplicate_loop_p): Likewise.
> > > >         * tree-vect-loop.cc (vect_get_loop_niters): Likewise.
> > > >         (vect_analyze_loop_form): Likewise.
> > > >         (vect_create_loop_vinfo): Likewise.
> > > >         * tree-vect-patterns.cc (vect_recog_gswitch_pattern): New
> > > >         pattern recognition function for gswitch early exits.
> > > >         * tree-vect-slp.cc (vect_analyze_slp): Bail out if a gswitch is
> > > >         not lowered into a gcond.
> > > >         * tree-vect-stmts.cc (vect_mark_stmts_to_be_vectorized): Allow
> > > >         gswitch statements as loop exits.
> > > >         (vectorizable_early_exit): Transform gswitch early exits.
> > > >         * tree-vectorizer.h (struct vect_loop_form_info): Allow gswitch
> > > >         statements as loop exits.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > >         * gcc.dg/vect/vect-switch-search-line-fast.c: Update the test as
> > > >         the loop is now vectorizable on multiple targets.
> > > >         * gcc.dg/vect/vect-early-break_139-switch1.c: New test.
> > > >         * gcc.dg/vect/vect-early-break_139-switch2.c: New test.
> > > >         * gcc.dg/vect/vect-early-break_139-switch3.c: New test.
> > > >         * gcc.dg/vect/vect-early-break_139-switch4.c: New test.
> > > > ---
> > > >  gcc/doc/invoke.texi                           |   4 +
> > > >  gcc/gimple-crc-optimization.cc                |   9 +-
> > > >  gcc/params.opt                                |   4 +
> > > >  .../vect/vect-early-break_139-switch1.c       |  19 +++
> > > >  .../vect/vect-early-break_139-switch2.c       |  21 ++++
> > > >  .../vect/vect-early-break_139-switch3.c       |  24 ++++
> > > >  .../vect/vect-early-break_139-switch4.c       |  26 ++++
> > > >  .../vect/vect-switch-search-line-fast.c       |   3 +-
> > > >  gcc/tree-scalar-evolution.cc                  |  12 +-
> > > >  gcc/tree-scalar-evolution.h                   |   4 +-
> > > >  gcc/tree-vect-loop-manip.cc                   |   6 +-
> > > >  gcc/tree-vect-loop.cc                         |  10 +-
> > > >  gcc/tree-vect-patterns.cc                     | 111 ++++++++++++++++++
> > > >  gcc/tree-vect-slp.cc                          |  11 +-
> > > >  gcc/tree-vect-stmts.cc                        |  74 ++++++++----
> > > >  gcc/tree-vectorizer.h                         |   8 +-
> > > >  16 files changed, 295 insertions(+), 51 deletions(-)
> > > >  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-early-break_139-
> > > switch1.c
> > > >  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-early-break_139-
> > > switch2.c
> > > >  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-early-break_139-
> > > switch3.c
> > > >  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-early-break_139-
> > > switch4.c
> > > >
> > > > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > > > index 3cf326cc22c..def84a6675a 100644
> > > > --- a/gcc/doc/invoke.texi
> > > > +++ b/gcc/doc/invoke.texi
> > > > @@ -16652,6 +16652,10 @@ Complex expressions slow the analyzer.
> > > >  Maximum number of arguments in a PHI supported by TREE if conversion
> > > >  unless the loop is marked with simd pragma.
> > > >
> > > > +@item vect-max-compare-in-switch-lowering
> > > > +The maximum number of comparisons that can be performed when
> > > lowering switch
> > > > +statements in the vectorizer.
> > > > +
> > > >  @item vect-max-layout-candidates
> > > >  The maximum number of possible vector layouts (such as permutations)
> > > >  to consider when optimizing to-be-vectorized code.
> > > > diff --git a/gcc/gimple-crc-optimization.cc 
> > > > b/gcc/gimple-crc-optimization.cc
> > > > index 1751be9bc97..986aa193893 100644
> > > > --- a/gcc/gimple-crc-optimization.cc
> > > > +++ b/gcc/gimple-crc-optimization.cc
> > > > @@ -937,6 +937,12 @@ crc_optimization::loop_may_calculate_crc (class
> > > loop *loop)
> > > >      return false;
> > > >
> > > >    m_crc_loop = loop;
> > > > +
> > > > +  /* Filter out the loops, which don't have a gcond as single exit.  */
> > > > +  gimple *loop_exit_cond = get_loop_exit_condition (m_crc_loop);
> > > > +  if (loop_exit_cond == NULL || !is_a <gcond *> (loop_exit_cond))
> > > > +    return false;
> > > > +
> > > >    basic_block *loop_bbs = get_loop_body_in_dom_order (m_crc_loop);
> > > >
> > > >    /* Filter out the cases, which don't have exactly two conditions in 
> > > > the loop.
> > > > @@ -1274,7 +1280,8 @@ crc_optimization::optimize_crc_loop (gphi
> > > *output_crc)
> > > >    remove_phi_node (&tmp_gsi, false);
> > > >
> > > >    /* Alter the exit condition of the loop to always exit.  */
> > > > -  gcond* loop_exit_cond = get_loop_exit_condition (m_crc_loop);
> > > > +  gcond *loop_exit_cond
> > > > +    = safe_dyn_cast <gcond *> (get_loop_exit_condition (m_crc_loop));
> > > >    gimple_cond_make_false (loop_exit_cond);
> > > >    update_stmt (loop_exit_cond);
> > > >    return true;
> > > > diff --git a/gcc/params.opt b/gcc/params.opt
> > > > index dd53d830895..115400ee8d2 100644
> > > > --- a/gcc/params.opt
> > > > +++ b/gcc/params.opt
> > > > @@ -1229,6 +1229,10 @@ Whether to use canonical types.
> > > >  Common Joined UInteger Var(param_vect_epilogues_nomask) Init(1)
> > > IntegerRange(0, 1) Param Optimization NoOffload
> > > >  Enable loop epilogue vectorization using smaller vector size.
> > > >
> > > > +-param=vect-max-compare-in-switch-lowering=
> > > > +Common Joined UInteger
> > > Var(param_vect_max_compare_in_switch_lowering) Init(16) Param
> > > Optimization
> > > > +Maximum number of comparisons that can be performed when lowering
> > > switch statements in the vectorizer.
> > > > +
> > > >  -param=vect-max-layout-candidates=
> > > >  Common Joined UInteger Var(param_vect_max_layout_candidates) Init(32)
> > > Param Optimization
> > > >  Maximum number of possible vector layouts (such as permutations) to
> > > consider when optimizing to-be-vectorized code.
> > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_139-switch1.c
> > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_139-switch1.c
> > > > new file mode 100644
> > > > index 00000000000..bbe75d20116
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_139-switch1.c
> > > > @@ -0,0 +1,19 @@
> > > > +/* { dg-add-options vect_early_break } */
> > > > +/* { dg-do compile } */
> > > > +/* { dg-require-effective-target vect_early_break } */
> > > > +/* { dg-require-effective-target vect_int } */
> > > > +
> > > > +/* { dg-additional-options "-Ofast" } */
> > > > +
> > > > +int find (int *a, int n)
> > > > +{
> > > > +  for (int i = 0; i < n; i++)
> > > > +  {
> > > > +    /* This is converted to a switch statement in the iftoswitch pass. 
> > > >  */
> > > > +    if (a[i] == 1 || a[i] == 2 || a[i] == 5 || a[i] == 7)
> > > > +      return i;
> > > > +  }
> > > > +  return -1;
> > > > +}
> > > > +
> > > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_139-switch2.c
> > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_139-switch2.c
> > > > new file mode 100644
> > > > index 00000000000..d200bea4594
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_139-switch2.c
> > > > @@ -0,0 +1,21 @@
> > > > +/* { dg-add-options vect_early_break } */
> > > > +/* { dg-do compile } */
> > > > +/* { dg-require-effective-target vect_early_break } */
> > > > +/* { dg-require-effective-target vect_int } */
> > > > +
> > > > +/* { dg-additional-options "-Ofast" } */
> > > > +
> > > > +int find (int *a, int n)
> > > > +{
> > > > +  for (int i = 0; i < n; i++)
> > > > +  {
> > > > +    /* This is converted to a switch statement in the iftoswitch pass. 
> > > >  */
> > > > +    if (a[i] == 1 || a[i] == 3 || a[i] == 4 || a[i] == 5)
> > > > +      continue;
> > > > +    else
> > > > +      return i;
> > > > +  }
> > > > +  return -1;
> > > > +}
> > > > +
> > > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_139-switch3.c
> > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_139-switch3.c
> > > > new file mode 100644
> > > > index 00000000000..f696267a855
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_139-switch3.c
> > > > @@ -0,0 +1,24 @@
> > > > +/* { dg-add-options vect_early_break } */
> > > > +/* { dg-do compile } */
> > > > +/* { dg-require-effective-target vect_early_break } */
> > > > +/* { dg-require-effective-target vect_int } */
> > > > +
> > > > +/* { dg-additional-options "-Ofast" } */
> > > > +
> > > > +int find (int *a, int n)
> > > > +{
> > > > +  for (int i = 0; i < n; i++)
> > > > +  {
> > > > +    /* An early break in case labels of a switch statement.  */
> > > > +    switch (a[i])
> > > > +    {
> > > > +      case -1 ... 1:
> > > > +      case 10:
> > > > +      case 20:
> > > > +        return i;
> > > > +    }
> > > > +  }
> > > > +  return -1;
> > > > +}
> > > > +
> > > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_139-switch4.c
> > > b/gcc/testsuite/gcc.dg/vect/vect-early-break_139-switch4.c
> > > > new file mode 100644
> > > > index 00000000000..f8e3187843d
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_139-switch4.c
> > > > @@ -0,0 +1,26 @@
> > > > +/* { dg-add-options vect_early_break } */
> > > > +/* { dg-do compile } */
> > > > +/* { dg-require-effective-target vect_early_break } */
> > > > +/* { dg-require-effective-target vect_int } */
> > > > +
> > > > +/* { dg-additional-options "-Ofast" } */
> > > > +
> > > > +int find (int *a, int n)
> > > > +{
> > > > +  for (int i = 0; i < n; i++)
> > > > +  {
> > > > +    /* An early break in the default label of a switch statement.  */
> > > > +    switch (a[i])
> > > > +    {
> > > > +      case 1 ... 3:
> > > > +      case 5:
> > > > +      case 10:
> > > > +       continue;
> > > > +      default:
> > > > +        return i;
> > > > +    }
> > > > +  }
> > > > +  return -1;
> > > > +}
> > > > +
> > > > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-switch-search-line-fast.c
> > > b/gcc/testsuite/gcc.dg/vect/vect-switch-search-line-fast.c
> > > > index 678512db319..17e80330daa 100644
> > > > --- a/gcc/testsuite/gcc.dg/vect/vect-switch-search-line-fast.c
> > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-switch-search-line-fast.c
> > > > @@ -16,5 +16,4 @@ const unsigned char *search_line_fast2 (const
> > > unsigned char *s,
> > > >    return s;
> > > >  }
> > > >
> > > > -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { 
> > > > target {
> > > ilp32 || { amdgcn*-*-* } } } } } */
> > > > -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 0 "vect" { 
> > > > target { !
> > > { ilp32 || { amdgcn*-*-* } } } } } } */
> > > > +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } 
> > > > */
> > > > diff --git a/gcc/tree-scalar-evolution.cc b/gcc/tree-scalar-evolution.cc
> > > > index ecdef7529a6..0a5172c3be4 100644
> > > > --- a/gcc/tree-scalar-evolution.cc
> > > > +++ b/gcc/tree-scalar-evolution.cc
> > > > @@ -1302,7 +1302,7 @@ scev_dfs::follow_ssa_edge_expr (gimple
> > > *at_stmt, tree expr,
> > > >     guards the exit edge.  If the expression is too difficult to
> > > >     analyze, then give up.  */
> > > >
> > > > -gcond *
> > > > +gimple *
> > > >  get_loop_exit_condition (const class loop *loop)
> > > >  {
> > > >    return get_loop_exit_condition (single_exit (loop));
> > > > @@ -1311,16 +1311,20 @@ get_loop_exit_condition (const class loop
> > > *loop)
> > > >  /* If the statement just before the EXIT_EDGE contains a condition then
> > > >     return the condition, otherwise NULL. */
> > > >
> > > > -gcond *
> > > > +gimple *
> > > >  get_loop_exit_condition (const_edge exit_edge)
> > > >  {
> > > > -  gcond *res = NULL;
> > > > +  gimple *res = NULL;
> > > >
> > > >    if (dump_file && (dump_flags & TDF_SCEV))
> > > >      fprintf (dump_file, "(get_loop_exit_condition \n  ");
> > > >
> > > >    if (exit_edge)
> > > > -    res = safe_dyn_cast <gcond *> (*gsi_last_bb (exit_edge->src));
> > > > +    {
> > > > +      gimple *g = *gsi_last_bb (exit_edge->src);
> > > > +      if (is_a <gcond *> (g) || is_a <gswitch *> (g))
> > > > +       res = g;
> > > > +    }
> > > >
> > > >    if (dump_file && (dump_flags & TDF_SCEV))
> > > >      {
> > > > diff --git a/gcc/tree-scalar-evolution.h b/gcc/tree-scalar-evolution.h
> > > > index c7feeacf1db..54f0bf82c7b 100644
> > > > --- a/gcc/tree-scalar-evolution.h
> > > > +++ b/gcc/tree-scalar-evolution.h
> > > > @@ -22,8 +22,8 @@ along with GCC; see the file COPYING3.  If not see
> > > >  #define GCC_TREE_SCALAR_EVOLUTION_H
> > > >
> > > >  extern tree number_of_latch_executions (class loop *);
> > > > -extern gcond *get_loop_exit_condition (const class loop *);
> > > > -extern gcond *get_loop_exit_condition (const_edge);
> > > > +extern gimple *get_loop_exit_condition (const class loop *);
> > > > +extern gimple *get_loop_exit_condition (const_edge);
> > > >
> > > >  extern void scev_initialize (void);
> > > >  extern bool scev_initialized_p (void);
> > > > diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> > > > index 20141dbc2e5..bd04252da2d 100644
> > > > --- a/gcc/tree-vect-loop-manip.cc
> > > > +++ b/gcc/tree-vect-loop-manip.cc
> > > > @@ -1233,7 +1233,7 @@ vect_set_loop_condition_normal (loop_vec_info
> > > /* loop_vinfo */, edge exit_edge,
> > > >  {
> > > >    tree indx_before_incr, indx_after_incr;
> > > >    gcond *cond_stmt;
> > > > -  gcond *orig_cond;
> > > > +  gimple *orig_cond;
> > > >    edge pe = loop_preheader_edge (loop);
> > > >    gimple_stmt_iterator incr_gsi;
> > > >    bool insert_after;
> > > > @@ -1395,7 +1395,7 @@ vect_set_loop_condition (class loop *loop, edge
> > > loop_e, loop_vec_info loop_vinfo
> > > >                          bool niters_maybe_zero)
> > > >  {
> > > >    gcond *cond_stmt;
> > > > -  gcond *orig_cond = get_loop_exit_condition (loop_e);
> > > > +  gimple *orig_cond = get_loop_exit_condition (loop_e);
> > > >    gimple_stmt_iterator loop_cond_gsi = gsi_for_stmt (orig_cond);
> > > >
> > > >    if (loop_vinfo && LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo))
> > > > @@ -2039,7 +2039,7 @@ slpeel_can_duplicate_loop_p (const class loop
> > > *loop, const_edge exit_e,
> > > >                              const_edge e)
> > > >  {
> > > >    edge entry_e = loop_preheader_edge (loop);
> > > > -  gcond *orig_cond = get_loop_exit_condition (exit_e);
> > > > +  gimple *orig_cond = get_loop_exit_condition (exit_e);
> > > >    gimple_stmt_iterator loop_exit_gsi = gsi_last_bb (exit_e->src);
> > > >
> > > >    /* All loops have an outer scope; the only case loop->outer is NULL 
> > > > is for
> > > > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > > > index b58e4355e58..02db9f5a2eb 100644
> > > > --- a/gcc/tree-vect-loop.cc
> > > > +++ b/gcc/tree-vect-loop.cc
> > > > @@ -627,12 +627,12 @@ vect_fixup_scalar_cycles_with_patterns
> > > (loop_vec_info loop_vinfo)
> > > >     Return the loop exit conditions.  */
> > > >
> > > >
> > > > -static vec<gcond *>
> > > > +static vec<gimple *>
> > > >  vect_get_loop_niters (class loop *loop, const_edge main_exit, tree
> > > *assumptions,
> > > >                       tree *number_of_iterations, tree 
> > > > *number_of_iterationsm1)
> > > >  {
> > > >    auto_vec<edge> exits = get_loop_exit_edges (loop);
> > > > -  vec<gcond *> conds;
> > > > +  vec<gimple *> conds;
> > > >    conds.create (exits.length ());
> > > >    class tree_niter_desc niter_desc;
> > > >    tree niter_assumptions, niter, may_be_zero;
> > > > @@ -654,7 +654,7 @@ vect_get_loop_niters (class loop *loop, const_edge
> > > main_exit, tree *assumptions,
> > > >    unsigned int i;
> > > >    FOR_EACH_VEC_ELT (exits, i, exit)
> > > >      {
> > > > -      gcond *cond = get_loop_exit_condition (exit);
> > > > +      gimple *cond = get_loop_exit_condition (exit);
> > > >        if (cond)
> > > >         conds.safe_push (cond);
> > > >
> > > > @@ -1680,7 +1680,7 @@ vect_analyze_loop_form (class loop *loop,
> > > gimple *loop_vectorized_call,
> > > >    /* Determine what the primary and alternate exit conds are.  */
> > > >    for (unsigned i = 0; i < info->conds.length (); i++)
> > > >      {
> > > > -      gcond *cond = info->conds[i];
> > > > +      gimple *cond = info->conds[i];
> > > >        if (exit_e->src == gimple_bb (cond))
> > > >         std::swap (info->conds[0], info->conds[i]);
> > > >      }
> > > > @@ -1745,7 +1745,7 @@ vect_create_loop_vinfo (class loop *loop,
> > > vec_info_shared *shared,
> > > >    if (!integer_onep (info->assumptions) && !orig_loop_info)
> > > >      LOOP_VINFO_NITERS_ASSUMPTIONS (loop_vinfo) = info->assumptions;
> > > >
> > > > -  for (gcond *cond : info->conds)
> > > > +  for (gimple *cond : info->conds)
> > > >      {
> > > >        stmt_vec_info loop_cond_info = loop_vinfo->lookup_stmt (cond);
> > > >        /* Mark the statement as a condition.  */
> > > > diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> > > > index 70bf768d339..cd39bdfbd0d 100644
> > > > --- a/gcc/tree-vect-patterns.cc
> > > > +++ b/gcc/tree-vect-patterns.cc
> > > > @@ -5529,6 +5529,116 @@ vect_recog_gcond_pattern (vec_info *vinfo,
> > > >    return pattern_stmt;
> > > >  }
> > > >
> > > > +/* Function vect_recog_gswitch_pattern
> > > > +
> > > > +   Try to find pattern like following:
> > > > +
> > > > +     switch (x) <default: <La>, case b1: <Lb>, ... case bN: <Lb>>
> > > > +
> > > > +   where all labels except the default one have the same destination 
> > > > and
> > > > +   convert it to a boolean pattern
> > > > +
> > > > +     mask = (x == b1) || (x == b2) || ... || (x == bN)
> > > > +     if (mask != 0)
> > > > +
> > > > +   The type of output TYPE_OUT is set in this function.
> > > > +
> > > > +   Input:
> > > > +
> > > > +   * STMT_VINFO: The stmt at the end from which the pattern
> > > > +                search begins, i.e., cast of a bool to
> > > > +                an integer type.
> > > > +
> > > > +   Output:
> > > > +
> > > > +   * TYPE_OUT: The type of the output of this pattern.
> > > > +
> > > > +   * Return value: A new stmt that will be used to replace the 
> > > > pattern.  */
> > > > +
> > > > +static gimple *
> > > > +vect_recog_gswitch_pattern (vec_info *vinfo,
> > > > +                           stmt_vec_info stmt_vinfo, tree *type_out)
> > > > +{
> > > > +  /* Currently we only support this for loop vectorization and when 
> > > > multiple
> > > > +     exits.  */
> > > > +  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
> > > > +  if (!loop_vinfo || !LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
> > > > +    return NULL;
> > > > +
> > > > +  gimple *last_stmt = STMT_VINFO_STMT (stmt_vinfo);
> > > > +  gswitch *gs = NULL;
> > > > +  if (!(gs = dyn_cast <gswitch *> (last_stmt)))
> > > > +    return NULL;
> > > > +
> > > > +  tree index = gimple_switch_index (gs);
> > > > +  tree stype = TREE_TYPE (index);
> > > > +
> > > > +  /* For now, there should be no gswitch with boolean type index.  */
> > > > +  gcc_assert (!VECT_SCALAR_BOOLEAN_TYPE_P (stype));
> > > > +  tree vectype = get_mask_type_for_scalar_type (vinfo, stype);
> > > > +  if (vectype == NULL_TREE)
> > > > +    return NULL;
> > > > +
> > > > +  tree case_label_dest = NULL_TREE;
> > > > +  tree cond = NULL_TREE;
> > > > +  unsigned int num_cmps = 0;
> > > > +  for (unsigned li = 1; li < gimple_switch_num_labels (gs); li++)
> > > > +    {
> > > > +      tree label = gimple_switch_label (gs, li);
> > > > +
> > > > +      /* Check if all case labels have the same destination.  */
> > > > +      tree dest = CASE_LABEL (label);
> > > > +      if (case_label_dest == NULL)
> > > > +       case_label_dest = dest;
> > > > +      else if (dest != case_label_dest)
> > > > +       return NULL;
> > > > +
> > > > +      tree low = CASE_LOW (label);
> > > > +      gcc_assert (low != NULL_TREE);
> > > > +      tree high = CASE_HIGH (label);
> > > > +      gcc_assert (high == NULL_TREE || tree_int_cst_lt (low, high));
> > > > +
> > > > +      wide_int val = wi::to_wide (low);
> > > > +      wide_int one = wi::one (TYPE_PRECISION (stype));
> > > > +      while (true)
> > > > +       {
> > > > +         /* A large number of comparisons lead to long compile time 
> > > > and make
> > > > +            vectorization non-profitable (most cost models will reject 
> > > > the
> > > > +            vectorized code).  So bail out in this case.  */
> > > > +         if (++num_cmps > (unsigned)
> > > param_vect_max_compare_in_switch_lowering)
> > > > +           return NULL;
> > > > +
> > > > +         tree eq_cond = vect_recog_temp_ssa_var (boolean_type_node, 
> > > > NULL);
> > > > +         gimple *eq_stmt = gimple_build_assign (eq_cond, EQ_EXPR, 
> > > > index,
> > > > +                                                wide_int_to_tree 
> > > > (stype, val));
> > > > +         append_pattern_def_seq (vinfo, stmt_vinfo, eq_stmt, vectype, 
> > > > stype);
> > > > +         if (cond == NULL_TREE)
> > > > +           cond = eq_cond;
> > > > +         else
> > > > +           {
> > > > +             tree new_cond = vect_recog_temp_ssa_var 
> > > > (boolean_type_node,
> > > NULL);
> > > > +             gimple *or_stmt = gimple_build_assign (new_cond, 
> > > > BIT_IOR_EXPR,
> > > > +                                                    cond, eq_cond);
> > > > +             append_pattern_def_seq (vinfo, stmt_vinfo, or_stmt,
> > > > +                                     vectype, stype);
> > > > +             cond = new_cond;
> > > > +           }
> > > > +
> > > > +         /* Break if the case label has a single value or we've 
> > > > reached the
> > > > +            high value.  */
> > > > +         if (high == NULL_TREE || wi::eq_p (val, wi::to_wide (high)))
> > > > +           break;
> > > > +
> > > > +         val = wi::add (val, one);
> > > > +       }
> > > > +    }
> > > > +
> > > > +  vect_pattern_detected ("vect_recog_gswitch_pattern", last_stmt);
> > > > +  *type_out = vectype;
> > > > +  return gimple_build_cond (NE_EXPR, cond, build_zero_cst (TREE_TYPE
> > > (cond)),
> > > > +                           NULL_TREE, NULL_TREE);
> > > > +}
> > > > +
> > > >  /* Function vect_recog_bool_pattern
> > > >
> > > >     Try to find pattern like following:
> > > > @@ -6987,6 +7097,7 @@ static vect_recog_func
> > > vect_vect_recog_func_ptrs[] = {
> > > >    { vect_recog_sat_sub_pattern, "sat_sub" },
> > > >    { vect_recog_sat_trunc_pattern, "sat_trunc" },
> > > >    { vect_recog_gcond_pattern, "gcond" },
> > > > +  { vect_recog_gswitch_pattern, "gswitch" },
> > > >    { vect_recog_bool_pattern, "bool" },
> > > >    /* This must come before mask conversion, and includes the parts
> > > >       of mask conversion that are needed for gather and scatter
> > > > diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> > > > index 895fb88ab7f..217d1299656 100644
> > > > --- a/gcc/tree-vect-slp.cc
> > > > +++ b/gcc/tree-vect-slp.cc
> > > > @@ -5409,14 +5409,13 @@ vect_analyze_slp (vec_info *vinfo, unsigned
> > > max_tree_size,
> > > >           vec<stmt_vec_info> roots = vNULL;
> > > >           roots.safe_push (cond_info);
> > > >           gimple *stmt = STMT_VINFO_STMT (cond_info);
> > > > -         tree args0 = gimple_cond_lhs (stmt);
> > > > -         tree args1 = gimple_cond_rhs (stmt);
> > > >
> > > >           /* These should be enforced by cond lowering, but if it failed
> > > >              bail.  */
> > > > -         if (gimple_cond_code (stmt) != NE_EXPR
> > > > -             || TREE_TYPE (args0) != boolean_type_node
> > > > -             || !integer_zerop (args1))
> > > > +         if (!is_a <gcond *> (stmt)
> > > > +             || gimple_cond_code (stmt) != NE_EXPR
> > > > +             || TREE_TYPE (gimple_cond_lhs (stmt)) != boolean_type_node
> > > > +             || !integer_zerop (gimple_cond_rhs (stmt)))
> > > >             {
> > > >               roots.release ();
> > > >               return opt_result::failure_at (vect_location,
> > > > @@ -5428,7 +5427,7 @@ vect_analyze_slp (vec_info *vinfo, unsigned
> > > max_tree_size,
> > > >              from them.  It's highly likely that the resulting SLP tree 
> > > > here if both
> > > >              arguments have a def will be incompatible, but we rely on 
> > > > it being split
> > > >              later on.  */
> > > > -         auto varg = loop_vinfo->lookup_def (args0);
> > > > +         auto varg = loop_vinfo->lookup_def (gimple_cond_lhs (stmt));
> > > >           vec<stmt_vec_info> stmts;
> > > >           vec<tree> remain = vNULL;
> > > >           stmts.create (1);
> > > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > > > index 6274956e2a5..436a4f4fc44 100644
> > > > --- a/gcc/tree-vect-stmts.cc
> > > > +++ b/gcc/tree-vect-stmts.cc
> > > > @@ -743,6 +743,7 @@ vect_mark_stmts_to_be_vectorized (loop_vec_info
> > > loop_vinfo, bool *fatal)
> > > >
> > > >           if (gimple_get_lhs (stmt) == NULL_TREE
> > > >               && !is_a <gcond *> (stmt)
> > > > +             && !is_a <gswitch *> (stmt)
> > > >               && !is_a <gcall *> (stmt))
> > > >             return opt_result::failure_at
> > > >                 (stmt, "not vectorized: irregular stmt: %G", stmt);
> > > > @@ -12339,28 +12340,6 @@ vectorizable_early_exit (loop_vec_info
> > > loop_vinfo, stmt_vec_info stmt_info,
> > > >    bool masked_loop_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
> > > >    bool len_loop_p = LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo);
> > > >
> > > > -  /* Now build the new conditional.  Pattern gimple_conds get dropped
> > > during
> > > > -     codegen so we must replace the original insn.  */
> > > > -  gimple *orig_stmt = STMT_VINFO_STMT (vect_orig_stmt (stmt_info));
> > > > -  gcond *cond_stmt = as_a <gcond *>(orig_stmt);
> > > > -
> > > > -  tree vectype_out = vectype;
> > > > -  auto bb = gimple_bb (cond_stmt);
> > > > -  edge exit_true_edge = EDGE_SUCC (bb, 0);
> > > > -  if (exit_true_edge->flags & EDGE_FALSE_VALUE)
> > > > -    exit_true_edge = EDGE_SUCC (bb, 1);
> > > > -  gcc_assert (exit_true_edge->flags & EDGE_TRUE_VALUE);
> > > > -
> > > > -  /* When vectorizing we assume that if the branch edge is taken that 
> > > > we're
> > > > -     exiting the loop.  This is not however always the case as the 
> > > > compiler will
> > > > -     rewrite conditions to always be a comparison against 0.  To do 
> > > > this it
> > > > -     sometimes flips the edges.  This is fine for scalar,  but for 
> > > > vector we
> > > > -     then have to negate the result of the test, as we're still 
> > > > assuming that if
> > > > -     you take the branch edge that we found the exit condition.  i.e. 
> > > > we need
> > > to
> > > > -     know whether we are generating a `forall` or an `exist` 
> > > > condition.  */
> > > > -  bool flipped = flow_bb_inside_loop_p (LOOP_VINFO_LOOP (loop_vinfo),
> > > > -                                       exit_true_edge->dest);
> > > > -
> > > >    /* See if we support ADDHN and use that for the reduction.  */
> > > >    internal_fn ifn = IFN_VEC_TRUNC_ADD_HIGH;
> > > >    bool addhn_supported_p
> > > > @@ -12439,6 +12418,53 @@ vectorizable_early_exit (loop_vec_info
> > > loop_vinfo, stmt_vec_info stmt_info,
> > > >    auto_vec<tree> stmts;
> > > >    stmts.safe_splice (SLP_TREE_VEC_DEFS (slp_node));
> > > >
> > > > +  /* Now build the new conditional.  Pattern gimple_conds get dropped
> > > during
> > > > +     codegen so we must replace the original insn.  */
> > > > +  gimple *orig_stmt = STMT_VINFO_STMT (vect_orig_stmt (stmt_info));
> > > > +  tree vectype_out = vectype;
> > > > +  auto bb = gimple_bb (orig_stmt);
> > > > +
> > > > +  /* Find the edge which indicates the branch is taken.  If original 
> > > > statement
> > > > +     is a gcond, we find the edge with flag EDGE_TRUE_VALUE.  For 
> > > > gswitch,
> > > we
> > > > +     find the edge from the non-default label, and set flags
> > > EDGE_TRUE_VALUE,
> > > > +     EDGE_FALSE_VALUE for the two edges.  */
> > > > +  edge exit_true_edge = EDGE_SUCC (bb, 0);
> > > > +  if (is_a <gcond *> (orig_stmt))
> > > > +    {
> > > > +      if (exit_true_edge->flags & EDGE_FALSE_VALUE)
> > > > +       exit_true_edge = EDGE_SUCC (bb, 1);
> > > > +      gcc_assert (exit_true_edge->flags & EDGE_TRUE_VALUE);
> > > > +    }
> > > > +  else
> > > > +    {
> > > > +      gcc_assert (is_a <gswitch *> (orig_stmt));
> > > > +      gcc_assert (bb->succs->length () == 2);
> > > > +      tree dft_lab = gimple_switch_default_label (as_a <gswitch *>
> > > (orig_stmt));
> > > > +      basic_block dft_bb = label_to_block (cfun, CASE_LABEL (dft_lab));
> > > > +      if (exit_true_edge->dest != dft_bb)
> > > > +       {
> > > > +         EDGE_SUCC (bb, 0)->flags |= EDGE_TRUE_VALUE;
> > > > +         EDGE_SUCC (bb, 1)->flags |= EDGE_FALSE_VALUE;
> > > > +       }
> > > > +      else
> > > > +       {
> > > > +         exit_true_edge = EDGE_SUCC (bb, 1);
> > > > +         EDGE_SUCC (bb, 0)->flags |= EDGE_FALSE_VALUE;
> > > > +         EDGE_SUCC (bb, 1)->flags |= EDGE_TRUE_VALUE;
> > > > +       }
> > > > +      gcc_assert (exit_true_edge->dest != dft_bb);
> > > > +    }
> > > > +
> > > > +  /* When vectorizing we assume that if the branch edge is taken that 
> > > > we're
> > > > +     exiting the loop.  This is not however always the case as the 
> > > > compiler will
> > > > +     rewrite conditions to always be a comparison against 0.  To do 
> > > > this it
> > > > +     sometimes flips the edges.  This is fine for scalar,  but for 
> > > > vector we
> > > > +     then have to negate the result of the test, as we're still 
> > > > assuming that if
> > > > +     you take the branch edge that we found the exit condition.  i.e., 
> > > > we need
> > > > +     to know whether we are generating a `forall` or an `exist` 
> > > > condition.  */
> > > > +  bool flipped = flow_bb_inside_loop_p (LOOP_VINFO_LOOP (loop_vinfo),
> > > > +                                       exit_true_edge->dest);
> > > > +
> > > >    /* If we're comparing against a previous forall we need to negate the
> > > resullts
> > > >       before we do the final comparison or reduction.  */
> > > >    if (flipped)
> > > > @@ -12536,8 +12562,8 @@ vectorizable_early_exit (loop_vec_info
> > > loop_vinfo, stmt_vec_info stmt_info,
> > > >    gcc_assert (new_temp);
> > > >
> > > >    tree cst = build_zero_cst (vectype_out);
> > > > -  gimple_cond_set_condition (cond_stmt, NE_EXPR, new_temp, cst);
> > > > -  update_stmt (orig_stmt);
> > > > +  new_stmt = gimple_build_cond (NE_EXPR, new_temp, cst, NULL_TREE,
> > > NULL_TREE);
> > > > +  gsi_replace (&cond_gsi, new_stmt, true);
> > > >
> > > >    /* ??? */
> > > >    SLP_TREE_VEC_DEFS (slp_node).truncate (0);
> > > > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> > > > index b7c2188ab3d..cb715459c00 100644
> > > > --- a/gcc/tree-vectorizer.h
> > > > +++ b/gcc/tree-vectorizer.h
> > > > @@ -1155,10 +1155,10 @@ public:
> > > >    bool early_breaks;
> > > >
> > > >    /* List of loop additional IV conditionals found in the loop.  */
> > > > -  auto_vec<gcond *> conds;
> > > > +  auto_vec<gimple *> conds;
> > > >
> > > >    /* Main loop IV cond.  */
> > > > -  gcond* loop_iv_cond;
> > > > +  gimple *loop_iv_cond;
> > > >
> > > >    /* True if we have an unroll factor requested by the user through 
> > > > pragma
> > > GCC
> > > >       unroll.  */
> > > > @@ -2702,8 +2702,8 @@ struct vect_loop_form_info
> > > >    tree number_of_iterations;
> > > >    tree number_of_iterationsm1;
> > > >    tree assumptions;
> > > > -  auto_vec<gcond *> conds;
> > > > -  gcond *inner_loop_cond;
> > > > +  auto_vec<gimple *> conds;
> > > > +  gimple *inner_loop_cond;
> > > >    edge loop_exit;
> > > >  };
> > > >  extern opt_result vect_analyze_loop_form (class loop *, gimple *,
> > > > --
> > > > 2.43.0
> > > >
> >
> 
> --
> Richard Biener <[email protected]>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

RE: [PATCH] vect: Support early break with gswitch statements

Reply via email to