Re: [PATCH]middle-end: support SLP early break

Richard Biener Wed, 02 Oct 2024 05:50:08 -0700

On Tue, 1 Oct 2024, Tamar Christina wrote:

> Hi all,
> 
> This patch introduces feature parity for early break in the SLP-only
> vectorizer.
> 
> The approach taken here is to treat the early exits as root statements for an
> SLP tree.  This means that we don't need any changes to build_slp to support
> gconds.
> 
> Codegen for the gcond itself now has to be done out of line, but the body of
> the SLP blocks itself is simply driven by SLP scheduling.  There is a slight
> awkwardness in having re-used vectorizable_early_exit for both SLP and non-SLP,
> but I've documented the differences, and when I did try to refactor it, it
> wasn't really worth it given that this is a temporary state anyway.
> 
> This version is restricted to lane = 1, so we can re-use the existing
> move_early_break function instead of having to do the safety update through
> scheduling.  I have a branch where I'm working on that, but lane > 1 is out of
> scope for GCC 15 anyway.  The only reason I will try to get moving through
> scheduling done as a stretch goal is so that we get epilogue vectorization back
> for early break.
> 
> The example:
> 
> unsigned test4(unsigned x)
> {
>  unsigned ret = 0;
>  for (int i = 0; i < N; i++)
>  {
>    vect_b[i] = x + i;
>    if (vect_a[i]*2 != x)
>      break;
>    vect_a[i] = x;
>    
>  }
>  return ret;
> }
> 
> builds the following SLP instance for early break:
> 
> note:   Analyzing vectorizable control flow: if (patt_6 != 0)
> note:   Starting SLP discovery for
> note:     patt_6 = _4 != x_9(D);
> note:   starting SLP discovery for node 0x63abc80
> note:   Build SLP for patt_6 = _4 != x_9(D);
> note:   precomputed vectype: vector(4) <signed-boolean:32>
> note:   nunits = 4
> note:   vect_is_simple_use: operand x_9(D), type of def: external
> note:   vect_is_simple_use: operand # RANGE [irange] unsigned int [0, 0][2, +INF] MASK 0xffff
>         _3 * 2, type of def: internal
> note:   starting SLP discovery for node 0x63abdc0
> note:   Build SLP for _4 = _3 * 2;
> note:   precomputed vectype: vector(4) unsigned int
> note:   nunits = 4
> note:   vect_is_simple_use: operand #
>         vect_aD.4416[i_15], type of def: internal
> note:   vect_is_simple_use: operand 2, type of def: constant
> note:   starting SLP discovery for node 0x63abe60
> note:   Build SLP for _3 = vect_a[i_15];
> note:   precomputed vectype: vector(4) unsigned int
> note:   nunits = 4
> note:   SLP discovery for node 0x63abe60 succeeded
> note:   SLP discovery for node 0x63abdc0 succeeded
> note:   SLP discovery for node 0x63abc80 succeeded
> note:   SLP size 3 vs. limit 10.
> note:   Final SLP tree for instance 0x6474190:
> note:   node 0x63abc80 (max_nunits=4, refcnt=2) vector(4) <signed-boolean:32>
> note:   op template: patt_6 = _4 != x_9(D);
> note:         stmt 0 patt_6 = _4 != x_9(D);
> note:         children 0x63abd20 0x63abdc0
> note:   node (external) 0x63abd20 (max_nunits=1, refcnt=1)
> note:         { x_9(D) }
> note:   node 0x63abdc0 (max_nunits=4, refcnt=2) vector(4) unsigned int
> note:   op template: _4 = _3 * 2;
> note:         stmt 0 _4 = _3 * 2;
> note:         children 0x63abe60 0x63abf00
> note:   node 0x63abe60 (max_nunits=4, refcnt=2) vector(4) unsigned int
> note:   op template: _3 = vect_a[i_15];
> note:         stmt 0 _3 = vect_a[i_15];
> note:         load permutation { 0 }
> note:   node (constant) 0x63abf00 (max_nunits=1, refcnt=1)
> note:         { 2 }
> 
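Editorial sketch (not part of the patch or the review): the toy C model below mirrors the SLP instance dumped above and illustrates the split described in the cover letter, where the SLP tree is code-generated children-first by scheduling and the gcond root is emitted only afterwards, out of line. struct toy_node, toy_schedule and toy_schedule_instance are hypothetical names, not GCC's slp_tree API, and the model simply skips external and constant operands for brevity.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical toy SLP node: NOT GCC's slp_tree, just enough structure
   to mirror the instance dumped above.  */
struct toy_node
{
  const char *op;          /* scalar stmt (or operand) the node stands for */
  int is_operand_leaf;     /* external/constant: skipped here for brevity */
  const struct toy_node *children[2];
};

/* Append S to the NUL-terminated buffer LOG of capacity CAP.  */
static void
toy_append (char *log, size_t cap, const char *s)
{
  strncat (log, s, cap - strlen (log) - 1);
}

/* Post-order walk: every child is emitted before its parent.  */
static void
toy_schedule (const struct toy_node *node, char *log, size_t cap)
{
  if (!node)
    return;
  for (int i = 0; i < 2; i++)
    toy_schedule (node->children[i], log, cap);
  if (!node->is_operand_leaf)
    {
      toy_append (log, cap, node->op);
      toy_append (log, cap, ";");
    }
}

/* Build the instance rooted at "if (patt_6 != 0)" and record the emission
   order: first the SLP tree, then the gcond root.  */
static void
toy_schedule_instance (char *log, size_t cap)
{
  static const struct toy_node two = { "2", 1, { NULL, NULL } };
  static const struct toy_node x = { "x_9(D)", 1, { NULL, NULL } };
  static const struct toy_node load = { "_3 = vect_a[i_15]", 0, { NULL, NULL } };
  static const struct toy_node mult = { "_4 = _3 * 2", 0, { &load, &two } };
  static const struct toy_node cmp = { "patt_6 = _4 != x_9(D)", 0, { &x, &mult } };

  log[0] = '\0';
  toy_schedule (&cmp, log, cap);              /* 1. the SLP tree */
  toy_append (log, cap, "if (patt_6 != 0)");  /* 2. the root, out of line */
}
```

With a 256-byte buffer, toy_schedule_instance records the load, the multiply and the compare in that order, followed by the gcond.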
> and during codegen:
> 
> note:   ------>vectorizing SLP node starting from: patt_6 = _4 != x_9(D);
> note:   vect_is_simple_use: operand # RANGE [irange] unsigned int [0, 0][2, +INF] MASK 0xffff
>         _3 * 2, type of def: internal
> note:   add new stmt: mask_patt_6.18_58 = _53 != vect__4.17_57;
> note:    === vectorizable_early_exit ===
> note:    transform early-exit.
> note:   vectorizing stmts using SLP.
> note:   Vectorizing SLP tree:
> note:   node 0x63abfa0 (max_nunits=4, refcnt=1) vector(4) int
> note:   op template: i_12 = i_15 + 1;
> note:         stmt 0 i_12 = i_15 + 1;
> note:         children 0x63aba00 0x63ac040
> note:   node 0x63aba00 (max_nunits=4, refcnt=2) vector(4) int
> note:   op template: i_15 = PHI <i_12(6), 0(14)>
> note:         [l] stmt 0 i_15 = PHI <i_12(6), 0(14)>
> note:         children (nil) (nil)
> note:   node (constant) 0x63ac040 (max_nunits=1, refcnt=1) vector(4) int
> note:         { 1 }
> 
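A further editorial sketch, not taken from the patch: the plain C below emulates, lane by lane, what the vectorized test4 loop from vect-early-break_127.c is assumed to do with VF = 4. It computes the comparison mask (mask_patt_6 in the dump), leaves the vector loop if any lane is set, and performs the stores only after the exit test, with the scalar loop re-running the vector iteration that exited. The function names, the fixed VF and this exact exit strategy are assumptions for illustration; the constant ret result is dropped.

```c
#include <stdbool.h>

#define N 800   /* as in vect-early-break_127.c */
#define VF 4    /* assumed: vector(4) unsigned int, one lane per iteration */

/* Scalar reference: the loop body of vect-early-break_127.c.  */
static void
test4_scalar (unsigned *vect_a, unsigned *vect_b, unsigned x)
{
  for (int i = 0; i < N; i++)
    {
      vect_b[i] = x + i;
      if (vect_a[i] * 2 != x)
        break;
      vect_a[i] = x;
    }
}

/* Emulated vector loop: evaluate the exit condition for all VF lanes first,
   and only store when no lane wants to leave the loop.  */
static void
test4_vector_emulated (unsigned *vect_a, unsigned *vect_b, unsigned x)
{
  int i = 0;
  for (; i + VF <= N; i += VF)
    {
      bool mask[VF];                  /* mask_patt_6 = vect__4 != { x, ... } */
      bool any = false;
      for (int k = 0; k < VF; k++)
        {
          mask[k] = vect_a[i + k] * 2 != x;
          any |= mask[k];
        }
      if (any)                        /* the gcond: some lane exits */
        break;
      for (int k = 0; k < VF; k++)    /* stores sunk past the exit test */
        {
          vect_b[i + k] = x + (i + k);
          vect_a[i + k] = x;
        }
    }
  /* Scalar loop: finishes the tail, or redoes the block that exited.  */
  for (; i < N; i++)
    {
      vect_b[i] = x + i;
      if (vect_a[i] * 2 != x)
        break;
      vect_a[i] = x;
    }
}
```

Running both functions on identical copies of the arrays should leave them identical; sinking the stores is what makes it safe to hand an exiting block back to the scalar loop.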
> Bootstrapped and regtested on aarch64-none-linux-gnu, arm-none-linux-gnueabihf,
> x86_64-pc-linux-gnu -m32, -m64 with no issues.
>
> Also bootstrapped --with-build-config='bootstrap-O3 bootstrap-lto'
> --enable-checking=release,yes,rtl,extra on aarch64-none-linux-gnu and
> x86_64-pc-linux-gnu -m32, -m64 with no issues.
> 
> Ok for master?
> 
> gcc/ChangeLog:
> 
>       * tree-vectorizer.h (enum slp_instance_kind): Add slp_inst_kind_gcond.
>       (LOOP_VINFO_EARLY_BREAKS_LIVE_STMTS): New.
>       (vectorizable_early_exit): Expose.
>       (class _loop_vec_info): Add early_break_live_stmts.
>       * tree-vect-slp.cc (vect_build_slp_instance, vect_analyze_slp_instance):
>       Support gcond instances.
>       (vect_analyze_slp): Analyze gcond roots and early break live statements.
>       (maybe_push_to_hybrid_worklist): Don't sink gconds.
>       (vect_slp_analyze_node_operations): Support gconds.
>       (vect_slp_check_for_roots): Update comments.
>       (vectorize_slp_instance_root_stmt): Support gconds.
>       (vect_schedule_slp): Pass vinfo to vectorize_slp_instance_root_stmt.
>       * tree-vect-stmts.cc (vect_stmt_relevant_p): Record early break live
>       statements.
>       (vectorizable_early_exit): Support SLP.
> 
> gcc/testsuite/ChangeLog:
> 
>       * gcc.dg/vect/vect-early-break_126.c: New test.
>       * gcc.dg/vect/vect-early-break_127.c: New test.
>       * gcc.dg/vect/vect-early-break_128.c: New test.
> 
> ---
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_126.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_126.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..4bfc9880f9fc869bf616123ff509d13be17ffacf
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_126.c
> @@ -0,0 +1,28 @@
> +/* { dg-do compile } */
> +/* { dg-add-options vect_early_break } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" } } */
> +/* { dg-final { scan-tree-dump "Loop contains only SLP stmts" "vect" } } */
> +
> +#define N 1024
> +unsigned vect_a[N];
> +unsigned vect_b[N];
> + 
> +unsigned test4(unsigned x)
> +{
> + unsigned ret = 0;
> + for (int i = 0; i < N; i++)
> + {
> +   vect_b[i] = x + i;
> +   if (vect_a[i] > x)
> +     {
> +       ret *= vect_a[i];
> +       return vect_a[i];
> +     }
> +   vect_a[i] = x;
> +   ret += vect_a[i] + vect_b[i];
> + }
> + return ret;
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_127.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_127.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..67cb5d34a77192e5d7d72c35df8e83535ef184ab
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_127.c
> @@ -0,0 +1,27 @@
> +/* { dg-do compile } */
> +/* { dg-add-options vect_early_break } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" } } */
> +/* { dg-final { scan-tree-dump "Loop contains only SLP stmts" "vect" } } */
> +
> +#ifndef N
> +#define N 800
> +#endif
> +unsigned vect_a[N];
> +unsigned vect_b[N];
> +  
> +unsigned test4(unsigned x)
> +{
> + unsigned ret = 0;
> + for (int i = 0; i < N; i++)
> + {
> +   vect_b[i] = x + i;
> +   if (vect_a[i]*2 != x)
> +     break;
> +   vect_a[i] = x;
> +   
> + }
> + return ret;
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_128.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_128.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..6d7fb920ec2de529a4aa1de2c4a04286989204fd
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_128.c
> @@ -0,0 +1,31 @@
> +/* { dg-do compile } */
> +/* { dg-add-options vect_early_break } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" } } */
> +/* { dg-final { scan-tree-dump "Loop contains only SLP stmts" "vect" } } */
> +
> +#ifndef N
> +#define N 800
> +#endif
> +unsigned vect_a[N];
> +unsigned vect_b[N];
> +  
> +unsigned test4(unsigned x)
> +{
> + unsigned ret = 0;
> + for (int i = 0; i < N; i+=2)
> + {
> +   vect_b[i] = x + i;
> +   vect_b[i+1] = x + i+1;
> +   if (vect_a[i]*2 != x)
> +     break;
> +   if (vect_a[i+1]*2 != x)
> +     break;
> +   vect_a[i] = x;
> +   vect_a[i+1] = x;
> +   
> + }
> + return ret;
> +}
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index 600987dd6e5d506aa5fbb02350f9dab77793d382..7e765df466a59249feb999c24d8f2dad232948ae 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -3697,6 +3697,13 @@ vect_build_slp_instance (vec_info *vinfo,
>                        "Analyzing vectorizable constructor: %G\n",
>                        root_stmt_infos[0]->stmt);
>      }
> +  else if (kind == slp_inst_kind_gcond)
> +    {
> +      if (dump_enabled_p ())
> +     dump_printf_loc (MSG_NOTE, vect_location,
> +                      "Analyzing vectorizable control flow: %G",
> +                      root_stmt_infos[0]->stmt);
> +    }
>  
>    if (dump_enabled_p ())
>      {
> @@ -4143,6 +4150,12 @@ vect_analyze_slp_instance (vec_info *vinfo,
>        STMT_VINFO_REDUC_DEF (vect_orig_stmt (stmt_info))
>       = STMT_VINFO_REDUC_DEF (vect_orig_stmt (scalar_stmts.last ()));
>      }
> +  else if (kind == slp_inst_kind_gcond)
> +    {
> +      /* Collect the stores and store them in scalar_stmts.  */
> +      scalar_stmts.create (1);
> +      scalar_stmts.quick_push (vect_stmt_to_vectorize (next_info));
> +    }


We have this "left-over" dual API, but since you never call
vect_analyze_slp_instance with slp_inst_kind_gcond and use
vect_build_slp_instance instead, I think you can drop this hunk.

>    else
>      gcc_unreachable ();
>  
> @@ -4742,6 +4755,56 @@ vect_analyze_slp (vec_info *vinfo, unsigned max_tree_size,
>                                        bst_map, NULL, force_single_lane);
>             }
>         }
> +
> +      /* Find SLP sequences starting from gconds.  */
> +      for (auto cond : LOOP_VINFO_LOOP_CONDS (loop_vinfo))
> +     {
> +       auto cond_info = loop_vinfo->lookup_stmt (cond);
> +       vec<stmt_vec_info> stmts;
> +       vec<stmt_vec_info> roots = vNULL;
> +       vec<tree> remain = vNULL;
> +
> +       cond_info = vect_stmt_to_vectorize (cond_info);
> +       roots.safe_push (cond_info);
> +       stmts.create (2);
> +       tree args0 = gimple_cond_lhs (STMT_VINFO_STMT (cond_info));
> +       tree args1 = gimple_cond_rhs (STMT_VINFO_STMT (cond_info));
> +       /* An argument without a loop def will be codegened from vectorizing the
> +          root gcond itself.  As such we don't need to try to build an SLP tree
> +          from them.  It's highly likely that the resulting SLP tree here if both
> +          arguments have a def will be incompatible, but we rely on it being split
> +          later on.  */
> +       if (auto varg = loop_vinfo->lookup_def (args0))
> +         stmts.quick_push (vect_stmt_to_vectorize (varg));
> +
> +       if (auto varg = loop_vinfo->lookup_def (args1))
> +         stmts.quick_push (vect_stmt_to_vectorize (varg));

Hmm - you pattern-replace

     if (a != b)

with

     patt = a != b;
     if (patt != 0)

correct?  So I don't get why you push two scalar stmts into 'stmts'
and not just the pattern def of args0?  Plus, assert that the condition
is NE_EXPR and the cond_rhs is zero?  (Or simply guard discovery with
that - other gconds will simply not be vectorized.)

That said, when loop_vinfo->lookup_def (args1) is not NULL,
what you ask for doesn't really make sense, even if discovery
succeeds (you get vectors with the condition lhs and rhs interleaved).
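To make the interleaving point concrete, here is a simplified editorial model, assuming the usual SLP lane order in which the members of a group are interleaved per scalar iteration. slp_lanes is a hypothetical helper, not a GCC function: it fills a vector of NUNITS lanes from a group of scalar definitions. With the two-member group {lhs def, rhs def}, a vector(4) holds {lhs0, rhs0, lhs1, rhs1}, whereas testing lhs != rhs per iteration needs {lhs0..lhs3} and {rhs0..rhs3} as separate vectors, which is what a single-member group gives.

```c
#include <stddef.h>

/* Hypothetical model of SLP lane order: lane L of the vector holds group
   member L % GROUP_SIZE taken from scalar iteration L / GROUP_SIZE.
   GROUP[m][it] is the value of member M in iteration IT.  */
static void
slp_lanes (const unsigned *const group[], size_t group_size,
           unsigned *lanes, size_t nunits)
{
  for (size_t l = 0; l < nunits; l++)
    lanes[l] = group[l % group_size][l / group_size];
}
```

For lhs = {10, 11, 12, 13} and rhs = {20, 21, 22, 23}, the two-member group yields {10, 20, 11, 21}, while the group containing only the lhs def yields {10, 11, 12, 13}.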

> +       if (!stmts.is_empty ())
> +         vect_build_slp_instance (vinfo, slp_inst_kind_gcond,
> +                                  stmts, roots, remain,
> +                                  max_tree_size, &limit,
> +                                  bst_map, NULL, force_single_lane);
> +     }
> +
> +     /* Find and create slp instances for inductions that have been forced
> +        live due to early break.  */
> +     edge latch_e = loop_latch_edge (LOOP_VINFO_LOOP (loop_vinfo));
> +     for (auto stmt_info : LOOP_VINFO_EARLY_BREAKS_LIVE_STMTS (loop_vinfo))
> +       {
> +         vec<stmt_vec_info> stmts;
> +         vec<stmt_vec_info> roots = vNULL;
> +         vec<tree> remain = vNULL;
> +         gphi *lc_phi = as_a<gphi *> (STMT_VINFO_STMT (stmt_info));
> +         tree def = gimple_phi_arg_def_from_edge (lc_phi, latch_e);

Hmm, so this isn't the LC PHI but the header PHI!  So maybe better named
LOOP_VINFO_EARLY_BREAKS_LIVE_IVS?

Note we're now walking all actual LC PHIs, but only when marked as
live.  So instead of adding to LOOP_VINFO_EARLY_BREAKS_LIVE_STMTS,
maybe you can mark the defs appropriately?  Just a suggestion - I
think what you do works as well.

> +         stmt_vec_info lc_info = loop_vinfo->lookup_def (def);
> +         stmts.create (1);
> +         stmts.quick_push (vect_stmt_to_vectorize (lc_info));
> +         vect_build_slp_instance (vinfo, slp_inst_kind_reduc_group,
> +                                  stmts, roots, remain,
> +                                  max_tree_size, &limit,
> +                                  bst_map, NULL, force_single_lane);
> +       }
>      }
>  
>    hash_set<slp_tree> visited_patterns;
> @@ -7157,8 +7220,9 @@ maybe_push_to_hybrid_worklist (vec_info *vinfo,
>           }
>       }
>      }
> -  /* No def means this is a loo_vect sink.  */
> -  if (!any_def)
> +  /* No def means this is a loop_vect sink.  Gimple conditionals also don't have a
> +     def but shouldn't be considered sinks.  */
> +  if (!any_def && STMT_VINFO_DEF_TYPE (stmt_info) != vect_condition_def)
>      {
>        if (dump_enabled_p ())
>       dump_printf_loc (MSG_NOTE, vect_location,
> @@ -7542,9 +7606,27 @@ vect_slp_analyze_node_operations (vec_info *vinfo, slp_tree node,
>      return true;
>    visited_vec.safe_push (node);
>  
> +  /* If early break, also check the root statement as we need to both analyze
> +     and trigger codegen for it.  The analysis will check whether we can actually
> +     vectorize it.  At the moment splitting off the analysis bit from inside
> +     it duplicates a lot of the setup code, so it's not worthwhile to do so.
> +     However, when either the non-SLP loop vect goes away or we split the
> +     vectorizable_* functions, we can call the analysis-only part from here instead.  */
>    bool res = true;
> -  unsigned visited_rec_start = visited_vec.length ();
>    unsigned cost_vec_rec_start = cost_vec->length ();
> +  if (SLP_INSTANCE_KIND (node_instance) == slp_inst_kind_gcond)
> +    {
> +      auto root_stmt_info = SLP_INSTANCE_ROOT_STMTS (node_instance)[0];
> +      res = vectorizable_early_exit (vinfo, root_stmt_info, NULL, NULL, NULL,
> +                                  cost_vec);

Can you do this here instead:

bool
vect_slp_analyze_operations (vec_info *vinfo)
{
...
          /* Check we can vectorize the reduction.  */
          || (SLP_INSTANCE_KIND (instance) == slp_inst_kind_bb_reduc
              && !vectorizable_bb_reduc_epilogue (instance, &cost_vec)))

as we're checking the roots for slp_inst_kind_ctor and
slp_inst_kind_bb_reduc here already.

> +      if (!res)
> +     {
> +       cost_vec->truncate (cost_vec_rec_start);
> +       return res;
> +     }
> +    }
> +
> +  unsigned visited_rec_start = visited_vec.length ();
>    bool seen_non_constant_child = false;
>    FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child)
>      {
> @@ -8612,6 +8694,8 @@ vect_slp_check_for_roots (bb_vec_info bb_vinfo)
>        !gsi_end_p (gsi); gsi_next (&gsi))
>      {
>        gassign *assign = dyn_cast<gassign *> (gsi_stmt (gsi));
> +      /* This can be used to start SLP discovery for BB early breaks
> +      when we get that far.  */
>        if (!assign)
>       continue;
>  
> @@ -10758,7 +10842,7 @@ vect_remove_slp_scalar_calls (vec_info *vinfo, slp_tree node)
>  /* Vectorize the instance root.  */
>  
>  void
> -vectorize_slp_instance_root_stmt (slp_tree node, slp_instance instance)
> +vectorize_slp_instance_root_stmt (vec_info *vinfo, slp_tree node, slp_instance instance)
>  {
>    gassign *rstmt = NULL;
>  
> @@ -10862,6 +10946,21 @@ vectorize_slp_instance_root_stmt (slp_tree node, slp_instance instance)
>        update_stmt (gsi_stmt (rgsi));
>        return;
>      }
> +  else if (instance->kind == slp_inst_kind_gcond)
> +    {
> +      /* Only support a single root for now as we can't codegen CFG yet and so we
> +      can't support lane > 1 at this time.  */
> +      gcc_assert (instance->root_stmts.length () == 1);
> +      auto root_stmt_info = instance->root_stmts[0];
> +      auto last_stmt = vect_find_first_scalar_stmt_in_slp (node)->stmt;
> +      gimple_stmt_iterator rgsi = gsi_for_stmt (last_stmt);

So last_stmt is actually the gcond (and thus root_stmt_info)?  Why
not use that for the rgsi directly?

> +      gimple *vec_stmt = NULL;
> +      gcc_assert (SLP_TREE_NUMBER_OF_VEC_STMTS (node) != 0);

I guess you'd want to assert SLP_TREE_VEC_DEFS is not empty?

> +      bool res = vectorizable_early_exit (vinfo, root_stmt_info, &rgsi,
> +                                       &vec_stmt, node, NULL);
> +      gcc_assert (res);
> +      return;
> +    }
>    else
>      gcc_unreachable ();
>  
> @@ -11080,7 +11179,7 @@ vect_schedule_slp (vec_info *vinfo, const vec<slp_instance> &slp_instances)
>       vect_schedule_scc (vinfo, node, instance, scc_info, maxdfs, stack);
>  
>        if (!SLP_INSTANCE_ROOT_STMTS (instance).is_empty ())
> -     vectorize_slp_instance_root_stmt (node, instance);
> +     vectorize_slp_instance_root_stmt (vinfo, node, instance);
>  
>        if (dump_enabled_p ())
>       dump_printf_loc (MSG_NOTE, vect_location,
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index b72b54d666879d8485f8d972b4e8d9dc64bc86b3..8f3f35989879199ffd0eb24729cb7ade856a3c4d 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -411,6 +411,7 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo,
>         dump_printf_loc (MSG_NOTE, vect_location,
>                          "vec_stmt_relevant_p: induction forced for "
>                          "early break.\n");
> +      LOOP_VINFO_EARLY_BREAKS_LIVE_STMTS (loop_vinfo).safe_push (stmt_info);
>        *live_p = true;

Ah, so it's already live - what is the relevancy used for?  See above
for the SLP discovery question regarding the fact that we already walk
all actual LC PHI defs.

Otherwise looks OK - thanks for the work!

Richard.

>  
>      }
> @@ -12933,7 +12934,7 @@ vectorizable_comparison (vec_info *vinfo,
>  /* Check to see if the current early break given in STMT_INFO is valid for
>     vectorization.  */
>  
> -static bool
> +bool
>  vectorizable_early_exit (vec_info *vinfo, stmt_vec_info stmt_info,
>                        gimple_stmt_iterator *gsi, gimple **vec_stmt,
>                        slp_tree slp_node, stmt_vector_for_cost *cost_vec)
> @@ -12958,7 +12959,7 @@ vectorizable_early_exit (vec_info *vinfo, stmt_vec_info stmt_info,
>    tree op0;
>    enum vect_def_type dt0;
>    if (!vect_is_simple_use (vinfo, stmt_info, slp_node, 0, &op0, &slp_op0, &dt0,
> -                        &vectype))
> +                           &vectype))
>      {
>        if (dump_enabled_p ())
>         dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> @@ -12966,6 +12967,13 @@ vectorizable_early_exit (vec_info *vinfo, stmt_vec_info stmt_info,
>       return false;
>      }
>  
> +  /* For SLP we don't want to use the type of the operands of the SLP node.  When
> +     vectorizing using SLP, slp_node will be the child of the gcond, and we want to
> +     use the type of the direct children, which, since the gcond is the root, is the
> +     current node rather than a child node as vect_is_simple_use assumes.  */
> +  if (slp_node)
> +    vectype = SLP_TREE_VECTYPE (slp_node);
> +
>    if (!vectype)
>      return false;
>  
> @@ -13060,9 +13068,18 @@ vectorizable_early_exit (vec_info *vinfo, stmt_vec_info stmt_info,
>    if (dump_enabled_p ())
>      dump_printf_loc (MSG_NOTE, vect_location, "transform early-exit.\n");
>  
> -  if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
> -                               vec_stmt, slp_node, cost_vec))
> -    gcc_unreachable ();
> +  /* For SLP we don't do codegen of the body starting from the gcond; the gconds are
> +     roots, so by the time we get to them we have already codegened the SLP tree
> +     and we shouldn't try to do so again.  The arguments have already been
> +     vectorized.  It's not very clean to do this here, but the masking code below is
> +     complex and this keeps it all in one place to ease fixes and backports.  Once we
> +     drop the non-SLP loop vect or split vectorizable_* this can be simplified.  */
> +  if (!slp_node)
> +    {
> +      if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
> +                                   vec_stmt, slp_node, cost_vec))
> +     gcc_unreachable ();
> +    }
>  
>    gimple *stmt = STMT_VINFO_STMT (stmt_info);
>    basic_block cond_bb = gimple_bb (stmt);
> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> index 490061aea2f6d465d9589eb97bbd34a920d76b1c..53483303c4ac3482760fe722354f602e0243e5a2 100644
> --- a/gcc/tree-vectorizer.h
> +++ b/gcc/tree-vectorizer.h
> @@ -252,7 +252,8 @@ enum slp_instance_kind {
>      slp_inst_kind_reduc_group,
>      slp_inst_kind_reduc_chain,
>      slp_inst_kind_bb_reduc,
> -    slp_inst_kind_ctor
> +    slp_inst_kind_ctor,
> +    slp_inst_kind_gcond
>  };
>  
>  /* SLP instance is a sequence of stmts in a loop that can be packed into
> @@ -977,6 +978,10 @@ public:
>    /* Statements whose VUSES need updating if early break vectorization is to
>       happen.  */
>    auto_vec<gimple*> early_break_vuses;
> +
> +  /* Record statements that need to be live for early break vectorization
> +     but may not have an LC PHI node materialized yet in the exits.  */
> +  auto_vec<stmt_vec_info> early_break_live_stmts;
>  } *loop_vec_info;
>  
>  /* Access Functions.  */
> @@ -1036,6 +1041,8 @@ public:
>  #define LOOP_VINFO_EARLY_BRK_STORES(L)     (L)->early_break_stores
>  #define LOOP_VINFO_EARLY_BREAKS_VECT_PEELED(L)  \
>    (single_pred ((L)->loop->latch) != (L)->vec_loop_iv_exit->src)
> +#define LOOP_VINFO_EARLY_BREAKS_LIVE_STMTS(L)  \
> +  (L)->early_break_live_stmts
>  #define LOOP_VINFO_EARLY_BRK_DEST_BB(L)    (L)->early_break_dest_bb
>  #define LOOP_VINFO_EARLY_BRK_VUSES(L)      (L)->early_break_vuses
>  #define LOOP_VINFO_LOOP_CONDS(L)           (L)->conds
> @@ -2522,6 +2529,9 @@ extern bool vectorizable_phi (vec_info *, stmt_vec_info, gimple **, slp_tree,
>                             stmt_vector_for_cost *);
>  extern bool vectorizable_recurr (loop_vec_info, stmt_vec_info,
>                                 gimple **, slp_tree, stmt_vector_for_cost *);
> +extern bool vectorizable_early_exit (vec_info *, stmt_vec_info,
> +                                  gimple_stmt_iterator *, gimple **,
> +                                  slp_tree, stmt_vector_for_cost *);
>  extern bool vect_emulated_vector_p (tree);
>  extern bool vect_can_vectorize_without_simd_p (tree_code);
>  extern bool vect_can_vectorize_without_simd_p (code_helper);

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
