On Thu, May 10, 2018 at 8:31 AM Richard Sandiford <richard.sandif...@linaro.org> wrote:

> Richard Biener <richard.guent...@gmail.com> writes:
> > On Wed, May 9, 2018 at 1:29 PM, Richard Sandiford
> > <richard.sandif...@linaro.org> wrote:
> >> Richard Biener <richard.guent...@gmail.com> writes:
> >>> On Wed, May 9, 2018 at 12:34 PM, Richard Sandiford
> >>> <richard.sandif...@linaro.org> wrote:
> >>>> The SLP unrolling factor is calculated by finding the smallest
> >>>> scalar type for each SLP statement and taking the number of required
> >>>> lanes from the vector versions of those scalar types.  E.g. for an
> >>>> int32->int64 conversion, it's the vector of int32s rather than the
> >>>> vector of int64s that determines the unroll factor.
> >>>>
> >>>> We rely on tree-vect-patterns.c to replace boolean operations like:
> >>>>
> >>>>    bool a, b, c;
> >>>>    a = b & c;
> >>>>
> >>>> with integer operations of whatever the best size is in context.
> >>>> E.g. if b and c are fed by comparisons of ints, a, b and c will become
> >>>> the appropriate size for an int comparison.  For most targets this means
> >>>> that a, b and c will end up as int-sized themselves, but on targets like
> >>>> SVE and AVX512 with packed vector booleans, they'll instead become a
> >>>> small bitfield like :1, padded to a byte for memory purposes.
> >>>> The SLP code would then take these scalar types and try to calculate
> >>>> the vector type for them, causing the unroll factor to be much higher
> >>>> than necessary.
> >>>>
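To make the lane arithmetic above concrete, here is a small standalone sketch (the 512-bit vector width, the 8-bit padded boolean, and the function names are illustrative assumptions, not code from GCC):

```c
/* Sketch (not GCC code): the number of lanes a scalar type gets in a
   fixed-width vector, and an SLP unroll factor derived from the
   smallest scalar type in the group.  Treating a byte-padded boolean
   as the "smallest scalar type" inflates the lane count, and with it
   the unroll factor.  */
static unsigned lanes (unsigned vector_bits, unsigned elem_bits)
{
  return vector_bits / elem_bits;
}

static unsigned slp_unroll (unsigned vector_bits,
			    unsigned smallest_elem_bits,
			    unsigned group_size)
{
  return lanes (vector_bits, smallest_elem_bits) / group_size;
}
```

With 512-bit vectors, an int32 gives 16 lanes while an 8-bit boolean gives 64, so basing the unroll factor on the boolean makes it 4x higher than the comparison operands require.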
> >>>> This patch makes SLP use the cached vector boolean type if that's
> >>>> appropriate.  Tested on aarch64-linux-gnu (with and without SVE),
> >>>> aarch64_be-none-elf and x86_64-linux-gnu.  OK to install?
> >>>>
> >>>> Richard
> >>>>
> >>>>
> >>>> 2018-05-09  Richard Sandiford  <richard.sandif...@linaro.org>
> >>>>
> >>>> gcc/
> >>>>         * tree-vect-slp.c (get_vectype_for_smallest_scalar_type): New function.
> >>>>         (vect_build_slp_tree_1): Use it when calculating the unroll factor.
> >>>>
> >>>> gcc/testsuite/
> >>>>         * gcc.target/aarch64/sve/vcond_10.c: New test.
> >>>>         * gcc.target/aarch64/sve/vcond_10_run.c: Likewise.
> >>>>         * gcc.target/aarch64/sve/vcond_11.c: Likewise.
> >>>>         * gcc.target/aarch64/sve/vcond_11_run.c: Likewise.
> >>>>
> >>>> Index: gcc/tree-vect-slp.c
> >>>> ===================================================================
> >>>> --- gcc/tree-vect-slp.c 2018-05-08 09:42:03.526648115 +0100
> >>>> +++ gcc/tree-vect-slp.c 2018-05-09 11:30:41.061096063 +0100
> >>>> @@ -608,6 +608,41 @@ vect_record_max_nunits (vec_info *vinfo,
> >>>>    return true;
> >>>>  }
> >>>>
> >>>> +/* Return the vector type associated with the smallest scalar type in STMT.  */
> >>>> +
> >>>> +static tree
> >>>> +get_vectype_for_smallest_scalar_type (gimple *stmt)
> >>>> +{
> >>>> +  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
> >>>> +  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
> >>>> +  if (vectype != NULL_TREE
> >>>> +      && VECTOR_BOOLEAN_TYPE_P (vectype))
> >>>
> >>> Hum.  At this point you can't really rely on vector types being set...
> >>
> >> Not for everything, but here we only care about the result of the
> >> pattern replacements, and pattern replacements do set the vector type
> >> up-front.  vect_determine_vectorization_factor (which runs earlier
> >> for loop vectorisation) also relies on this.
> >>
> >>>> +    {
> >>>> +      /* The result of a vector boolean operation has the smallest scalar
> >>>> +        type unless the statement is extending an even narrower boolean.  */
> >>>> +      if (!gimple_assign_cast_p (stmt))
> >>>> +       return vectype;
> >>>> +
> >>>> +      tree src = gimple_assign_rhs1 (stmt);
> >>>> +      gimple *def_stmt;
> >>>> +      enum vect_def_type dt;
> >>>> +      tree src_vectype = NULL_TREE;
> >>>> +      if (vect_is_simple_use (src, stmt_info->vinfo, &def_stmt, &dt,
> >>>> +                             &src_vectype)
> >>>> +         && src_vectype
> >>>> +         && VECTOR_BOOLEAN_TYPE_P (src_vectype))
> >>>> +       {
> >>>> +         if (TYPE_PRECISION (TREE_TYPE (src_vectype))
> >>>> +             < TYPE_PRECISION (TREE_TYPE (vectype)))
> >>>> +           return src_vectype;
> >>>> +         return vectype;
> >>>> +       }
> >>>> +    }
> >>>> +  HOST_WIDE_INT dummy;
> >>>> +  tree scalar_type = vect_get_smallest_scalar_type (stmt, &dummy, &dummy);
> >>>> +  return get_vectype_for_scalar_type (scalar_type);
> >>>> +}
> >>>> +
> >>>>  /* Verify if the scalar stmts STMTS are isomorphic, require data
> >>>>     permutation or are of unsupported types of operation.  Return
> >>>>     true if they are, otherwise return false and indicate in *MATCHES
> >>>> @@ -636,12 +671,11 @@ vect_build_slp_tree_1 (vec_info *vinfo,
> >>>>    enum tree_code first_cond_code = ERROR_MARK;
> >>>>    tree lhs;
> >>>>    bool need_same_oprnds = false;
> >>>> -  tree vectype = NULL_TREE, scalar_type, first_op1 = NULL_TREE;
> >>>> +  tree vectype = NULL_TREE, first_op1 = NULL_TREE;
> >>>>    optab optab;
> >>>>    int icode;
> >>>>    machine_mode optab_op2_mode;
> >>>>    machine_mode vec_mode;
> >>>> -  HOST_WIDE_INT dummy;
> >>>>    gimple *first_load = NULL, *prev_first_load = NULL;
> >>>>
> >>>>    /* For every stmt in NODE find its def stmt/s.  */
> >>>> @@ -685,15 +719,14 @@ vect_build_slp_tree_1 (vec_info *vinfo,
> >>>>           return false;
> >>>>         }
> >>>>
> >>>> -      scalar_type = vect_get_smallest_scalar_type (stmt, &dummy, &dummy);
> >>>
> >>> ... so I wonder how this goes wrong here.
> >>
> >> It picks the right scalar type, but then we go on to use
> >> get_vectype_for_scalar_type when get_mask_type_for_scalar_type
> >> is what we actually want.  The easiest fix for that seemed to use
> >> the vectype that had already been calculated (also as for
> >> vect_determine_vectorization_factor).
> >>
> >>> I suppose we want to ignore vector booleans for the purpose of max_nunits
> >>> computation.  So isn't a better fix to simply "ignore" those in
> >>> vect_get_smallest_scalar_type instead?  I see that for intermediate
> >>> full-boolean operations like
> >>>
> >>>   a = x[i] < 0;
> >>>   b = y[i] > 0;
> >>>   tem = a & b;
> >>>
> >>> we want to ignore 'tem = a & b' fully here for the purpose of
> >>> vect_record_max_nunits.  So if scalar_type is a bitfield type
> >>> then skip it?
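A minimal model of that suggestion, skipping boolean-producing statements when recording the maximum lane count (the representation -- marking such statements with elem_bits == 0 -- and the function name are hypothetical, not GCC's):

```c
/* Simplified model: compute the maximum lane count over a group's
   statements, ignoring statements whose result is a vector-boolean
   bitfield (marked here with elem_bits == 0).  For the example above,
   the loads x[i] and y[i] drive the count and 'tem = a & b' is
   skipped entirely.  */
static unsigned max_nunits_skipping_bools (const unsigned *elem_bits,
					   unsigned n, unsigned vector_bits)
{
  unsigned max = 0;
  for (unsigned i = 0; i < n; ++i)
    {
      if (elem_bits[i] == 0)	/* boolean-producing stmt: ignore.  */
	continue;
      unsigned nunits = vector_bits / elem_bits[i];
      if (nunits > max)
	max = nunits;
    }
  return max;
}
```

For the three statements above with int loads and 128-bit vectors this gives 4, untouched by the boolean AND; the pathological case in the reply below is the one where every statement in the group would be skipped.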
> >>
> >> Bitfield types will always be the smallest scalar type if they're
> >> present, so I think in pathological cases this could make us
> >> incorrectly ignore source operands to a compare.
> >>
> >> If we're confident that compares and casts of VECT_SCALAR_BOOLEAN_TYPE_Ps
> >> never affect the VF or UF then we should probably skip them based on
> >> that rather than whether the scalar type is a bitfield, so that the
> >> behaviour is the same for all targets.  It seems a bit dangerous
though...
> >
> > Well, all stmts that have no inherent promotion / demotion have no
> > effect on the VF
> > if you also have loads / stores.
> >
> > One reason I dislike the current way of computing vector types and
> > vectorization factor is that it tries to do that ad-hoc from looking
> > at stmts locally instead of somehow propagating things from sources
> > to sinks -- which would be a requirement if we ever drop the
> > requirement of same-sized vector types throughout vectorization...

> Yeah.  This patch was just supposed to be a point improvement rather
> than perfection.

> > In fact I wonder if we can get away with recording max_nunits here
> > and delay SLP_INSTANCE_UNROLLING_FACTOR computation until we compute
> > the actual vector types.  I think the code is most useful for BB
> > vectorization where we need to terminate the SLP when we get to
> > stmts we cannot handle without "unrolling" (given the vector size
> > constraint).

> Part of the problem is that vect_build_slp_tree_1 also uses the vector
> type to choose between shifts by vectors and shifts by scalars, and to
> test whether two-operand permutes are valid.  So as things stand I think
> we do need to know the vector type at some level here, even though those
> two cases aren't interesting for booleans.
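The two-operand permute mentioned here interleaves lanes from the two operation results; a standalone sketch of one selector element (mirroring the loop in vect_build_slp_tree_1, with is_alt standing in for the gimple_assign_rhs_code check; names are illustrative):

```c
/* Sketch: element I of the selector for a two-operation SLP node.
   Lane I takes element I of the first vector (the "main" operation)
   or element I + COUNT of the second (the alternate operation),
   depending on which code statement I % GROUP_SIZE uses.  */
static unsigned two_op_sel_elt (unsigned i, unsigned count,
				const int *is_alt, unsigned group_size)
{
  return is_alt[i % group_size] ? i + count : i;
}
```

For an add/sub pair (group_size 2, second statement alternate) with four lanes this yields the selector {0, 5, 2, 7}: even lanes from the first vector, odd lanes from the second.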

> > Anyhow - I probably dislike your patch most because you add another
> > get_vectype_for_smallest_scalar_type helper which looks like a hack to me...
> >
> > How is this issue solved for the non-SLP case?  I do remember that function
> > computing the VF and/or vector types is quite a mess with vector booleans...

> OK, for the purposes of fixing this bug, would it be OK to split out
> the code in vect_determine_vectorization_factor that computes the
> vector types and reuse it in SLP, even though I don't think either
> of us like the way it's done?  At least that way there's only one
> place to change in future.

> This patch does that.  I tweaked a couple of the comments and
> added a couple more dump lines, but otherwise the code in
> vect_get_vector_types_for_stmt and vect_get_mask_type_for_stmt
> is the same as the original.

> Tested as before.

Much better - thanks for doing it.  OK for trunk and sorry again for the
delay...

Richard.

> Thanks,
> Richard


> 2018-05-10  Richard Sandiford  <richard.sandif...@linaro.org>

> gcc/
>          * tree-vectorizer.h (vect_get_vector_types_for_stmt): Declare.
>          (vect_get_mask_type_for_stmt): Likewise.
>          * tree-vect-slp.c (vect_two_operations_perm_ok_p): New function,
>          split out from...
>          (vect_build_slp_tree_1): ...here.  Use vect_get_vector_types_for_stmt
>          to determine the statement's vector type and the vector type that
>          should be used for calculating nunits.  Deal with cases in which
>          the type has to be deferred.
>          (vect_slp_analyze_node_operations): Use vect_get_vector_types_for_stmt
>          and vect_get_mask_type_for_stmt to calculate STMT_VINFO_VECTYPE.
>          * tree-vect-loop.c (vect_determine_vf_for_stmt_1)
>          (vect_determine_vf_for_stmt): New functions, split out from...
>          (vect_determine_vectorization_factor): ...here.
>          * tree-vect-stmts.c (vect_get_vector_types_for_stmt)
>          (vect_get_mask_type_for_stmt): New functions, split out from
>          vect_determine_vectorization_factor.

> gcc/testsuite/
>          * gcc.target/aarch64/sve/vcond_10.c: New test.
>          * gcc.target/aarch64/sve/vcond_10_run.c: Likewise.
>          * gcc.target/aarch64/sve/vcond_11.c: Likewise.
>          * gcc.target/aarch64/sve/vcond_11_run.c: Likewise.

> Index: gcc/tree-vectorizer.h
> ===================================================================
> --- gcc/tree-vectorizer.h       2018-05-10 07:18:12.104514856 +0100
> +++ gcc/tree-vectorizer.h       2018-05-10 07:18:12.322505512 +0100
> @@ -1467,6 +1467,8 @@ extern tree vect_gen_perm_mask_checked (
>   extern void optimize_mask_stores (struct loop*);
>   extern gcall *vect_gen_while (tree, tree, tree);
>   extern tree vect_gen_while_not (gimple_seq *, tree, tree, tree);
> +extern bool vect_get_vector_types_for_stmt (stmt_vec_info, tree *, tree *);
> +extern tree vect_get_mask_type_for_stmt (stmt_vec_info);

>   /* In tree-vect-data-refs.c.  */
>   extern bool vect_can_force_dr_alignment_p (const_tree, unsigned int);
> Index: gcc/tree-vect-slp.c
> ===================================================================
> --- gcc/tree-vect-slp.c 2018-05-10 07:18:12.104514856 +0100
> +++ gcc/tree-vect-slp.c 2018-05-10 07:18:12.321505555 +0100
> @@ -608,6 +608,33 @@ vect_record_max_nunits (vec_info *vinfo,
>     return true;
>   }

> +/* STMTS is a group of GROUP_SIZE SLP statements in which some
> +   statements do the same operation as the first statement and in which
> +   the others do ALT_STMT_CODE.  Return true if we can take one vector
> +   of the first operation and one vector of the second and permute them
> +   to get the required result.  VECTYPE is the type of the vector that
> +   would be permuted.  */
> +
> +static bool
> +vect_two_operations_perm_ok_p (vec<gimple *> stmts, unsigned int group_size,
> +                              tree vectype, tree_code alt_stmt_code)
> +{
> +  unsigned HOST_WIDE_INT count;
> +  if (!TYPE_VECTOR_SUBPARTS (vectype).is_constant (&count))
> +    return false;
> +
> +  vec_perm_builder sel (count, count, 1);
> +  for (unsigned int i = 0; i < count; ++i)
> +    {
> +      unsigned int elt = i;
> +      if (gimple_assign_rhs_code (stmts[i % group_size]) == alt_stmt_code)
> +       elt += count;
> +      sel.quick_push (elt);
> +    }
> +  vec_perm_indices indices (sel, 2, count);
> +  return can_vec_perm_const_p (TYPE_MODE (vectype), indices);
> +}
> +
>   /* Verify if the scalar stmts STMTS are isomorphic, require data
>      permutation or are of unsupported types of operation.  Return
>      true if they are, otherwise return false and indicate in *MATCHES
> @@ -636,17 +663,17 @@ vect_build_slp_tree_1 (vec_info *vinfo,
>     enum tree_code first_cond_code = ERROR_MARK;
>     tree lhs;
>     bool need_same_oprnds = false;
> -  tree vectype = NULL_TREE, scalar_type, first_op1 = NULL_TREE;
> +  tree vectype = NULL_TREE, first_op1 = NULL_TREE;
>     optab optab;
>     int icode;
>     machine_mode optab_op2_mode;
>     machine_mode vec_mode;
> -  HOST_WIDE_INT dummy;
>     gimple *first_load = NULL, *prev_first_load = NULL;

>     /* For every stmt in NODE find its def stmt/s.  */
>     FOR_EACH_VEC_ELT (stmts, i, stmt)
>       {
> +      stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
>         swap[i] = 0;
>         matches[i] = false;

> @@ -685,15 +712,19 @@ vect_build_slp_tree_1 (vec_info *vinfo,
>            return false;
>          }

> -      scalar_type = vect_get_smallest_scalar_type (stmt, &dummy, &dummy);
> -      vectype = get_vectype_for_scalar_type (scalar_type);
> -      if (!vect_record_max_nunits (vinfo, stmt, group_size, vectype,
> -                                  max_nunits))
> +      tree nunits_vectype;
> +      if (!vect_get_vector_types_for_stmt (stmt_info, &vectype,
> +                                          &nunits_vectype)
> +         || (nunits_vectype
> +             && !vect_record_max_nunits (vinfo, stmt, group_size,
> +                                         nunits_vectype, max_nunits)))
>          {
>            /* Fatal mismatch.  */
>            matches[0] = false;
> -          return false;
> -        }
> +         return false;
> +       }
> +
> +      gcc_assert (vectype);

>         if (gcall *call_stmt = dyn_cast <gcall *> (stmt))
>          {
> @@ -730,6 +761,17 @@ vect_build_slp_tree_1 (vec_info *vinfo,
>                || rhs_code == LROTATE_EXPR
>                || rhs_code == RROTATE_EXPR)
>              {
> +             if (vectype == boolean_type_node)
> +               {
> +                 if (dump_enabled_p ())
> +                   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +                                    "Build SLP failed: shift of a"
> +                                    " boolean.\n");
> +                 /* Fatal mismatch.  */
> +                 matches[0] = false;
> +                 return false;
> +               }
> +
>                vec_mode = TYPE_MODE (vectype);

>                /* First see if we have a vector/vector shift.  */
> @@ -973,29 +1015,12 @@ vect_build_slp_tree_1 (vec_info *vinfo,

>     /* If we allowed a two-operation SLP node verify the target can cope
>        with the permute we are going to use.  */
> -  poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
>     if (alt_stmt_code != ERROR_MARK
>         && TREE_CODE_CLASS (alt_stmt_code) != tcc_reference)
>       {
> -      unsigned HOST_WIDE_INT count;
> -      if (!nunits.is_constant (&count))
> -       {
> -         if (dump_enabled_p ())
> -           dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -                            "Build SLP failed: different operations "
> -                            "not allowed with variable-length SLP.\n");
> -         return false;
> -       }
> -      vec_perm_builder sel (count, count, 1);
> -      for (i = 0; i < count; ++i)
> -       {
> -         unsigned int elt = i;
> -         if (gimple_assign_rhs_code (stmts[i % group_size]) == alt_stmt_code)
> -           elt += count;
> -         sel.quick_push (elt);
> -       }
> -      vec_perm_indices indices (sel, 2, count);
> -      if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices))
> +      if (vectype == boolean_type_node
> +         || !vect_two_operations_perm_ok_p (stmts, group_size,
> +                                            vectype, alt_stmt_code))
>          {
>            for (i = 0; i < group_size; ++i)
>              if (gimple_assign_rhs_code (stmts[i]) == alt_stmt_code)
> @@ -2759,36 +2784,18 @@ vect_slp_analyze_node_operations (vec_in
>     if (bb_vinfo
>         && ! STMT_VINFO_DATA_REF (stmt_info))
>       {
> -      gcc_assert (PURE_SLP_STMT (stmt_info));
> -
> -      tree scalar_type = TREE_TYPE (gimple_get_lhs (stmt));
> -      if (dump_enabled_p ())
> -       {
> -         dump_printf_loc (MSG_NOTE, vect_location,
> -                          "get vectype for scalar type:  ");
> -         dump_generic_expr (MSG_NOTE, TDF_SLIM, scalar_type);
> -         dump_printf (MSG_NOTE, "\n");
> -       }
> -
> -      tree vectype = get_vectype_for_scalar_type (scalar_type);
> -      if (!vectype)
> -       {
> -         if (dump_enabled_p ())
> -           {
> -             dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -                              "not SLPed: unsupported data-type ");
> -             dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM,
> -                                scalar_type);
> -             dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
> -           }
> -         return false;
> -       }
> -
> -      if (dump_enabled_p ())
> -       {
> -         dump_printf_loc (MSG_NOTE, vect_location, "vectype:  ");
> -         dump_generic_expr (MSG_NOTE, TDF_SLIM, vectype);
> -         dump_printf (MSG_NOTE, "\n");
> +      tree vectype, nunits_vectype;
> +      if (!vect_get_vector_types_for_stmt (stmt_info, &vectype,
> +                                          &nunits_vectype))
> +       /* We checked this when building the node.  */
> +       gcc_unreachable ();
> +      if (vectype == boolean_type_node)
> +       {
> +         vectype = vect_get_mask_type_for_stmt (stmt_info);
> +         if (!vectype)
> +           /* vect_get_mask_type_for_stmt has already explained the
> +              failure.  */
> +           return false;
>          }

>         gimple *sstmt;
> Index: gcc/tree-vect-loop.c
> ===================================================================
> --- gcc/tree-vect-loop.c        2018-05-10 07:18:12.104514856 +0100
> +++ gcc/tree-vect-loop.c        2018-05-10 07:18:12.320505598 +0100
> @@ -155,6 +155,108 @@ Software Foundation; either version 3, o

>   static void vect_estimate_min_profitable_iters (loop_vec_info, int *, int *);

> +/* Subroutine of vect_determine_vf_for_stmt that handles only one
> +   statement.  VECTYPE_MAYBE_SET_P is true if STMT_VINFO_VECTYPE
> +   may already be set for general statements (not just data refs).  */
> +
> +static bool
> +vect_determine_vf_for_stmt_1 (stmt_vec_info stmt_info,
> +                             bool vectype_maybe_set_p,
> +                             poly_uint64 *vf,
> +                             vec<stmt_vec_info > *mask_producers)
> +{
> +  gimple *stmt = stmt_info->stmt;
> +
> +  if ((!STMT_VINFO_RELEVANT_P (stmt_info)
> +       && !STMT_VINFO_LIVE_P (stmt_info))
> +      || gimple_clobber_p (stmt))
> +    {
> +      if (dump_enabled_p ())
> +       dump_printf_loc (MSG_NOTE, vect_location, "skip.\n");
> +      return true;
> +    }
> +
> +  tree stmt_vectype, nunits_vectype;
> +  if (!vect_get_vector_types_for_stmt (stmt_info, &stmt_vectype,
> +                                      &nunits_vectype))
> +    return false;
> +
> +  if (stmt_vectype)
> +    {
> +      if (STMT_VINFO_VECTYPE (stmt_info))
> +       /* The only case when a vectype had been already set is for stmts
> +          that contain a data ref, or for "pattern-stmts" (stmts generated
> +          by the vectorizer to represent/replace a certain idiom).  */
> +       gcc_assert ((STMT_VINFO_DATA_REF (stmt_info)
> +                    || vectype_maybe_set_p)
> +                   && STMT_VINFO_VECTYPE (stmt_info) == stmt_vectype);
> +      else if (stmt_vectype == boolean_type_node)
> +       mask_producers->safe_push (stmt_info);
> +      else
> +       STMT_VINFO_VECTYPE (stmt_info) = stmt_vectype;
> +    }
> +
> +  if (nunits_vectype)
> +    vect_update_max_nunits (vf, nunits_vectype);
> +
> +  return true;
> +}
> +
> +/* Subroutine of vect_determine_vectorization_factor.  Set the vector
> +   types of STMT_INFO and all attached pattern statements and update
> +   the vectorization factor VF accordingly.  If some of the statements
> +   produce a mask result whose vector type can only be calculated later,
> +   add them to MASK_PRODUCERS.  Return true on success or false if
> +   something prevented vectorization.  */
> +
> +static bool
> +vect_determine_vf_for_stmt (stmt_vec_info stmt_info, poly_uint64 *vf,
> +                           vec<stmt_vec_info > *mask_producers)
> +{
> +  if (dump_enabled_p ())
> +    {
> +      dump_printf_loc (MSG_NOTE, vect_location, "==> examining statement: ");
> +      dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt_info->stmt, 0);
> +    }
> +  if (!vect_determine_vf_for_stmt_1 (stmt_info, false, vf, mask_producers))
> +    return false;
> +
> +  if (STMT_VINFO_IN_PATTERN_P (stmt_info)
> +      && STMT_VINFO_RELATED_STMT (stmt_info))
> +    {
> +      stmt_info = vinfo_for_stmt (STMT_VINFO_RELATED_STMT (stmt_info));
> +
> +      /* If a pattern statement has def stmts, analyze them too.  */
> +      gimple *pattern_def_seq = STMT_VINFO_PATTERN_DEF_SEQ (stmt_info);
> +      for (gimple_stmt_iterator si = gsi_start (pattern_def_seq);
> +          !gsi_end_p (si); gsi_next (&si))
> +       {
> +         stmt_vec_info def_stmt_info = vinfo_for_stmt (gsi_stmt (si));
> +         if (dump_enabled_p ())
> +           {
> +             dump_printf_loc (MSG_NOTE, vect_location,
> +                              "==> examining pattern def stmt: ");
> +             dump_gimple_stmt (MSG_NOTE, TDF_SLIM,
> +                               def_stmt_info->stmt, 0);
> +           }
> +         if (!vect_determine_vf_for_stmt_1 (def_stmt_info, true,
> +                                            vf, mask_producers))
> +           return false;
> +       }
> +
> +      if (dump_enabled_p ())
> +       {
> +         dump_printf_loc (MSG_NOTE, vect_location,
> +                          "==> examining pattern statement: ");
> +         dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt_info->stmt, 0);
> +       }
> +      if (!vect_determine_vf_for_stmt_1 (stmt_info, true, vf, mask_producers))
> +       return false;
> +    }
> +
> +  return true;
> +}
> +
>   /* Function vect_determine_vectorization_factor

>      Determine the vectorization factor (VF).  VF is the number of data elements
> @@ -192,12 +294,6 @@ vect_determine_vectorization_factor (loo
>     tree vectype;
>     stmt_vec_info stmt_info;
>     unsigned i;
> -  HOST_WIDE_INT dummy;
> -  gimple *stmt, *pattern_stmt = NULL;
> -  gimple_seq pattern_def_seq = NULL;
> -  gimple_stmt_iterator pattern_def_si = gsi_none ();
> -  bool analyze_pattern_stmt = false;
> -  bool bool_result;
>     auto_vec<stmt_vec_info> mask_producers;

>     if (dump_enabled_p ())
> @@ -269,304 +365,13 @@ vect_determine_vectorization_factor (loo
>              }
>          }

> -      for (gimple_stmt_iterator si = gsi_start_bb (bb);
> -          !gsi_end_p (si) || analyze_pattern_stmt;)
> -        {
> -          tree vf_vectype;
> -
> -          if (analyze_pattern_stmt)
> -           stmt = pattern_stmt;
> -          else
> -            stmt = gsi_stmt (si);
> -
> -          stmt_info = vinfo_for_stmt (stmt);
> -
> -         if (dump_enabled_p ())
> -           {
> -             dump_printf_loc (MSG_NOTE, vect_location,
> -                               "==> examining statement: ");
> -             dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt, 0);
> -           }
> -
> -         gcc_assert (stmt_info);
> -
> -         /* Skip stmts which do not need to be vectorized.  */
> -         if ((!STMT_VINFO_RELEVANT_P (stmt_info)
> -              && !STMT_VINFO_LIVE_P (stmt_info))
> -             || gimple_clobber_p (stmt))
> -            {
> -              if (STMT_VINFO_IN_PATTERN_P (stmt_info)
> -                  && (pattern_stmt = STMT_VINFO_RELATED_STMT (stmt_info))
> -                  && (STMT_VINFO_RELEVANT_P (vinfo_for_stmt (pattern_stmt))
> -                      || STMT_VINFO_LIVE_P (vinfo_for_stmt (pattern_stmt))))
> -                {
> -                  stmt = pattern_stmt;
> -                  stmt_info = vinfo_for_stmt (pattern_stmt);
> -                  if (dump_enabled_p ())
> -                    {
> -                      dump_printf_loc (MSG_NOTE, vect_location,
> -                                       "==> examining pattern statement: ");
> -                      dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt, 0);
> -                    }
> -                }
> -              else
> -               {
> -                 if (dump_enabled_p ())
> -                   dump_printf_loc (MSG_NOTE, vect_location, "skip.\n");
> -                  gsi_next (&si);
> -                 continue;
> -                }
> -           }
> -          else if (STMT_VINFO_IN_PATTERN_P (stmt_info)
> -                   && (pattern_stmt = STMT_VINFO_RELATED_STMT (stmt_info))
> -                   && (STMT_VINFO_RELEVANT_P (vinfo_for_stmt (pattern_stmt))
> -                       || STMT_VINFO_LIVE_P (vinfo_for_stmt (pattern_stmt))))
> -            analyze_pattern_stmt = true;
> -
> -         /* If a pattern statement has def stmts, analyze them too.  */
> -         if (is_pattern_stmt_p (stmt_info))
> -           {
> -             if (pattern_def_seq == NULL)
> -               {
> -                 pattern_def_seq = STMT_VINFO_PATTERN_DEF_SEQ (stmt_info);
> -                 pattern_def_si = gsi_start (pattern_def_seq);
> -               }
> -             else if (!gsi_end_p (pattern_def_si))
> -               gsi_next (&pattern_def_si);
> -             if (pattern_def_seq != NULL)
> -               {
> -                 gimple *pattern_def_stmt = NULL;
> -                 stmt_vec_info pattern_def_stmt_info = NULL;
> -
> -                 while (!gsi_end_p (pattern_def_si))
> -                   {
> -                     pattern_def_stmt = gsi_stmt (pattern_def_si);
> -                     pattern_def_stmt_info
> -                       = vinfo_for_stmt (pattern_def_stmt);
> -                     if (STMT_VINFO_RELEVANT_P (pattern_def_stmt_info)
> -                         || STMT_VINFO_LIVE_P (pattern_def_stmt_info))
> -                       break;
> -                     gsi_next (&pattern_def_si);
> -                   }
> -
> -                 if (!gsi_end_p (pattern_def_si))
> -                   {
> -                     if (dump_enabled_p ())
> -                       {
> -                         dump_printf_loc (MSG_NOTE, vect_location,
> -                                           "==> examining pattern def stmt: ");
> -                         dump_gimple_stmt (MSG_NOTE, TDF_SLIM,
> -                                            pattern_def_stmt, 0);
> -                       }
> -
> -                     stmt = pattern_def_stmt;
> -                     stmt_info = pattern_def_stmt_info;
> -                   }
> -                 else
> -                   {
> -                     pattern_def_si = gsi_none ();
> -                     analyze_pattern_stmt = false;
> -                   }
> -               }
> -             else
> -               analyze_pattern_stmt = false;
> -           }
> -
> -         if (gimple_get_lhs (stmt) == NULL_TREE
> -             /* MASK_STORE has no lhs, but is ok.  */
> -             && (!is_gimple_call (stmt)
> -                 || !gimple_call_internal_p (stmt)
> -                 || gimple_call_internal_fn (stmt) != IFN_MASK_STORE))
> -           {
> -             if (is_gimple_call (stmt))
> -               {
> -                 /* Ignore calls with no lhs.  These must be calls to
> -                    #pragma omp simd functions, and what vectorization factor
> -                    it really needs can't be determined until
> -                    vectorizable_simd_clone_call.  */
> -                 if (!analyze_pattern_stmt && gsi_end_p (pattern_def_si))
> -                   {
> -                     pattern_def_seq = NULL;
> -                     gsi_next (&si);
> -                   }
> -                 continue;
> -               }
> -             if (dump_enabled_p ())
> -               {
> -                 dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -                                   "not vectorized: irregular stmt.");
> -                 dump_gimple_stmt (MSG_MISSED_OPTIMIZATION,  TDF_SLIM,
stmt,
> -                                    0);
> -               }
> -             return false;
> -           }
> -
> -         if (VECTOR_MODE_P (TYPE_MODE (gimple_expr_type (stmt))))
> -           {
> -             if (dump_enabled_p ())
> -               {
> -                 dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -                                   "not vectorized: vector stmt in loop:");
> -                 dump_gimple_stmt (MSG_MISSED_OPTIMIZATION, TDF_SLIM, stmt, 0);
> -               }
> -             return false;
> -           }
> -
> -         bool_result = false;
> -
> -         if (STMT_VINFO_VECTYPE (stmt_info))
> -           {
> -             /* The only case when a vectype had been already set is for stmts
> -                that contain a dataref, or for "pattern-stmts" (stmts
> -                generated by the vectorizer to represent/replace a certain
> -                idiom).  */
> -             gcc_assert (STMT_VINFO_DATA_REF (stmt_info)
> -                         || is_pattern_stmt_p (stmt_info)
> -                         || !gsi_end_p (pattern_def_si));
> -             vectype = STMT_VINFO_VECTYPE (stmt_info);
> -           }
> -         else
> -           {
> -             gcc_assert (!STMT_VINFO_DATA_REF (stmt_info));
> -             if (gimple_call_internal_p (stmt, IFN_MASK_STORE))
> -               scalar_type = TREE_TYPE (gimple_call_arg (stmt, 3));
> -             else
> -               scalar_type = TREE_TYPE (gimple_get_lhs (stmt));
> -
> -             /* Bool ops don't participate in vectorization factor
> -                computation.  For comparison use compared types to
> -                compute a factor.  */
> -             if (VECT_SCALAR_BOOLEAN_TYPE_P (scalar_type)
> -                 && is_gimple_assign (stmt)
> -                 && gimple_assign_rhs_code (stmt) != COND_EXPR)
> -               {
> -                 if (STMT_VINFO_RELEVANT_P (stmt_info)
> -                     || STMT_VINFO_LIVE_P (stmt_info))
> -                   mask_producers.safe_push (stmt_info);
> -                 bool_result = true;
> -
> -                 if (TREE_CODE_CLASS (gimple_assign_rhs_code (stmt))
> -                     == tcc_comparison
> -                     && !VECT_SCALAR_BOOLEAN_TYPE_P
> -                           (TREE_TYPE (gimple_assign_rhs1 (stmt))))
> -                   scalar_type = TREE_TYPE (gimple_assign_rhs1 (stmt));
> -                 else
> -                   {
> -                     if (!analyze_pattern_stmt && gsi_end_p (pattern_def_si))
> -                       {
> -                         pattern_def_seq = NULL;
> -                         gsi_next (&si);
> -                       }
> -                     continue;
> -                   }
> -               }
> -
> -             if (dump_enabled_p ())
> -               {
> -                 dump_printf_loc (MSG_NOTE, vect_location,
> -                                   "get vectype for scalar type:  ");
> -                 dump_generic_expr (MSG_NOTE, TDF_SLIM, scalar_type);
> -                  dump_printf (MSG_NOTE, "\n");
> -               }
> -             vectype = get_vectype_for_scalar_type (scalar_type);
> -             if (!vectype)
> -               {
> -                 if (dump_enabled_p ())
> -                   {
> -                     dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -                                       "not vectorized: unsupported "
> -                                       "data-type ");
> -                 dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM,
> -                                         scalar_type);
> -                      dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
> -                   }
> -                 return false;
> -               }
> -
> -             if (!bool_result)
> -               STMT_VINFO_VECTYPE (stmt_info) = vectype;
> -
> -             if (dump_enabled_p ())
> -               {
> -                 dump_printf_loc (MSG_NOTE, vect_location, "vectype: ");
> -                 dump_generic_expr (MSG_NOTE, TDF_SLIM, vectype);
> -                  dump_printf (MSG_NOTE, "\n");
> -               }
> -            }
> -
> -         /* Don't try to compute VF out scalar types if we stmt
> -            produces boolean vector.  Use result vectype instead.  */
> -         if (VECTOR_BOOLEAN_TYPE_P (vectype))
> -           vf_vectype = vectype;
> -         else
> -           {
> -             /* The vectorization factor is according to the smallest
> -                scalar type (or the largest vector size, but we only
> -                support one vector size per loop).  */
> -             if (!bool_result)
> -               scalar_type = vect_get_smallest_scalar_type (stmt, &dummy,
> -                                                            &dummy);
> -             if (dump_enabled_p ())
> -               {
> -                 dump_printf_loc (MSG_NOTE, vect_location,
> -                                  "get vectype for scalar type:  ");
> -                 dump_generic_expr (MSG_NOTE, TDF_SLIM, scalar_type);
> -                 dump_printf (MSG_NOTE, "\n");
> -               }
> -             vf_vectype = get_vectype_for_scalar_type (scalar_type);
> -           }
> -         if (!vf_vectype)
> -           {
> -             if (dump_enabled_p ())
> -               {
> -                 dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -                                   "not vectorized: unsupported data-type ");
> -                 dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM,
> -                                     scalar_type);
> -                  dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
> -               }
> -             return false;
> -           }
> -
> -         if (maybe_ne (GET_MODE_SIZE (TYPE_MODE (vectype)),
> -                       GET_MODE_SIZE (TYPE_MODE (vf_vectype))))
> -           {
> -             if (dump_enabled_p ())
> -               {
> -                 dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -                                   "not vectorized: different sized vector "
> -                                   "types in statement, ");
> -                 dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM,
> -                                     vectype);
> -                 dump_printf (MSG_MISSED_OPTIMIZATION, " and ");
> -                 dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM,
> -                                     vf_vectype);
> -                  dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
> -               }
> -             return false;
> -           }
> -
> -         if (dump_enabled_p ())
> -           {
> -             dump_printf_loc (MSG_NOTE, vect_location, "vectype: ");
> -             dump_generic_expr (MSG_NOTE, TDF_SLIM, vf_vectype);
> -              dump_printf (MSG_NOTE, "\n");
> -           }
> -
> -         if (dump_enabled_p ())
> -           {
> -             dump_printf_loc (MSG_NOTE, vect_location, "nunits = ");
> -             dump_dec (MSG_NOTE, TYPE_VECTOR_SUBPARTS (vf_vectype));
> -             dump_printf (MSG_NOTE, "\n");
> -           }
> -
> -         vect_update_max_nunits (&vectorization_factor, vf_vectype);
> -
> -         if (!analyze_pattern_stmt && gsi_end_p (pattern_def_si))
> -           {
> -             pattern_def_seq = NULL;
> -             gsi_next (&si);
> -           }
> +      for (gimple_stmt_iterator si = gsi_start_bb (bb); !gsi_end_p (si);
> +          gsi_next (&si))
> +       {
> +         stmt_info = vinfo_for_stmt (gsi_stmt (si));
> +         if (!vect_determine_vf_for_stmt (stmt_info, &vectorization_factor,
> +                                          &mask_producers))
> +           return false;
>           }
>       }

> @@ -589,119 +394,11 @@ vect_determine_vectorization_factor (loo

>     for (i = 0; i < mask_producers.length (); i++)
>       {
> -      tree mask_type = NULL;
> -
> -      stmt = STMT_VINFO_STMT (mask_producers[i]);
> -
> -      if (is_gimple_assign (stmt)
> -         && TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)) == tcc_comparison
> -         && !VECT_SCALAR_BOOLEAN_TYPE_P
> -                                     (TREE_TYPE (gimple_assign_rhs1 (stmt))))
> -       {
> -         scalar_type = TREE_TYPE (gimple_assign_rhs1 (stmt));
> -         mask_type = get_mask_type_for_scalar_type (scalar_type);
> -
> -         if (!mask_type)
> -           {
> -             if (dump_enabled_p ())
> -               dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -                                "not vectorized: unsupported mask\n");
> -             return false;
> -           }
> -       }
> -      else
> -       {
> -         tree rhs;
> -         ssa_op_iter iter;
> -         gimple *def_stmt;
> -         enum vect_def_type dt;
> -
> -         FOR_EACH_SSA_TREE_OPERAND (rhs, stmt, iter, SSA_OP_USE)
> -           {
> -             if (!vect_is_simple_use (rhs, mask_producers[i]->vinfo,
> -                                      &def_stmt, &dt, &vectype))
> -               {
> -                 if (dump_enabled_p ())
> -                   {
> -                 dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -                                  "not vectorized: can't compute mask type "
> -                                      "for statement, ");
> -                 dump_gimple_stmt (MSG_MISSED_OPTIMIZATION,  TDF_SLIM, stmt,
> -                                       0);
> -                   }
> -                 return false;
> -               }
> -
> -             /* No vectype probably means external definition.
> -                Allow it in case there is another operand which
> -                allows to determine mask type.  */
> -             if (!vectype)
> -               continue;
> -
> -             if (!mask_type)
> -               mask_type = vectype;
> -             else if (maybe_ne (TYPE_VECTOR_SUBPARTS (mask_type),
> -                                TYPE_VECTOR_SUBPARTS (vectype)))
> -               {
> -                 if (dump_enabled_p ())
> -                   {
> -                     dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -                                      "not vectorized: different sized masks "
> -                                      "types in statement, ");
> -                     dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM,
> -                                        mask_type);
> -                     dump_printf (MSG_MISSED_OPTIMIZATION, " and ");
> -                     dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM,
> -                                        vectype);
> -                     dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
> -                   }
> -                 return false;
> -               }
> -             else if (VECTOR_BOOLEAN_TYPE_P (mask_type)
> -                      != VECTOR_BOOLEAN_TYPE_P (vectype))
> -               {
> -                 if (dump_enabled_p ())
> -                   {
> -                     dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -                                      "not vectorized: mixed mask and "
> -                                      "nonmask vector types in statement, ");
> -                     dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM,
> -                                        mask_type);
> -                     dump_printf (MSG_MISSED_OPTIMIZATION, " and ");
> -                     dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM,
> -                                        vectype);
> -                     dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
> -                   }
> -                 return false;
> -               }
> -           }
> -
> -         /* We may compare boolean value loaded as vector of integers.
> -            Fix mask_type in such case.  */
> -         if (mask_type
> -             && !VECTOR_BOOLEAN_TYPE_P (mask_type)
> -             && gimple_code (stmt) == GIMPLE_ASSIGN
> -             && TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)) == tcc_comparison)
> -           mask_type = build_same_sized_truth_vector_type (mask_type);
> -       }
> -
> -      /* No mask_type should mean loop invariant predicate.
> -        This is probably a subject for optimization in
> -        if-conversion.  */
> +      stmt_info = mask_producers[i];
> +      tree mask_type = vect_get_mask_type_for_stmt (stmt_info);
>         if (!mask_type)
> -       {
> -         if (dump_enabled_p ())
> -           {
> -             dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -                              "not vectorized: can't compute mask type "
> -                              "for statement, ");
> -             dump_gimple_stmt (MSG_MISSED_OPTIMIZATION,  TDF_SLIM, stmt,
> -                               0);
> -           }
> -         return false;
> -       }
> -
> -      STMT_VINFO_VECTYPE (mask_producers[i]) = mask_type;
> +       return false;
> +      STMT_VINFO_VECTYPE (stmt_info) = mask_type;
>       }

>     return true;
> Index: gcc/tree-vect-stmts.c
> ===================================================================
> --- gcc/tree-vect-stmts.c       2018-05-10 07:18:12.104514856 +0100
> +++ gcc/tree-vect-stmts.c       2018-05-10 07:18:12.322505512 +0100
> @@ -10520,3 +10520,311 @@ vect_gen_while_not (gimple_seq *seq, tre
>     gimple_seq_add_stmt (seq, call);
>     return gimple_build (seq, BIT_NOT_EXPR, mask_type, tmp);
>   }
> +
> +/* Try to compute the vector types required to vectorize STMT_INFO,
> +   returning true on success and false if vectorization isn't possible.
> +
> +   On success:
> +
> +   - Set *STMT_VECTYPE_OUT to:
> +     - NULL_TREE if the statement doesn't need to be vectorized;
> +     - boolean_type_node if the statement is a boolean operation whose
> +       vector type can only be determined once all the other vector types
> +       are known; and
> +     - the equivalent of STMT_VINFO_VECTYPE otherwise.
> +
> +   - Set *NUNITS_VECTYPE_OUT to the vector type that contains the maximum
> +     number of units needed to vectorize STMT_INFO, or NULL_TREE if the
> +     statement does not help to determine the overall number of units.  */
> +
> +bool
> +vect_get_vector_types_for_stmt (stmt_vec_info stmt_info,
> +                               tree *stmt_vectype_out,
> +                               tree *nunits_vectype_out)
> +{
> +  gimple *stmt = stmt_info->stmt;
> +
> +  *stmt_vectype_out = NULL_TREE;
> +  *nunits_vectype_out = NULL_TREE;
> +
> +  if (gimple_get_lhs (stmt) == NULL_TREE
> +      /* MASK_STORE has no lhs, but is ok.  */
> +      && !gimple_call_internal_p (stmt, IFN_MASK_STORE))
> +    {
> +      if (is_a <gcall *> (stmt))
> +       {
> +         /* Ignore calls with no lhs.  These must be calls to
> +            #pragma omp simd functions, and what vectorization factor
> +            it really needs can't be determined until
> +            vectorizable_simd_clone_call.  */
> +         if (dump_enabled_p ())
> +           dump_printf_loc (MSG_NOTE, vect_location,
> +                            "defer to SIMD clone analysis.\n");
> +         return true;
> +       }
> +
> +      if (dump_enabled_p ())
> +       {
> +         dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +                          "not vectorized: irregular stmt.");
> +         dump_gimple_stmt (MSG_MISSED_OPTIMIZATION, TDF_SLIM, stmt, 0);
> +       }
> +      return false;
> +    }
> +
> +  if (VECTOR_MODE_P (TYPE_MODE (gimple_expr_type (stmt))))
> +    {
> +      if (dump_enabled_p ())
> +       {
> +         dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +                          "not vectorized: vector stmt in loop:");
> +         dump_gimple_stmt (MSG_MISSED_OPTIMIZATION, TDF_SLIM, stmt, 0);
> +       }
> +      return false;
> +    }
> +
> +  tree vectype;
> +  tree scalar_type = NULL_TREE;
> +  if (STMT_VINFO_VECTYPE (stmt_info))
> +    *stmt_vectype_out = vectype = STMT_VINFO_VECTYPE (stmt_info);
> +  else
> +    {
> +      gcc_assert (!STMT_VINFO_DATA_REF (stmt_info));
> +      if (gimple_call_internal_p (stmt, IFN_MASK_STORE))
> +       scalar_type = TREE_TYPE (gimple_call_arg (stmt, 3));
> +      else
> +       scalar_type = TREE_TYPE (gimple_get_lhs (stmt));
> +
> +      /* Pure bool ops don't participate in number-of-units computation.
> +        For comparisons use the types being compared.  */
> +      if (VECT_SCALAR_BOOLEAN_TYPE_P (scalar_type)
> +         && is_gimple_assign (stmt)
> +         && gimple_assign_rhs_code (stmt) != COND_EXPR)
> +       {
> +         *stmt_vectype_out = boolean_type_node;
> +
> +         tree rhs1 = gimple_assign_rhs1 (stmt);
> +         if (TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)) == tcc_comparison
> +             && !VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (rhs1)))
> +           scalar_type = TREE_TYPE (rhs1);
> +         else
> +           {
> +             if (dump_enabled_p ())
> +               dump_printf_loc (MSG_NOTE, vect_location,
> +                                "pure bool operation.\n");
> +             return true;
> +           }
> +       }
> +
> +      if (dump_enabled_p ())
> +       {
> +         dump_printf_loc (MSG_NOTE, vect_location,
> +                          "get vectype for scalar type:  ");
> +         dump_generic_expr (MSG_NOTE, TDF_SLIM, scalar_type);
> +         dump_printf (MSG_NOTE, "\n");
> +       }
> +      vectype = get_vectype_for_scalar_type (scalar_type);
> +      if (!vectype)
> +       {
> +         if (dump_enabled_p ())
> +           {
> +             dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +                              "not vectorized: unsupported data-type ");
> +             dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM,
> +                                scalar_type);
> +             dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
> +           }
> +         return false;
> +       }
> +
> +      if (!*stmt_vectype_out)
> +       *stmt_vectype_out = vectype;
> +
> +      if (dump_enabled_p ())
> +       {
> +         dump_printf_loc (MSG_NOTE, vect_location, "vectype: ");
> +         dump_generic_expr (MSG_NOTE, TDF_SLIM, vectype);
> +         dump_printf (MSG_NOTE, "\n");
> +       }
> +    }
> +
> +  /* Don't try to compute scalar types if the stmt produces a boolean
> +     vector; use the existing vector type instead.  */
> +  tree nunits_vectype;
> +  if (VECTOR_BOOLEAN_TYPE_P (vectype))
> +    nunits_vectype = vectype;
> +  else
> +    {
> +      /* The number of units is set according to the smallest scalar
> +        type (or the largest vector size, but we only support one
> +        vector size per vectorization).  */
> +      if (*stmt_vectype_out != boolean_type_node)
> +       {
> +         HOST_WIDE_INT dummy;
> +         scalar_type = vect_get_smallest_scalar_type (stmt, &dummy, &dummy);
> +       }
> +      if (dump_enabled_p ())
> +       {
> +         dump_printf_loc (MSG_NOTE, vect_location,
> +                          "get vectype for scalar type:  ");
> +         dump_generic_expr (MSG_NOTE, TDF_SLIM, scalar_type);
> +         dump_printf (MSG_NOTE, "\n");
> +       }
> +      nunits_vectype = get_vectype_for_scalar_type (scalar_type);
> +    }
> +  if (!nunits_vectype)
> +    {
> +      if (dump_enabled_p ())
> +       {
> +         dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +                          "not vectorized: unsupported data-type ");
> +         dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM, scalar_type);
> +         dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
> +       }
> +      return false;
> +    }
> +
> +  if (maybe_ne (GET_MODE_SIZE (TYPE_MODE (vectype)),
> +               GET_MODE_SIZE (TYPE_MODE (nunits_vectype))))
> +    {
> +      if (dump_enabled_p ())
> +       {
> +         dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +                          "not vectorized: different sized vector "
> +                          "types in statement, ");
> +         dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM, vectype);
> +         dump_printf (MSG_MISSED_OPTIMIZATION, " and ");
> +         dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM, nunits_vectype);
> +         dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
> +       }
> +      return false;
> +    }
> +
> +  if (dump_enabled_p ())
> +    {
> +      dump_printf_loc (MSG_NOTE, vect_location, "vectype: ");
> +      dump_generic_expr (MSG_NOTE, TDF_SLIM, nunits_vectype);
> +      dump_printf (MSG_NOTE, "\n");
> +
> +      dump_printf_loc (MSG_NOTE, vect_location, "nunits = ");
> +      dump_dec (MSG_NOTE, TYPE_VECTOR_SUBPARTS (nunits_vectype));
> +      dump_printf (MSG_NOTE, "\n");
> +    }
> +
> +  *nunits_vectype_out = nunits_vectype;
> +  return true;
> +}
> +
> +/* Try to determine the correct vector type for STMT_INFO, which is a
> +   statement that produces a scalar boolean result.  Return the vector
> +   type on success, otherwise return NULL_TREE.  */
> +
> +tree
> +vect_get_mask_type_for_stmt (stmt_vec_info stmt_info)
> +{
> +  gimple *stmt = stmt_info->stmt;
> +  tree mask_type = NULL;
> +  tree vectype, scalar_type;
> +
> +  if (is_gimple_assign (stmt)
> +      && TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)) == tcc_comparison
> +      && !VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (gimple_assign_rhs1 (stmt))))
> +    {
> +      scalar_type = TREE_TYPE (gimple_assign_rhs1 (stmt));
> +      mask_type = get_mask_type_for_scalar_type (scalar_type);
> +
> +      if (!mask_type)
> +       {
> +         if (dump_enabled_p ())
> +           dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +                            "not vectorized: unsupported mask\n");
> +         return NULL_TREE;
> +       }
> +    }
> +  else
> +    {
> +      tree rhs;
> +      ssa_op_iter iter;
> +      gimple *def_stmt;
> +      enum vect_def_type dt;
> +
> +      FOR_EACH_SSA_TREE_OPERAND (rhs, stmt, iter, SSA_OP_USE)
> +       {
> +         if (!vect_is_simple_use (rhs, stmt_info->vinfo,
> +                                  &def_stmt, &dt, &vectype))
> +           {
> +             if (dump_enabled_p ())
> +               {
> +                 dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +                                  "not vectorized: can't compute mask type "
> +                                  "for statement, ");
> +                 dump_gimple_stmt (MSG_MISSED_OPTIMIZATION, TDF_SLIM, stmt,
> +                                   0);
> +               }
> +             return NULL_TREE;
> +           }
> +
> +         /* No vectype probably means external definition.
> +            Allow it in case there is another operand which
> +            allows to determine mask type.  */
> +         if (!vectype)
> +           continue;
> +
> +         if (!mask_type)
> +           mask_type = vectype;
> +         else if (maybe_ne (TYPE_VECTOR_SUBPARTS (mask_type),
> +                            TYPE_VECTOR_SUBPARTS (vectype)))
> +           {
> +             if (dump_enabled_p ())
> +               {
> +                 dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +                                  "not vectorized: different sized masks "
> +                                  "types in statement, ");
> +                 dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM,
> +                                    mask_type);
> +                 dump_printf (MSG_MISSED_OPTIMIZATION, " and ");
> +                 dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM,
> +                                    vectype);
> +                 dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
> +               }
> +             return NULL_TREE;
> +           }
> +         else if (VECTOR_BOOLEAN_TYPE_P (mask_type)
> +                  != VECTOR_BOOLEAN_TYPE_P (vectype))
> +           {
> +             if (dump_enabled_p ())
> +               {
> +                 dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +                                  "not vectorized: mixed mask and "
> +                                  "nonmask vector types in statement, ");
> +                 dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM,
> +                                    mask_type);
> +                 dump_printf (MSG_MISSED_OPTIMIZATION, " and ");
> +                 dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM,
> +                                    vectype);
> +                 dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
> +               }
> +             return NULL_TREE;
> +           }
> +       }
> +
> +      /* We may compare boolean value loaded as vector of integers.
> +        Fix mask_type in such case.  */
> +      if (mask_type
> +         && !VECTOR_BOOLEAN_TYPE_P (mask_type)
> +         && gimple_code (stmt) == GIMPLE_ASSIGN
> +         && TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)) == tcc_comparison)
> +       mask_type = build_same_sized_truth_vector_type (mask_type);
> +    }
> +
> +  /* No mask_type should mean loop invariant predicate.
> +     This is probably a subject for optimization in if-conversion.  */
> +  if (!mask_type && dump_enabled_p ())
> +    {
> +      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +                      "not vectorized: can't compute mask type "
> +                      "for statement, ");
> +      dump_gimple_stmt (MSG_MISSED_OPTIMIZATION, TDF_SLIM, stmt, 0);
> +    }
> +  return mask_type;
> +}
> Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_10.c
> ===================================================================
> --- /dev/null   2018-04-20 16:19:46.369131350 +0100
> +++ gcc/testsuite/gcc.target/aarch64/sve/vcond_10.c     2018-05-10 07:18:12.317505726 +0100
> @@ -0,0 +1,36 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve" } */
> +
> +#include <stdint.h>
> +
> +#define DEF_LOOP(TYPE)                                                 \
> +  void __attribute__ ((noinline, noclone))                             \
> +  test_##TYPE (TYPE *a, TYPE a1, TYPE a2, TYPE a3, TYPE a4, int n)     \
> +  {                                                                    \
> +    for (int i = 0; i < n; i += 2)                                     \
> +      {                                                                \
> +       a[i] = a[i] >= 1 && a[i] != 3 ? a1 : a2;                        \
> +       a[i + 1] = a[i + 1] >= 1 && a[i + 1] != 3 ? a3 : a4;            \
> +      }                                                                \
> +  }
> +
> +#define FOR_EACH_TYPE(T) \
> +  T (int8_t) \
> +  T (uint8_t) \
> +  T (int16_t) \
> +  T (uint16_t) \
> +  T (int32_t) \
> +  T (uint32_t) \
> +  T (int64_t) \
> +  T (uint64_t) \
> +  T (_Float16) \
> +  T (float) \
> +  T (double)
> +
> +FOR_EACH_TYPE (DEF_LOOP)
> +
> +/* { dg-final { scan-assembler-times {\tld1b\t} 2 } } */
> +/* { dg-final { scan-assembler-times {\tld1h\t} 3 } } */
> +/* { dg-final { scan-assembler-times {\tld1w\t} 3 } } */
> +/* { dg-final { scan-assembler-times {\tld1d\t} 3 } } */
> +/* { dg-final { scan-assembler-times {\tsel\tz[0-9]} 11 } } */
> Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_10_run.c
> ===================================================================
> --- /dev/null   2018-04-20 16:19:46.369131350 +0100
> +++ gcc/testsuite/gcc.target/aarch64/sve/vcond_10_run.c 2018-05-10 07:18:12.317505726 +0100
> @@ -0,0 +1,24 @@
> +/* { dg-do run { target aarch64_sve_hw } } */
> +/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve" } */
> +
> +#include "vcond_10.c"
> +
> +#define N 133
> +
> +#define TEST_LOOP(TYPE)                                                \
> +  {                                                                    \
> +    TYPE a[N];                                                         \
> +    for (int i = 0; i < N; ++i)                                        \
> +      a[i] = i % 7;                                                    \
> +    test_##TYPE (a, 10, 11, 12, 13, N);                                \
> +    for (int i = 0; i < N; ++i)                                        \
> +      if (a[i] != 10 + (i & 1) * 2 + (i % 7 == 0 || i % 7 == 3))       \
> +       __builtin_abort ();                                             \
> +  }
> +
> +int
> +main (void)
> +{
> +  FOR_EACH_TYPE (TEST_LOOP);
> +  return 0;
> +}
> Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_11.c
> ===================================================================
> --- /dev/null   2018-04-20 16:19:46.369131350 +0100
> +++ gcc/testsuite/gcc.target/aarch64/sve/vcond_11.c     2018-05-10 07:18:12.317505726 +0100
> @@ -0,0 +1,36 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve" } */
> +
> +#include <stdint.h>
> +
> +#define DEF_LOOP(TYPE)                                                 \
> +  void __att
