On Thu, May 10, 2018 at 8:31 AM Richard Sandiford <richard.sandif...@linaro.org> wrote:
> Richard Biener <richard.guent...@gmail.com> writes:
> > On Wed, May 9, 2018 at 1:29 PM, Richard Sandiford
> > <richard.sandif...@linaro.org> wrote:
> >> Richard Biener <richard.guent...@gmail.com> writes:
> >>> On Wed, May 9, 2018 at 12:34 PM, Richard Sandiford
> >>> <richard.sandif...@linaro.org> wrote:
> >>>> The SLP unrolling factor is calculated by finding the smallest
> >>>> scalar type for each SLP statement and taking the number of required
> >>>> lanes from the vector versions of those scalar types.  E.g. for an
> >>>> int32->int64 conversion, it's the vector of int32s rather than the
> >>>> vector of int64s that determines the unroll factor.
> >>>>
> >>>> We rely on tree-vect-patterns.c to replace boolean operations like:
> >>>>
> >>>>    bool a, b, c;
> >>>>    a = b & c;
> >>>>
> >>>> with integer operations of whatever the best size is in context.
> >>>> E.g. if b and c are fed by comparisons of ints, a, b and c will become
> >>>> the appropriate size for an int comparison.  For most targets this means
> >>>> that a, b and c will end up as int-sized themselves, but on targets like
> >>>> SVE and AVX512 with packed vector booleans, they'll instead become a
> >>>> small bitfield like :1, padded to a byte for memory purposes.
> >>>> The SLP code would then take these scalar types and try to calculate
> >>>> the vector type for them, causing the unroll factor to be much higher
> >>>> than necessary.
> >>>>
> >>>> This patch makes SLP use the cached vector boolean type if that's
> >>>> appropriate.  Tested on aarch64-linux-gnu (with and without SVE),
> >>>> aarch64_be-none-elf and x86_64-linux-gnu.  OK to install?
> >>>>
> >>>> Richard
> >>>>
> >>>>
> >>>> 2018-05-09  Richard Sandiford  <richard.sandif...@linaro.org>
> >>>>
> >>>> gcc/
> >>>>         * tree-vect-slp.c (get_vectype_for_smallest_scalar_type): New
> >>>>         function.
> >>>>         (vect_build_slp_tree_1): Use it when calculating the unroll
> >>>>         factor.
> >>>>
> >>>> gcc/testsuite/
> >>>>         * gcc.target/aarch64/sve/vcond_10.c: New test.
> >>>>         * gcc.target/aarch64/sve/vcond_10_run.c: Likewise.
> >>>>         * gcc.target/aarch64/sve/vcond_11.c: Likewise.
> >>>>         * gcc.target/aarch64/sve/vcond_11_run.c: Likewise.
> >>>>
> >>>> Index: gcc/tree-vect-slp.c
> >>>> ===================================================================
> >>>> --- gcc/tree-vect-slp.c 2018-05-08 09:42:03.526648115 +0100
> >>>> +++ gcc/tree-vect-slp.c 2018-05-09 11:30:41.061096063 +0100
> >>>> @@ -608,6 +608,41 @@ vect_record_max_nunits (vec_info *vinfo,
> >>>>    return true;
> >>>>  }
> >>>>
> >>>> +/* Return the vector type associated with the smallest scalar type
> >>>> +   in STMT.  */
> >>>> +
> >>>> +static tree
> >>>> +get_vectype_for_smallest_scalar_type (gimple *stmt)
> >>>> +{
> >>>> +  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
> >>>> +  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
> >>>> +  if (vectype != NULL_TREE
> >>>> +      && VECTOR_BOOLEAN_TYPE_P (vectype))
> >>>
> >>> Hum.  At this point you can't really rely on vector types being set...
> >>
> >> Not for everything, but here we only care about the result of the
> >> pattern replacements, and pattern replacements do set the vector type
> >> up-front.  vect_determine_vectorization_factor (which runs earlier
> >> for loop vectorisation) also relies on this.
> >>
> >>>> +    {
> >>>> +      /* The result of a vector boolean operation has the smallest scalar
> >>>> +        type unless the statement is extending an even narrower
> >>>> +        boolean.  */
> >>>> +      if (!gimple_assign_cast_p (stmt))
> >>>> +       return vectype;
> >>>> +
> >>>> +      tree src = gimple_assign_rhs1 (stmt);
> >>>> +      gimple *def_stmt;
> >>>> +      enum vect_def_type dt;
> >>>> +      tree src_vectype = NULL_TREE;
> >>>> +      if (vect_is_simple_use (src, stmt_info->vinfo, &def_stmt, &dt,
> >>>> +                             &src_vectype)
> >>>> +         && src_vectype
> >>>> +         && VECTOR_BOOLEAN_TYPE_P (src_vectype))
> >>>> +       {
> >>>> +         if (TYPE_PRECISION (TREE_TYPE (src_vectype))
> >>>> +             < TYPE_PRECISION (TREE_TYPE (vectype)))
> >>>> +           return src_vectype;
> >>>> +         return vectype;
> >>>> +       }
> >>>> +    }
> >>>> +  HOST_WIDE_INT dummy;
> >>>> +  tree scalar_type = vect_get_smallest_scalar_type (stmt, &dummy, &dummy);
> >>>> +  return get_vectype_for_scalar_type (scalar_type);
> >>>> +}
> >>>> +
> >>>>  /* Verify if the scalar stmts STMTS are isomorphic, require data
> >>>>     permutation or are of unsupported types of operation.  Return
> >>>>     true if they are, otherwise return false and indicate in *MATCHES
> >>>> @@ -636,12 +671,11 @@ vect_build_slp_tree_1 (vec_info *vinfo,
> >>>>    enum tree_code first_cond_code = ERROR_MARK;
> >>>>    tree lhs;
> >>>>    bool need_same_oprnds = false;
> >>>> -  tree vectype = NULL_TREE, scalar_type, first_op1 = NULL_TREE;
> >>>> +  tree vectype = NULL_TREE, first_op1 = NULL_TREE;
> >>>>    optab optab;
> >>>>    int icode;
> >>>>    machine_mode optab_op2_mode;
> >>>>    machine_mode vec_mode;
> >>>> -  HOST_WIDE_INT dummy;
> >>>>    gimple *first_load = NULL, *prev_first_load = NULL;
> >>>>
> >>>>    /* For every stmt in NODE find its def stmt/s.  */
> >>>> @@ -685,15 +719,14 @@ vect_build_slp_tree_1 (vec_info *vinfo,
> >>>>          return false;
> >>>>        }
> >>>>
> >>>> -      scalar_type = vect_get_smallest_scalar_type (stmt, &dummy, &dummy);
> >>>
> >>> ... so I wonder how this goes wrong here.
> >>
> >> It picks the right scalar type, but then we go on to use
> >> get_vectype_for_scalar_type when get_mask_type_for_scalar_type
> >> is what we actually want.  The easiest fix for that seemed to use
> >> the vectype that had already been calculated (also as for
> >> vect_determine_vectorization_factor).
> >>
> >>> I suppose we want to ignore vector booleans for the purpose of max_nunits
> >>> computation.  So isn't a better fix to simply "ignore" those in
> >>> vect_get_smallest_scalar_type instead?  I see that for intermediate
> >>> full-boolean operations like
> >>>
> >>>   a = x[i] < 0;
> >>>   b = y[i] > 0;
> >>>   tem = a & b;
> >>>
> >>> we want to ignore 'tem = a & b' fully here for the purpose of
> >>> vect_record_max_nunits.  So if scalar_type is a bitfield type
> >>> then skip it?
> >>
> >> Bitfield types will always be the smallest scalar type if they're
> >> present, so I think in pathological cases this could make us
> >> incorrectly ignore source operands to a compare.
> >>
> >> If we're confident that compares and casts of VECT_SCALAR_BOOLEAN_TYPE_Ps
> >> never affect the VF or UF then we should probably skip them based on
> >> that rather than whether the scalar type is a bitfield, so that the
> >> behaviour is the same for all targets.  It seems a bit dangerous though...
> >
> > Well, all stmts that have no inherent promotion / demotion have no
> > effect on the VF if you also have loads / stores.
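
[To make the failure mode being discussed concrete, here is a minimal,
hypothetical C loop (not taken from the patch or its testsuite).  On a
target with packed vector booleans such as SVE or AVX512, the pattern
code gives the "a & b" statement a 1-bit scalar boolean type; asking
for an ordinary vector of that 1-bit type would suggest far more lanes
than the 32-bit loads and stores actually need, inflating the SLP
unroll factor.]

    #include <stdint.h>

    /* Hypothetical example (not from the patch): the comparisons and
       the selects operate on 32-bit data, so they should determine the
       number of lanes.  The intermediate "a & b" is a pure boolean
       operation and should not contribute to the unroll-factor
       computation.  */
    void
    f (int32_t *restrict x, int32_t *restrict y,
       int32_t *restrict out, int n)
    {
      for (int i = 0; i < n; ++i)
        {
          _Bool a = x[i] < 0;
          _Bool b = y[i] > 0;
          out[i] = (a & b) ? 1 : 2;
        }
    }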
> >
> > One reason I dislike the current way of computing vector types and
> > vectorization factor is that it tries to do that ad-hoc from looking
> > at stmts locally instead of somehow propagating things from sources
> > to sinks -- which would be a requirement if we ever drop the
> > requirement of same-sized vector types throughout vectorization...
>
> Yeah.  This patch was just supposed to be a point improvement rather
> than perfection.
>
> > In fact I wonder if we can get away with recording max_nunits here
> > and delay SLP_INSTANCE_UNROLLING_FACTOR computation until we compute
> > the actual vector types.  I think the code is most useful for BB
> > vectorization where we need to terminate the SLP when we get to
> > stmts we cannot handle without "unrolling" (given the vector size
> > constraint).
>
> Part of the problem is that vect_build_slp_tree_1 also uses the vector
> type to choose between shifts by vectors and shifts by scalars, and to
> test whether two-operand permutes are valid.  So as things stand I think
> we do need to know the vector type at some level here, even though those
> two cases aren't interesting for booleans.
>
> > Anyhow - I probably dislike your patch most because you add another
> > get_vectype_for_smallest_scalar_type helper which looks like a hack
> > to me...
> >
> > How is this issue solved for the non-SLP case?  I do remember that the
> > function computing the VF and/or vector types is quite a mess with
> > vector booleans...
>
> OK, for the purposes of fixing this bug, would it be OK to split out
> the code in vect_determine_vectorization_factor that computes the
> vector types and reuse it in SLP, even though I don't think either
> of us likes the way it's done?  At least that way there's only one
> place to change in future.
>
> This patch does that.  I tweaked a couple of the comments and
> added a couple more dump lines, but otherwise the code in
> vect_get_vector_types_for_stmt and vect_get_mask_type_for_stmt
> is the same as the original.
>
> Tested as before.

Much better - thanks for doing it.  OK for trunk and sorry again for
the delay...

Richard.

> Thanks,
> Richard
>
>
> 2018-05-10  Richard Sandiford  <richard.sandif...@linaro.org>
>
> gcc/
>         * tree-vectorizer.h (vect_get_vector_types_for_stmt): Declare.
>         (vect_get_mask_type_for_stmt): Likewise.
>         * tree-vect-slp.c (vect_two_operations_perm_ok_p): New function,
>         split out from...
>         (vect_build_slp_tree_1): ...here.  Use vect_get_vector_types_for_stmt
>         to determine the statement's vector type and the vector type that
>         should be used for calculating nunits.  Deal with cases in which
>         the type has to be deferred.
>         (vect_slp_analyze_node_operations): Use vect_get_vector_types_for_stmt
>         and vect_get_mask_type_for_stmt to calculate STMT_VINFO_VECTYPE.
>         * tree-vect-loop.c (vect_determine_vf_for_stmt_1)
>         (vect_determine_vf_for_stmt): New functions, split out from...
>         (vect_determine_vectorization_factor): ...here.
>         * tree-vect-stmts.c (vect_get_vector_types_for_stmt)
>         (vect_get_mask_type_for_stmt): New functions, split out from
>         vect_determine_vectorization_factor.
>
> gcc/testsuite/
>         * gcc.target/aarch64/sve/vcond_10.c: New test.
>         * gcc.target/aarch64/sve/vcond_10_run.c: Likewise.
>         * gcc.target/aarch64/sve/vcond_11.c: Likewise.
>         * gcc.target/aarch64/sve/vcond_11_run.c: Likewise.
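
[As a concrete, hypothetical illustration of the "two operations" SLP
case that the new vect_two_operations_perm_ok_p helper in the patch
below validates (this example is not part of the patch):]

    /* Hypothetical "two operations" SLP group: alternating
       add/subtract.  SLP can vectorize this as one vector addition,
       one vector subtraction, and a permute that selects even elements
       from the first result and odd elements from the second -- the
       permute whose availability vect_two_operations_perm_ok_p asks
       the target about.  */
    void
    g (double *restrict a, double *restrict b, double *restrict c)
    {
      c[0] = a[0] + b[0];
      c[1] = a[1] - b[1];
      c[2] = a[2] + b[2];
      c[3] = a[3] - b[3];
    }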
> Index: gcc/tree-vectorizer.h
> ===================================================================
> --- gcc/tree-vectorizer.h       2018-05-10 07:18:12.104514856 +0100
> +++ gcc/tree-vectorizer.h       2018-05-10 07:18:12.322505512 +0100
> @@ -1467,6 +1467,8 @@ extern tree vect_gen_perm_mask_checked (
>  extern void optimize_mask_stores (struct loop*);
>  extern gcall *vect_gen_while (tree, tree, tree);
>  extern tree vect_gen_while_not (gimple_seq *, tree, tree, tree);
> +extern bool vect_get_vector_types_for_stmt (stmt_vec_info, tree *, tree *);
> +extern tree vect_get_mask_type_for_stmt (stmt_vec_info);
>
>  /* In tree-vect-data-refs.c.  */
>  extern bool vect_can_force_dr_alignment_p (const_tree, unsigned int);
> Index: gcc/tree-vect-slp.c
> ===================================================================
> --- gcc/tree-vect-slp.c 2018-05-10 07:18:12.104514856 +0100
> +++ gcc/tree-vect-slp.c 2018-05-10 07:18:12.321505555 +0100
> @@ -608,6 +608,33 @@ vect_record_max_nunits (vec_info *vinfo,
>    return true;
>  }
>
> +/* STMTS is a group of GROUP_SIZE SLP statements in which some
> +   statements do the same operation as the first statement and in which
> +   the others do ALT_STMT_CODE.  Return true if we can take one vector
> +   of the first operation and one vector of the second and permute them
> +   to get the required result.  VECTYPE is the type of the vector that
> +   would be permuted.  */
> +
> +static bool
> +vect_two_operations_perm_ok_p (vec<gimple *> stmts, unsigned int group_size,
> +                              tree vectype, tree_code alt_stmt_code)
> +{
> +  unsigned HOST_WIDE_INT count;
> +  if (!TYPE_VECTOR_SUBPARTS (vectype).is_constant (&count))
> +    return false;
> +
> +  vec_perm_builder sel (count, count, 1);
> +  for (unsigned int i = 0; i < count; ++i)
> +    {
> +      unsigned int elt = i;
> +      if (gimple_assign_rhs_code (stmts[i % group_size]) == alt_stmt_code)
> +       elt += count;
> +      sel.quick_push (elt);
> +    }
> +  vec_perm_indices indices (sel, 2, count);
> +  return can_vec_perm_const_p (TYPE_MODE (vectype), indices);
> +}
> +
>  /* Verify if the scalar stmts STMTS are isomorphic, require data
>     permutation or are of unsupported types of operation.  Return
>     true if they are, otherwise return false and indicate in *MATCHES
> @@ -636,17 +663,17 @@ vect_build_slp_tree_1 (vec_info *vinfo,
>    enum tree_code first_cond_code = ERROR_MARK;
>    tree lhs;
>    bool need_same_oprnds = false;
> -  tree vectype = NULL_TREE, scalar_type, first_op1 = NULL_TREE;
> +  tree vectype = NULL_TREE, first_op1 = NULL_TREE;
>    optab optab;
>    int icode;
>    machine_mode optab_op2_mode;
>    machine_mode vec_mode;
> -  HOST_WIDE_INT dummy;
>    gimple *first_load = NULL, *prev_first_load = NULL;
>
>    /* For every stmt in NODE find its def stmt/s.  */
>    FOR_EACH_VEC_ELT (stmts, i, stmt)
>      {
> +      stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
>        swap[i] = 0;
>        matches[i] = false;
>
> @@ -685,15 +712,19 @@ vect_build_slp_tree_1 (vec_info *vinfo,
>           return false;
>         }
>
> -      scalar_type = vect_get_smallest_scalar_type (stmt, &dummy, &dummy);
> -      vectype = get_vectype_for_scalar_type (scalar_type);
> -      if (!vect_record_max_nunits (vinfo, stmt, group_size, vectype,
> -                                  max_nunits))
> +      tree nunits_vectype;
> +      if (!vect_get_vector_types_for_stmt (stmt_info, &vectype,
> +                                          &nunits_vectype)
> +         || (nunits_vectype
> +             && !vect_record_max_nunits (vinfo, stmt, group_size,
> +                                         nunits_vectype, max_nunits)))
>         {
>           /* Fatal mismatch.  */
>           matches[0] = false;
> -         return false;
> -       }
> +         return false;
> +       }
> +
> +      gcc_assert (vectype);
>
>        if (gcall *call_stmt = dyn_cast <gcall *> (stmt))
>         {
> @@ -730,6 +761,17 @@ vect_build_slp_tree_1 (vec_info *vinfo,
>               || rhs_code == LROTATE_EXPR
>               || rhs_code == RROTATE_EXPR)
>             {
> +             if (vectype == boolean_type_node)
> +               {
> +                 if (dump_enabled_p ())
> +                   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +                                    "Build SLP failed: shift of a"
> +                                    " boolean.\n");
> +                 /* Fatal mismatch.  */
> +                 matches[0] = false;
> +                 return false;
> +               }
> +
>               vec_mode = TYPE_MODE (vectype);
>
>               /* First see if we have a vector/vector shift.  */
> @@ -973,29 +1015,12 @@ vect_build_slp_tree_1 (vec_info *vinfo,
>    /* If we allowed a two-operation SLP node verify the target can cope
>       with the permute we are going to use.  */
> -  poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
>    if (alt_stmt_code != ERROR_MARK
>        && TREE_CODE_CLASS (alt_stmt_code) != tcc_reference)
>      {
> -      unsigned HOST_WIDE_INT count;
> -      if (!nunits.is_constant (&count))
> -       {
> -         if (dump_enabled_p ())
> -           dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -                            "Build SLP failed: different operations "
> -                            "not allowed with variable-length SLP.\n");
> -         return false;
> -       }
> -      vec_perm_builder sel (count, count, 1);
> -      for (i = 0; i < count; ++i)
> -       {
> -         unsigned int elt = i;
> -         if (gimple_assign_rhs_code (stmts[i % group_size]) == alt_stmt_code)
> -           elt += count;
> -         sel.quick_push (elt);
> -       }
> -      vec_perm_indices indices (sel, 2, count);
> -      if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices))
> +      if (vectype == boolean_type_node
> +         || !vect_two_operations_perm_ok_p (stmts, group_size,
> +                                            vectype, alt_stmt_code))
>         {
>           for (i = 0; i < group_size; ++i)
>             if (gimple_assign_rhs_code (stmts[i]) == alt_stmt_code)
> @@ -2759,36 +2784,18 @@ vect_slp_analyze_node_operations (vec_in
>        if (bb_vinfo
>           && ! STMT_VINFO_DATA_REF (stmt_info))
>         {
> -         gcc_assert (PURE_SLP_STMT (stmt_info));
> -
> -         tree scalar_type = TREE_TYPE (gimple_get_lhs (stmt));
> -         if (dump_enabled_p ())
> -           {
> -             dump_printf_loc (MSG_NOTE, vect_location,
> -                              "get vectype for scalar type: ");
> -             dump_generic_expr (MSG_NOTE, TDF_SLIM, scalar_type);
> -             dump_printf (MSG_NOTE, "\n");
> -           }
> -
> -         tree vectype = get_vectype_for_scalar_type (scalar_type);
> -         if (!vectype)
> -           {
> -             if (dump_enabled_p ())
> -               {
> -                 dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -                                  "not SLPed: unsupported data-type ");
> -                 dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM,
> -                                    scalar_type);
> -                 dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
> -               }
> -             return false;
> -           }
> -
> -         if (dump_enabled_p ())
> -           {
> -             dump_printf_loc (MSG_NOTE, vect_location, "vectype: ");
> -             dump_generic_expr (MSG_NOTE, TDF_SLIM, vectype);
> -             dump_printf (MSG_NOTE, "\n");
> +         tree vectype, nunits_vectype;
> +         if (!vect_get_vector_types_for_stmt (stmt_info, &vectype,
> +                                              &nunits_vectype))
> +           /* We checked this when building the node.  */
> +           gcc_unreachable ();
> +         if (vectype == boolean_type_node)
> +           {
> +             vectype = vect_get_mask_type_for_stmt (stmt_info);
> +             if (!vectype)
> +               /* vect_get_mask_type_for_stmt has already explained the
> +                  failure.  */
> +               return false;
>            }
>
>          gimple *sstmt;
> Index: gcc/tree-vect-loop.c
> ===================================================================
> --- gcc/tree-vect-loop.c        2018-05-10 07:18:12.104514856 +0100
> +++ gcc/tree-vect-loop.c        2018-05-10 07:18:12.320505598 +0100
> @@ -155,6 +155,108 @@ Software Foundation; either version 3, o
>
>  static void vect_estimate_min_profitable_iters (loop_vec_info, int *, int *);
>
> +/* Subroutine of vect_determine_vf_for_stmt that handles only one
> +   statement.  VECTYPE_MAYBE_SET_P is true if STMT_VINFO_VECTYPE
> +   may already be set for general statements (not just data refs).  */
> +
> +static bool
> +vect_determine_vf_for_stmt_1 (stmt_vec_info stmt_info,
> +                             bool vectype_maybe_set_p,
> +                             poly_uint64 *vf,
> +                             vec<stmt_vec_info > *mask_producers)
> +{
> +  gimple *stmt = stmt_info->stmt;
> +
> +  if ((!STMT_VINFO_RELEVANT_P (stmt_info)
> +       && !STMT_VINFO_LIVE_P (stmt_info))
> +      || gimple_clobber_p (stmt))
> +    {
> +      if (dump_enabled_p ())
> +       dump_printf_loc (MSG_NOTE, vect_location, "skip.\n");
> +      return true;
> +    }
> +
> +  tree stmt_vectype, nunits_vectype;
> +  if (!vect_get_vector_types_for_stmt (stmt_info, &stmt_vectype,
> +                                      &nunits_vectype))
> +    return false;
> +
> +  if (stmt_vectype)
> +    {
> +      if (STMT_VINFO_VECTYPE (stmt_info))
> +       /* The only case when a vectype had been already set is for stmts
> +          that contain a data ref, or for "pattern-stmts" (stmts generated
> +          by the vectorizer to represent/replace a certain idiom).  */
> +       gcc_assert ((STMT_VINFO_DATA_REF (stmt_info)
> +                    || vectype_maybe_set_p)
> +                   && STMT_VINFO_VECTYPE (stmt_info) == stmt_vectype);
> +      else if (stmt_vectype == boolean_type_node)
> +       mask_producers->safe_push (stmt_info);
> +      else
> +       STMT_VINFO_VECTYPE (stmt_info) = stmt_vectype;
> +    }
> +
> +  if (nunits_vectype)
> +    vect_update_max_nunits (vf, nunits_vectype);
> +
> +  return true;
> +}
> +
> +/* Subroutine of vect_determine_vectorization_factor.  Set the vector
> +   types of STMT_INFO and all attached pattern statements and update
> +   the vectorization factor VF accordingly.  If some of the statements
> +   produce a mask result whose vector type can only be calculated later,
> +   add them to MASK_PRODUCERS.  Return true on success or false if
> +   something prevented vectorization.  */
> +
> +static bool
> +vect_determine_vf_for_stmt (stmt_vec_info stmt_info, poly_uint64 *vf,
> +                           vec<stmt_vec_info > *mask_producers)
> +{
> +  if (dump_enabled_p ())
> +    {
> +      dump_printf_loc (MSG_NOTE, vect_location, "==> examining statement: ");
> +      dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt_info->stmt, 0);
> +    }
> +  if (!vect_determine_vf_for_stmt_1 (stmt_info, false, vf, mask_producers))
> +    return false;
> +
> +  if (STMT_VINFO_IN_PATTERN_P (stmt_info)
> +      && STMT_VINFO_RELATED_STMT (stmt_info))
> +    {
> +      stmt_info = vinfo_for_stmt (STMT_VINFO_RELATED_STMT (stmt_info));
> +
> +      /* If a pattern statement has def stmts, analyze them too.  */
> +      gimple *pattern_def_seq = STMT_VINFO_PATTERN_DEF_SEQ (stmt_info);
> +      for (gimple_stmt_iterator si = gsi_start (pattern_def_seq);
> +          !gsi_end_p (si); gsi_next (&si))
> +       {
> +         stmt_vec_info def_stmt_info = vinfo_for_stmt (gsi_stmt (si));
> +         if (dump_enabled_p ())
> +           {
> +             dump_printf_loc (MSG_NOTE, vect_location,
> +                              "==> examining pattern def stmt: ");
> +             dump_gimple_stmt (MSG_NOTE, TDF_SLIM,
> +                               def_stmt_info->stmt, 0);
> +           }
> +         if (!vect_determine_vf_for_stmt_1 (def_stmt_info, true,
> +                                            vf, mask_producers))
> +           return false;
> +       }
> +
> +      if (dump_enabled_p ())
> +       {
> +         dump_printf_loc (MSG_NOTE, vect_location,
> +                          "==> examining pattern statement: ");
> +         dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt_info->stmt, 0);
> +       }
> +      if (!vect_determine_vf_for_stmt_1 (stmt_info, true, vf, mask_producers))
> +       return false;
> +    }
> +
> +  return true;
> +}
> +
>  /* Function vect_determine_vectorization_factor
>
>     Determine the vectorization factor (VF).  VF is the number of data elements
> @@ -192,12 +294,6 @@ vect_determine_vectorization_factor (loo
>    tree vectype;
>    stmt_vec_info stmt_info;
>    unsigned i;
> -  HOST_WIDE_INT dummy;
> -  gimple *stmt, *pattern_stmt = NULL;
> -  gimple_seq pattern_def_seq = NULL;
> -  gimple_stmt_iterator pattern_def_si = gsi_none ();
> -  bool analyze_pattern_stmt = false;
> -  bool bool_result;
>    auto_vec<stmt_vec_info> mask_producers;
>
>    if (dump_enabled_p ())
> @@ -269,304 +365,13 @@ vect_determine_vectorization_factor (loo
>             }
>         }
>
> -      for (gimple_stmt_iterator si = gsi_start_bb (bb);
> -          !gsi_end_p (si) || analyze_pattern_stmt;)
> -       {
> -         tree vf_vectype;
> -
> -         if (analyze_pattern_stmt)
> -           stmt = pattern_stmt;
> -         else
> -           stmt = gsi_stmt (si);
> -
> -         stmt_info = vinfo_for_stmt (stmt);
> -
> -         if (dump_enabled_p ())
> -           {
> -             dump_printf_loc (MSG_NOTE, vect_location,
> -                              "==> examining statement: ");
> -             dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt, 0);
> -           }
> -
> -         gcc_assert (stmt_info);
> -
> -         /* Skip stmts which do not need to be vectorized.  */
> -         if ((!STMT_VINFO_RELEVANT_P (stmt_info)
> -              && !STMT_VINFO_LIVE_P (stmt_info))
> -             || gimple_clobber_p (stmt))
> -           {
> -             if (STMT_VINFO_IN_PATTERN_P (stmt_info)
> -                 && (pattern_stmt = STMT_VINFO_RELATED_STMT (stmt_info))
> -                 && (STMT_VINFO_RELEVANT_P (vinfo_for_stmt (pattern_stmt))
> -                     || STMT_VINFO_LIVE_P (vinfo_for_stmt (pattern_stmt))))
> -               {
> -                 stmt = pattern_stmt;
> -                 stmt_info = vinfo_for_stmt (pattern_stmt);
> -                 if (dump_enabled_p ())
> -                   {
> -                     dump_printf_loc (MSG_NOTE, vect_location,
> -                                      "==> examining pattern statement: ");
> -                     dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt, 0);
> -                   }
> -               }
> -             else
> -               {
> -                 if (dump_enabled_p ())
> -                   dump_printf_loc (MSG_NOTE, vect_location, "skip.\n");
> -                 gsi_next (&si);
> -                 continue;
> -               }
> -           }
> -         else if (STMT_VINFO_IN_PATTERN_P (stmt_info)
> -                  && (pattern_stmt = STMT_VINFO_RELATED_STMT (stmt_info))
> -                  && (STMT_VINFO_RELEVANT_P (vinfo_for_stmt (pattern_stmt))
> -                      || STMT_VINFO_LIVE_P (vinfo_for_stmt (pattern_stmt))))
> -           analyze_pattern_stmt = true;
> -
> -         /* If a pattern statement has def stmts, analyze them too.  */
> -         if (is_pattern_stmt_p (stmt_info))
> -           {
> -             if (pattern_def_seq == NULL)
> -               {
> -                 pattern_def_seq = STMT_VINFO_PATTERN_DEF_SEQ (stmt_info);
> -                 pattern_def_si = gsi_start (pattern_def_seq);
> -               }
> -             else if (!gsi_end_p (pattern_def_si))
> -               gsi_next (&pattern_def_si);
> -             if (pattern_def_seq != NULL)
> -               {
> -                 gimple *pattern_def_stmt = NULL;
> -                 stmt_vec_info pattern_def_stmt_info = NULL;
> -
> -                 while (!gsi_end_p (pattern_def_si))
> -                   {
> -                     pattern_def_stmt = gsi_stmt (pattern_def_si);
> -                     pattern_def_stmt_info
> -                       = vinfo_for_stmt (pattern_def_stmt);
> -                     if (STMT_VINFO_RELEVANT_P (pattern_def_stmt_info)
> -                         || STMT_VINFO_LIVE_P (pattern_def_stmt_info))
> -                       break;
> -                     gsi_next (&pattern_def_si);
> -                   }
> -
> -                 if (!gsi_end_p (pattern_def_si))
> -                   {
> -                     if (dump_enabled_p ())
> -                       {
> -                         dump_printf_loc (MSG_NOTE, vect_location,
> -                                          "==> examining pattern def stmt: ");
> -                         dump_gimple_stmt (MSG_NOTE, TDF_SLIM,
> -                                           pattern_def_stmt, 0);
> -                       }
> -
> -                     stmt = pattern_def_stmt;
> -                     stmt_info = pattern_def_stmt_info;
> -                   }
> -                 else
> -                   {
> -                     pattern_def_si = gsi_none ();
> -                     analyze_pattern_stmt = false;
> -                   }
> -               }
> -             else
> -               analyze_pattern_stmt = false;
> -           }
> -
> -         if (gimple_get_lhs (stmt) == NULL_TREE
> -             /* MASK_STORE has no lhs, but is ok.  */
> -             && (!is_gimple_call (stmt)
> -                 || !gimple_call_internal_p (stmt)
> -                 || gimple_call_internal_fn (stmt) != IFN_MASK_STORE))
> -           {
> -             if (is_gimple_call (stmt))
> -               {
> -                 /* Ignore calls with no lhs.  These must be calls to
> -                    #pragma omp simd functions, and what vectorization factor
> -                    it really needs can't be determined until
> -                    vectorizable_simd_clone_call.  */
> -                 if (!analyze_pattern_stmt && gsi_end_p (pattern_def_si))
> -                   {
> -                     pattern_def_seq = NULL;
> -                     gsi_next (&si);
> -                   }
> -                 continue;
> -               }
> -             if (dump_enabled_p ())
> -               {
> -                 dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -                                  "not vectorized: irregular stmt.");
> -                 dump_gimple_stmt (MSG_MISSED_OPTIMIZATION, TDF_SLIM, stmt,
> -                                   0);
> -               }
> -             return false;
> -           }
> -
> -         if (VECTOR_MODE_P (TYPE_MODE (gimple_expr_type (stmt))))
> -           {
> -             if (dump_enabled_p ())
> -               {
> -                 dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -                                  "not vectorized: vector stmt in loop:");
> -                 dump_gimple_stmt (MSG_MISSED_OPTIMIZATION, TDF_SLIM, stmt, 0);
> -               }
> -             return false;
> -           }
> -
> -         bool_result = false;
> -
> -         if (STMT_VINFO_VECTYPE (stmt_info))
> -           {
> -             /* The only case when a vectype had been already set is for stmts
> -                that contain a dataref, or for "pattern-stmts" (stmts
> -                generated by the vectorizer to represent/replace a certain
> -                idiom).  */
> -             gcc_assert (STMT_VINFO_DATA_REF (stmt_info)
> -                         || is_pattern_stmt_p (stmt_info)
> -                         || !gsi_end_p (pattern_def_si));
> -             vectype = STMT_VINFO_VECTYPE (stmt_info);
> -           }
> -         else
> -           {
> -             gcc_assert (!STMT_VINFO_DATA_REF (stmt_info));
> -             if (gimple_call_internal_p (stmt, IFN_MASK_STORE))
> -               scalar_type = TREE_TYPE (gimple_call_arg (stmt, 3));
> -             else
> -               scalar_type = TREE_TYPE (gimple_get_lhs (stmt));
> -
> -             /* Bool ops don't participate in vectorization factor
> -                computation.  For comparison use compared types to
> -                compute a factor.  */
> -             if (VECT_SCALAR_BOOLEAN_TYPE_P (scalar_type)
> -                 && is_gimple_assign (stmt)
> -                 && gimple_assign_rhs_code (stmt) != COND_EXPR)
> -               {
> -                 if (STMT_VINFO_RELEVANT_P (stmt_info)
> -                     || STMT_VINFO_LIVE_P (stmt_info))
> -                   mask_producers.safe_push (stmt_info);
> -                 bool_result = true;
> -
> -                 if (TREE_CODE_CLASS (gimple_assign_rhs_code (stmt))
> -                     == tcc_comparison
> -                     && !VECT_SCALAR_BOOLEAN_TYPE_P
> -                           (TREE_TYPE (gimple_assign_rhs1 (stmt))))
> -                   scalar_type = TREE_TYPE (gimple_assign_rhs1 (stmt));
> -                 else
> -                   {
> -                     if (!analyze_pattern_stmt && gsi_end_p (pattern_def_si))
> -                       {
> -                         pattern_def_seq = NULL;
> -                         gsi_next (&si);
> -                       }
> -                     continue;
> -                   }
> -               }
> -
> -             if (dump_enabled_p ())
> -               {
> -                 dump_printf_loc (MSG_NOTE, vect_location,
> -                                  "get vectype for scalar type: ");
> -                 dump_generic_expr (MSG_NOTE, TDF_SLIM, scalar_type);
> -                 dump_printf (MSG_NOTE, "\n");
> -               }
> -             vectype = get_vectype_for_scalar_type (scalar_type);
> -             if (!vectype)
> -               {
> -                 if (dump_enabled_p ())
> -                   {
> -                     dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -                                      "not vectorized: unsupported "
> -                                      "data-type ");
> -                     dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM,
> -                                        scalar_type);
> -                     dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
> -                   }
> -                 return false;
> -               }
> -
> -             if (!bool_result)
> -               STMT_VINFO_VECTYPE (stmt_info) = vectype;
> -
> -             if (dump_enabled_p ())
> -               {
> -                 dump_printf_loc (MSG_NOTE, vect_location, "vectype: ");
> -                 dump_generic_expr (MSG_NOTE, TDF_SLIM, vectype);
> -                 dump_printf (MSG_NOTE, "\n");
> -               }
> -           }
> -
> -         /* Don't try to compute VF out scalar types if we stmt
> -            produces boolean vector.  Use result vectype instead.  */
> -         if (VECTOR_BOOLEAN_TYPE_P (vectype))
> -           vf_vectype = vectype;
> -         else
> -           {
> -             /* The vectorization factor is according to the smallest
> -                scalar type (or the largest vector size, but we only
> -                support one vector size per loop).  */
> -             if (!bool_result)
> -               scalar_type = vect_get_smallest_scalar_type (stmt, &dummy,
> -                                                            &dummy);
> -             if (dump_enabled_p ())
> -               {
> -                 dump_printf_loc (MSG_NOTE, vect_location,
> -                                  "get vectype for scalar type: ");
> -                 dump_generic_expr (MSG_NOTE, TDF_SLIM, scalar_type);
> -                 dump_printf (MSG_NOTE, "\n");
> -               }
> -             vf_vectype = get_vectype_for_scalar_type (scalar_type);
> -           }
> -         if (!vf_vectype)
> -           {
> -             if (dump_enabled_p ())
> -               {
> -                 dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -                                  "not vectorized: unsupported data-type ");
> -                 dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM,
> -                                    scalar_type);
> -                 dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
> -               }
> -             return false;
> -           }
> -
> -         if (maybe_ne (GET_MODE_SIZE (TYPE_MODE (vectype)),
> -                       GET_MODE_SIZE (TYPE_MODE (vf_vectype))))
> -           {
> -             if (dump_enabled_p ())
> -               {
> -                 dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -                                  "not vectorized: different sized vector "
> -                                  "types in statement, ");
> -                 dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM,
> -                                    vectype);
> -                 dump_printf (MSG_MISSED_OPTIMIZATION, " and ");
> -                 dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM,
> -                                    vf_vectype);
> -                 dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
> -               }
> -             return false;
> -           }
> -
> -         if (dump_enabled_p ())
> -           {
> -             dump_printf_loc (MSG_NOTE, vect_location, "vectype: ");
> -             dump_generic_expr (MSG_NOTE, TDF_SLIM, vf_vectype);
> -             dump_printf (MSG_NOTE, "\n");
> -           }
> -
> -         if (dump_enabled_p ())
> -           {
> -             dump_printf_loc (MSG_NOTE, vect_location, "nunits = ");
> -             dump_dec (MSG_NOTE, TYPE_VECTOR_SUBPARTS (vf_vectype));
> -             dump_printf (MSG_NOTE, "\n");
> -           }
> -
> -         vect_update_max_nunits (&vectorization_factor, vf_vectype);
> -
> -         if (!analyze_pattern_stmt && gsi_end_p (pattern_def_si))
> -           {
> -             pattern_def_seq = NULL;
> -             gsi_next (&si);
> -           }
> +      for (gimple_stmt_iterator si = gsi_start_bb (bb); !gsi_end_p (si);
> +          gsi_next (&si))
> +       {
> +         stmt_info = vinfo_for_stmt (gsi_stmt (si));
> +         if (!vect_determine_vf_for_stmt (stmt_info, &vectorization_factor,
> +                                          &mask_producers))
> +           return false;
>         }
>      }
>
> @@ -589,119 +394,11 @@ vect_determine_vectorization_factor (loo
>
>    for (i = 0; i < mask_producers.length (); i++)
>      {
> -      tree mask_type = NULL;
> -
> -      stmt = STMT_VINFO_STMT (mask_producers[i]);
> -
> -      if (is_gimple_assign (stmt)
> -         && TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)) == tcc_comparison
> -         && !VECT_SCALAR_BOOLEAN_TYPE_P
> -               (TREE_TYPE (gimple_assign_rhs1 (stmt))))
> -       {
> -         scalar_type = TREE_TYPE (gimple_assign_rhs1 (stmt));
> -         mask_type = get_mask_type_for_scalar_type (scalar_type);
> -
> -         if (!mask_type)
> -           {
> -             if (dump_enabled_p ())
> -               dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -                                "not vectorized: unsupported mask\n");
> -             return false;
> -           }
> -       }
> -      else
> -       {
> -         tree rhs;
> -         ssa_op_iter iter;
> -         gimple *def_stmt;
> -         enum vect_def_type dt;
> -
> -         FOR_EACH_SSA_TREE_OPERAND (rhs, stmt, iter, SSA_OP_USE)
> -           {
> -             if (!vect_is_simple_use (rhs, mask_producers[i]->vinfo,
> -                                      &def_stmt, &dt, &vectype))
> -               {
> -                 if (dump_enabled_p ())
> -                   {
> -                     dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -                                      "not vectorized: can't compute mask type "
> -                                      "for statement, ");
> -                     dump_gimple_stmt (MSG_MISSED_OPTIMIZATION, TDF_SLIM, stmt,
> -                                       0);
> -                   }
> -                 return false;
> -               }
> -
> -             /* No vectype probably means external definition.
> -                Allow it in case there is another operand which
> -                allows to determine mask type.  */
> -             if (!vectype)
> -               continue;
> -
> -             if (!mask_type)
> -               mask_type = vectype;
> -             else if (maybe_ne (TYPE_VECTOR_SUBPARTS (mask_type),
> -                                TYPE_VECTOR_SUBPARTS (vectype)))
> -               {
> -                 if (dump_enabled_p ())
> -                   {
> -                     dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -                                      "not vectorized: different sized masks "
> -                                      "types in statement, ");
> -                     dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM,
> -                                        mask_type);
> -                     dump_printf (MSG_MISSED_OPTIMIZATION, " and ");
> -                     dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM,
> -                                        vectype);
> -                     dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
> -                   }
> -                 return false;
> -               }
> -             else if (VECTOR_BOOLEAN_TYPE_P (mask_type)
> -                      != VECTOR_BOOLEAN_TYPE_P (vectype))
> -               {
> -                 if (dump_enabled_p ())
> -                   {
> -                     dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -                                      "not vectorized: mixed mask and "
> -                                      "nonmask vector types in statement, ");
> -                     dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM,
> -                                        mask_type);
> -                     dump_printf (MSG_MISSED_OPTIMIZATION, " and ");
> -                     dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM,
> -                                        vectype);
> -                     dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
> -                   }
> -                 return false;
> -               }
> -           }
> -
> -       /* We may compare boolean value loaded as vector of integers.
> -          Fix mask_type in such case.  */
> -       if (mask_type
> -           && !VECTOR_BOOLEAN_TYPE_P (mask_type)
> -           && gimple_code (stmt) == GIMPLE_ASSIGN
> -           && TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)) == tcc_comparison)
> -         mask_type = build_same_sized_truth_vector_type (mask_type);
> -       }
> -
> -      /* No mask_type should mean loop invariant predicate.
> -        This is probably a subject for optimization in
> -        if-conversion.  */
> +      stmt_info = mask_producers[i];
> +      tree mask_type = vect_get_mask_type_for_stmt (stmt_info);
>        if (!mask_type)
> -       {
> -         if (dump_enabled_p ())
> -           {
> -             dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -                              "not vectorized: can't compute mask type "
> -                              "for statement, ");
> -             dump_gimple_stmt (MSG_MISSED_OPTIMIZATION, TDF_SLIM, stmt,
> -                               0);
> -           }
> -         return false;
> -       }
> -
> -      STMT_VINFO_VECTYPE (mask_producers[i]) = mask_type;
> +       return false;
> +      STMT_VINFO_VECTYPE (stmt_info) = mask_type;
>      }
>
>    return true;
> Index: gcc/tree-vect-stmts.c
> ===================================================================
> --- gcc/tree-vect-stmts.c       2018-05-10 07:18:12.104514856 +0100
> +++ gcc/tree-vect-stmts.c       2018-05-10 07:18:12.322505512 +0100
> @@ -10520,3 +10520,311 @@ vect_gen_while_not (gimple_seq *seq, tre
>    gimple_seq_add_stmt (seq, call);
>    return gimple_build (seq, BIT_NOT_EXPR, mask_type, tmp);
>  }
> +
> +/* Try to compute the vector types required to vectorize STMT_INFO,
> +   returning true on success and false if vectorization isn't possible.
> +
> +   On success:
> +
> +   - Set *STMT_VECTYPE_OUT to:
> +     - NULL_TREE if the statement doesn't need to be vectorized;
> +     - boolean_type_node if the statement is a boolean operation whose
> +       vector type can only be determined once all the other vector types
> +       are known; and
> +     - the equivalent of STMT_VINFO_VECTYPE otherwise.
> +
> +   - Set *NUNITS_VECTYPE_OUT to the vector type that contains the maximum
> +     number of units needed to vectorize STMT_INFO, or NULL_TREE if the
> +     statement does not help to determine the overall number of units.  */
> +
> +bool
> +vect_get_vector_types_for_stmt (stmt_vec_info stmt_info,
> +                               tree *stmt_vectype_out,
> +                               tree *nunits_vectype_out)
> +{
> +  gimple *stmt = stmt_info->stmt;
> +
> +  *stmt_vectype_out = NULL_TREE;
> +  *nunits_vectype_out = NULL_TREE;
> +
> +  if (gimple_get_lhs (stmt) == NULL_TREE
> +      /* MASK_STORE has no lhs, but is ok.  */
> +      && !gimple_call_internal_p (stmt, IFN_MASK_STORE))
> +    {
> +      if (is_a <gcall *> (stmt))
> +       {
> +         /* Ignore calls with no lhs.  These must be calls to
> +            #pragma omp simd functions, and what vectorization factor
> +            it really needs can't be determined until
> +            vectorizable_simd_clone_call.  */
> +         if (dump_enabled_p ())
> +           dump_printf_loc (MSG_NOTE, vect_location,
> +                            "defer to SIMD clone analysis.\n");
> +         return true;
> +       }
> +
> +      if (dump_enabled_p ())
> +       {
> +         dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +                          "not vectorized: irregular stmt.");
> +         dump_gimple_stmt (MSG_MISSED_OPTIMIZATION, TDF_SLIM, stmt, 0);
> +       }
> +      return false;
> +    }
> +
> +  if (VECTOR_MODE_P (TYPE_MODE (gimple_expr_type (stmt))))
> +    {
> +      if (dump_enabled_p ())
> +       {
> +         dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +                          "not vectorized: vector stmt in loop:");
> +         dump_gimple_stmt (MSG_MISSED_OPTIMIZATION, TDF_SLIM, stmt, 0);
> +       }
> +      return false;
> +    }
> +
> +  tree vectype;
> +  tree scalar_type = NULL_TREE;
> +  if (STMT_VINFO_VECTYPE (stmt_info))
> +    *stmt_vectype_out = vectype = STMT_VINFO_VECTYPE (stmt_info);
> +  else
> +    {
> +      gcc_assert (!STMT_VINFO_DATA_REF (stmt_info));
> +      if (gimple_call_internal_p (stmt, IFN_MASK_STORE))
> +       scalar_type = TREE_TYPE (gimple_call_arg (stmt, 3));
> +      else
> +       scalar_type = TREE_TYPE (gimple_get_lhs (stmt));
> +
> +      /* Pure bool ops don't participate in number-of-units computation.
> +        For comparisons use the types being compared.  */
> +      if (VECT_SCALAR_BOOLEAN_TYPE_P (scalar_type)
> +         && is_gimple_assign (stmt)
> +         && gimple_assign_rhs_code (stmt) != COND_EXPR)
> +       {
> +         *stmt_vectype_out = boolean_type_node;
> +
> +         tree rhs1 = gimple_assign_rhs1 (stmt);
> +         if (TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)) == tcc_comparison
> +             && !VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (rhs1)))
> +           scalar_type = TREE_TYPE (rhs1);
> +         else
> +           {
> +             if (dump_enabled_p ())
> +               dump_printf_loc (MSG_NOTE, vect_location,
> +                                "pure bool operation.\n");
> +             return true;
> +           }
> +       }
> +
> +      if (dump_enabled_p ())
> +       {
> +         dump_printf_loc (MSG_NOTE, vect_location,
> +                          "get vectype for scalar type: ");
> +         dump_generic_expr (MSG_NOTE, TDF_SLIM, scalar_type);
> +         dump_printf (MSG_NOTE, "\n");
> +       }
> +      vectype = get_vectype_for_scalar_type (scalar_type);
> +      if (!vectype)
> +       {
> +         if (dump_enabled_p ())
> +           {
> +             dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +                              "not vectorized: unsupported data-type ");
> +             dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM,
> +                                scalar_type);
> +             dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
> +           }
> +         return false;
> +       }
> +
> +      if (!*stmt_vectype_out)
> +       *stmt_vectype_out = vectype;
> +
> +      if (dump_enabled_p ())
> +       {
> +         dump_printf_loc (MSG_NOTE, vect_location, "vectype: ");
> +         dump_generic_expr (MSG_NOTE, TDF_SLIM, vectype);
> +         dump_printf (MSG_NOTE, "\n");
> +       }
> +    }
> +
> +  /* Don't try to compute scalar types if the stmt produces a boolean
> +     vector; use the existing vector type instead.  */
> +  tree nunits_vectype;
> +  if (VECTOR_BOOLEAN_TYPE_P (vectype))
> +    nunits_vectype = vectype;
> +  else
> +    {
> +      /* The number of units is set according to the smallest scalar
> +        type (or the largest vector size, but we only support one
> +        vector size per vectorization).  */
> +      if (*stmt_vectype_out != boolean_type_node)
> +       {
> +         HOST_WIDE_INT dummy;
> +         scalar_type = vect_get_smallest_scalar_type (stmt, &dummy, &dummy);
> +       }
> +      if (dump_enabled_p ())
> +       {
> +         dump_printf_loc (MSG_NOTE, vect_location,
> +                          "get vectype for scalar type: ");
> +         dump_generic_expr (MSG_NOTE, TDF_SLIM, scalar_type);
> +         dump_printf (MSG_NOTE, "\n");
> +       }
> +      nunits_vectype = get_vectype_for_scalar_type (scalar_type);
> +    }
> +  if (!nunits_vectype)
> +    {
> +      if (dump_enabled_p ())
> +       {
> +         dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +                          "not vectorized: unsupported data-type ");
> +         dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM, scalar_type);
> +         dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
> +       }
> +      return false;
> +    }
> +
> +  if (maybe_ne (GET_MODE_SIZE (TYPE_MODE (vectype)),
> +               GET_MODE_SIZE (TYPE_MODE (nunits_vectype))))
> +    {
> +      if (dump_enabled_p ())
> +       {
> +         dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +                          "not vectorized: different sized vector "
> +                          "types in statement, ");
> +         dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM, vectype);
> +         dump_printf (MSG_MISSED_OPTIMIZATION, " and ");
> +         dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM,
> +                            nunits_vectype);
> +         dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
> +       }
> +      return false;
> +    }
> +
> +  if (dump_enabled_p ())
> +    {
> +      dump_printf_loc (MSG_NOTE, vect_location, "vectype: ");
> +      dump_generic_expr (MSG_NOTE, TDF_SLIM, nunits_vectype);
> +      dump_printf (MSG_NOTE, "\n");
> +
> +      dump_printf_loc (MSG_NOTE, vect_location, "nunits = ");
> +      dump_dec (MSG_NOTE, TYPE_VECTOR_SUBPARTS (nunits_vectype));
> +      dump_printf (MSG_NOTE, "\n");
> +    }
> +
> +  *nunits_vectype_out = nunits_vectype;
> +  return true;
> +}
> +
> +/* Try to determine the correct vector type for STMT_INFO, which is a
> +   statement that produces a scalar boolean result.  Return the vector
> +   type on success, otherwise return NULL_TREE.  */
> +
> +tree
> +vect_get_mask_type_for_stmt (stmt_vec_info stmt_info)
> +{
> +  gimple *stmt = stmt_info->stmt;
> +  tree mask_type = NULL;
> +  tree vectype, scalar_type;
> +
> +  if (is_gimple_assign (stmt)
> +      && TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)) == tcc_comparison
> +      && !VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (gimple_assign_rhs1 (stmt))))
> +    {
> +      scalar_type = TREE_TYPE (gimple_assign_rhs1 (stmt));
> +      mask_type = get_mask_type_for_scalar_type (scalar_type);
> +
> +      if (!mask_type)
> +       {
> +         if (dump_enabled_p ())
> +           dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +                            "not vectorized: unsupported mask\n");
> +         return NULL_TREE;
> +       }
> +    }
> +  else
> +    {
> +      tree rhs;
> +      ssa_op_iter iter;
> +      gimple *def_stmt;
> +      enum vect_def_type dt;
> +
> +      FOR_EACH_SSA_TREE_OPERAND (rhs, stmt, iter, SSA_OP_USE)
> +       {
> +         if (!vect_is_simple_use (rhs, stmt_info->vinfo,
> +                                  &def_stmt, &dt, &vectype))
> +           {
> +             if (dump_enabled_p ())
> +               {
> +                 dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +                                  "not vectorized: can't compute mask type "
> +                                  "for statement, ");
> +                 dump_gimple_stmt (MSG_MISSED_OPTIMIZATION, TDF_SLIM, stmt,
> +                                   0);
> +               }
> +             return NULL_TREE;
> +           }
> +
> +         /* No vectype probably means external definition.
> +            Allow it in case there is another operand which
> +            allows to determine mask type.  */
> +         if (!vectype)
> +           continue;
> +
> +         if (!mask_type)
> +           mask_type = vectype;
> +         else if (maybe_ne (TYPE_VECTOR_SUBPARTS (mask_type),
> +                            TYPE_VECTOR_SUBPARTS (vectype)))
> +           {
> +             if (dump_enabled_p ())
> +               {
> +                 dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +                                  "not vectorized: different sized masks "
> +                                  "types in statement, ");
> +                 dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM,
> +                                    mask_type);
> +                 dump_printf (MSG_MISSED_OPTIMIZATION, " and ");
> +                 dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM,
> +                                    vectype);
> +                 dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
> +               }
> +             return NULL_TREE;
> +           }
> +         else if (VECTOR_BOOLEAN_TYPE_P (mask_type)
> +                  != VECTOR_BOOLEAN_TYPE_P (vectype))
> +           {
> +             if (dump_enabled_p ())
> +               {
> +                 dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +                                  "not vectorized: mixed mask and "
> +                                  "nonmask vector types in statement, ");
> +                 dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM,
> +                                    mask_type);
> +                 dump_printf (MSG_MISSED_OPTIMIZATION, " and ");
> +                 dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM,
> +                                    vectype);
> +                 dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
> +               }
> +             return NULL_TREE;
> +           }
> +       }
> +
> +      /* We may compare boolean value loaded as vector of integers.
> +        Fix mask_type in such case.  */
> +      if (mask_type
> +         && !VECTOR_BOOLEAN_TYPE_P (mask_type)
> +         && gimple_code (stmt) == GIMPLE_ASSIGN
> +         && TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)) == tcc_comparison)
> +       mask_type = build_same_sized_truth_vector_type (mask_type);
> +    }
> +
> +  /* No mask_type should mean loop invariant predicate.
> +     This is probably a subject for optimization in if-conversion.  */
> +  if (!mask_type && dump_enabled_p ())
> +    {
> +      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +                       "not vectorized: can't compute mask type "
> +                       "for statement, ");
> +      dump_gimple_stmt (MSG_MISSED_OPTIMIZATION, TDF_SLIM, stmt, 0);
> +    }
> +  return mask_type;
> +}
> Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_10.c
> ===================================================================
> --- /dev/null   2018-04-20 16:19:46.369131350 +0100
> +++ gcc/testsuite/gcc.target/aarch64/sve/vcond_10.c     2018-05-10 07:18:12.317505726 +0100
> @@ -0,0 +1,36 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve" } */
> +
> +#include <stdint.h>
> +
> +#define DEF_LOOP(TYPE)                                                 \
> +  void __attribute__ ((noinline, noclone))                             \
> +  test_##TYPE (TYPE *a, TYPE a1, TYPE a2, TYPE a3, TYPE a4, int n)     \
> +  {                                                                    \
> +    for (int i = 0; i < n; i += 2)                                     \
> +      {                                                                \
> +       a[i] = a[i] >= 1 && a[i] != 3 ? a1 : a2;                        \
> +       a[i + 1] = a[i + 1] >= 1 && a[i + 1] != 3 ? a3 : a4;            \
> +      }                                                                \
> +  }
> +
> +#define FOR_EACH_TYPE(T) \
> +  T (int8_t) \
> +  T (uint8_t) \
> +  T (int16_t) \
> +  T (uint16_t) \
> +  T (int32_t) \
> +  T (uint32_t) \
> +  T (int64_t) \
> +  T (uint64_t) \
> +  T (_Float16) \
> +  T (float) \
> +  T (double)
> +
> +FOR_EACH_TYPE (DEF_LOOP)
> +
> +/* { dg-final { scan-assembler-times {\tld1b\t} 2 } } */
> +/* { dg-final { scan-assembler-times {\tld1h\t} 3 } } */
> +/* { dg-final { scan-assembler-times {\tld1w\t} 3 } } */
> +/* { dg-final { scan-assembler-times {\tld1d\t} 3 } } */
> +/* { dg-final { scan-assembler-times {\tsel\tz[0-9]} 11 } } */
> Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_10_run.c
> ===================================================================
> --- /dev/null   2018-04-20 16:19:46.369131350 +0100
> +++ gcc/testsuite/gcc.target/aarch64/sve/vcond_10_run.c 2018-05-10 07:18:12.317505726 +0100
> @@ -0,0 +1,24 @@
> +/* { dg-do run { target aarch64_sve_hw } } */
> +/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve" } */
> +
> +#include "vcond_10.c"
> +
> +#define N 133
> +
> +#define TEST_LOOP(TYPE)                                                \
> +  {                                                                    \
> +    TYPE a[N];                                                         \
> +    for (int i = 0; i < N; ++i)                                        \
> +      a[i] = i % 7;                                                    \
> +    test_##TYPE (a, 10, 11, 12, 13, N);                                \
> +    for (int i = 0; i < N; ++i)                                        \
> +      if (a[i] != 10 + (i & 1) * 2 + (i % 7 == 0 || i % 7 == 3))       \
> +       __builtin_abort ();                                             \
> +  }
> +
> +int
> +main (void)
> +{
> +  FOR_EACH_TYPE (TEST_LOOP);
> +  return 0;
> +}
> Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_11.c
> ===================================================================
> --- /dev/null   2018-04-20 16:19:46.369131350 +0100
> +++ gcc/testsuite/gcc.target/aarch64/sve/vcond_11.c     2018-05-10 07:18:12.317505726 +0100
> @@ -0,0 +1,36 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve" } */
> +
> +#include <stdint.h>
> +
> +#define DEF_LOOP(TYPE)                                                 \
> +  void __att