On Fri, Nov 29, 2019 at 11:16 AM Richard Sandiford <richard.sandif...@arm.com> wrote: > > Now that stmt_vec_info records the choice between vector mask > types and normal nonmask types, we can use that information in > vect_get_vector_types_for_stmt instead of deferring the choice > of vector type till later. > > vect_get_mask_type_for_stmt used to check whether the boolean inputs > to an operation: > (a) consistently used mask types or consistently used nonmask types; and > (b) agreed on the number of elements. > > (b) shouldn't be a problem when (a) is met. If the operation > consistently uses mask types, tree-vect-patterns.c will have corrected > any mismatches in mask precision. (This is because we only use mask > types for a small well-known set of operations and tree-vect-patterns.c > knows how to handle any that could have different mask precisions.) > And if the operation consistently uses normal nonmask types, there's > no reason why booleans should need extra vector compatibility checks > compared to ordinary integers. > > So the potential difficulties all seem to come from (a). Now that > we've chosen the result type ahead of time, we also have to consider > whether the outputs and inputs consistently use mask types. > > Taking each vectorizable_* routine in turn: > > - vectorizable_call > > vect_get_vector_types_for_stmt only handled booleans specially > for gassigns, so vect_get_mask_type_for_stmt never had chance to > handle calls. I'm not sure we support any calls that operate on > booleans, but as things stand, a boolean result would always have > a nonmask type. Presumably any vector argument would also need to > use nonmask types, unless it corresponds to internal_fn_mask_index > (which is already a special case). > > For safety, I've added a check for mask/nonmask combinations here > even though we didn't check this previously. > > - vectorizable_simd_clone_call > > Again, vect_get_mask_type_for_stmt never had chance to handle calls. > The result of the call will always be a nonmask type and the patch > for PR 92710 rejects mask arguments. So all booleans should > consistently use nonmask types here. > > - vectorizable_conversion > > The function already rejects any conversion between booleans in which > one type isn't a mask type. > > - vectorizable_operation > > This function definitely needs a consistency check, e.g. to handle > & and | in which one operand is loaded from memory and the other is > a comparison result. Ideally we'd handle this via pattern stmts > instead (like we do for the all-mask case), but that's future work. > > - vectorizable_assignment > > VECT_SCALAR_BOOLEAN_TYPE_P requires single-bit precision, so the > current code already rejects problematic cases. > > - vectorizable_load > > Loads always produce nonmask types and there are no relevant inputs > to check against. > > - vectorizable_store > > vect_check_store_rhs already rejects mask/nonmask combinations > via useless_type_conversion_p. > > - vectorizable_reduction > - vectorizable_lc_phi > > PHIs always have nonmask types. After the change above, attempts > to combine the PHI result with a mask type would be rejected by > vectorizable_operation. (Again, it would be better to handle > this using pattern stmts.) > > - vectorizable_induction > > We don't generate inductions for booleans. > > - vectorizable_shift > > The function already rejects boolean shifts via type_has_mode_precision_p. > > - vectorizable_condition > > The function already rejects mismatches via useless_type_conversion_p. > > - vectorizable_comparison > > The function already rejects comparisons between mask and nonmask types. > The result is always a mask type.
OK. Thanks for cleaning this up! Richard. > > 2019-11-29 Richard Sandiford <richard.sandif...@arm.com> > > gcc/ > PR tree-optimization/92596 > * tree-vect-stmts.c (vectorizable_call): Punt on hybrid mask/nonmask > operations. > (vectorizable_operation): Likewise, instead of relying on > vect_get_mask_type_for_stmt to do this. > (vect_get_vector_types_for_stmt): Always return a vector type > immediately, rather than deferring the choice for boolean results. > Use a vector mask type instead of a normal vector if > vect_use_mask_type_p. > (vect_get_mask_type_for_stmt): Delete. > * tree-vect-loop.c (vect_determine_vf_for_stmt_1): Remove > mask_producers argument and special boolean_type_node handling. > (vect_determine_vf_for_stmt): Remove mask_producers argument and > update calls to vect_determine_vf_for_stmt_1. Remove doubled call. > (vect_determine_vectorization_factor): Update call accordingly. > * tree-vect-slp.c (vect_build_slp_tree_1): Remove special > boolean_type_node handling. > (vect_slp_analyze_node_operations_1): Likewise. > > gcc/testsuite/ > PR tree-optimization/92596 > * gcc.dg/vect/bb-slp-pr92596.c: New test. > * gcc.dg/vect/bb-slp-43.c: Likewise. > > Index: gcc/tree-vect-stmts.c > =================================================================== > --- gcc/tree-vect-stmts.c 2019-11-29 09:13:43.000000000 +0000 > +++ gcc/tree-vect-stmts.c 2019-11-29 09:13:43.768143063 +0000 > @@ -3325,6 +3325,15 @@ vectorizable_call (stmt_vec_info stmt_in > return false; > } > > + if (VECTOR_BOOLEAN_TYPE_P (vectype_out) > + != VECTOR_BOOLEAN_TYPE_P (vectype_in)) > + { > + if (dump_enabled_p ()) > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > + "mixed mask and nonmask vector types\n"); > + return false; > + } > + > /* FORNOW */ > nunits_in = TYPE_VECTOR_SUBPARTS (vectype_in); > nunits_out = TYPE_VECTOR_SUBPARTS (vectype_out); > @@ -6037,7 +6046,8 @@ vectorizable_operation (stmt_vec_info st > > /* Most operations cannot handle bit-precision types without extra > truncations. */ > - if (!VECTOR_BOOLEAN_TYPE_P (vectype_out) > + bool mask_op_p = VECTOR_BOOLEAN_TYPE_P (vectype_out); > + if (!mask_op_p > && !type_has_mode_precision_p (TREE_TYPE (scalar_dest)) > /* Exception are bitwise binary operations. */ > && code != BIT_IOR_EXPR > @@ -6099,10 +6109,11 @@ vectorizable_operation (stmt_vec_info st > if (maybe_ne (nunits_out, nunits_in)) > return false; > > + tree vectype2 = NULL_TREE, vectype3 = NULL_TREE; > if (op_type == binary_op || op_type == ternary_op) > { > op1 = gimple_assign_rhs2 (stmt); > - if (!vect_is_simple_use (op1, vinfo, &dt[1])) > + if (!vect_is_simple_use (op1, vinfo, &dt[1], &vectype2)) > { > if (dump_enabled_p ()) > dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > @@ -6113,7 +6124,7 @@ vectorizable_operation (stmt_vec_info st > if (op_type == ternary_op) > { > op2 = gimple_assign_rhs3 (stmt); > - if (!vect_is_simple_use (op2, vinfo, &dt[2])) > + if (!vect_is_simple_use (op2, vinfo, &dt[2], &vectype3)) > { > if (dump_enabled_p ()) > dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > @@ -6138,6 +6149,21 @@ vectorizable_operation (stmt_vec_info st > > gcc_assert (ncopies >= 1); > > + /* Reject attempts to combine mask types with nonmask types, e.g. if > + we have an AND between a (nonmask) boolean loaded from memory and > + a (mask) boolean result of a comparison. > + > + TODO: We could easily fix these cases up using pattern statements. */ > + if (VECTOR_BOOLEAN_TYPE_P (vectype) != mask_op_p > + || (vectype2 && VECTOR_BOOLEAN_TYPE_P (vectype2) != mask_op_p) > + || (vectype3 && VECTOR_BOOLEAN_TYPE_P (vectype3) != mask_op_p)) > + { > + if (dump_enabled_p ()) > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > + "mixed mask and nonmask vector types\n"); > + return false; > + } > + > /* Supportable by target? */ > > vec_mode = TYPE_MODE (vectype); > @@ -12065,9 +12091,6 @@ vect_gen_while_not (gimple_seq *seq, tre > > - Set *STMT_VECTYPE_OUT to: > - NULL_TREE if the statement doesn't need to be vectorized; > - - boolean_type_node if the statement is a boolean operation whose > - vector type can only be determined once all the other vector types > - are known; and > - the equivalent of STMT_VINFO_VECTYPE otherwise. > > - Set *NUNITS_VECTYPE_OUT to the vector type that contains the maximum > @@ -12124,11 +12147,22 @@ vect_get_vector_types_for_stmt (stmt_vec > tree scalar_type = NULL_TREE; > if (group_size == 0 && STMT_VINFO_VECTYPE (stmt_info)) > { > - *stmt_vectype_out = vectype = STMT_VINFO_VECTYPE (stmt_info); > + vectype = STMT_VINFO_VECTYPE (stmt_info); > if (dump_enabled_p ()) > dump_printf_loc (MSG_NOTE, vect_location, > "precomputed vectype: %T\n", vectype); > } > + else if (vect_use_mask_type_p (stmt_info)) > + { > + unsigned int precision = stmt_info->mask_precision; > + scalar_type = build_nonstandard_integer_type (precision, 1); > + vectype = get_mask_type_for_scalar_type (vinfo, scalar_type, > group_size); > + if (!vectype) > + return opt_result::failure_at (stmt, "not vectorized: unsupported" > + " data-type %T\n", scalar_type); > + if (dump_enabled_p ()) > + dump_printf_loc (MSG_NOTE, vect_location, "vectype: %T\n", vectype); > + } > else > { > if (data_reference *dr = STMT_VINFO_DATA_REF (stmt_info)) > @@ -12138,28 +12172,6 @@ vect_get_vector_types_for_stmt (stmt_vec > else > scalar_type = TREE_TYPE (gimple_get_lhs (stmt)); > > - /* Pure bool ops don't participate in number-of-units computation. > - For comparisons use the types being compared. */ > - if (!STMT_VINFO_DATA_REF (stmt_info) > - && VECT_SCALAR_BOOLEAN_TYPE_P (scalar_type) > - && is_gimple_assign (stmt) > - && gimple_assign_rhs_code (stmt) != COND_EXPR) > - { > - *stmt_vectype_out = boolean_type_node; > - > - tree rhs1 = gimple_assign_rhs1 (stmt); > - if (TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)) == > tcc_comparison > - && !VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (rhs1))) > - scalar_type = TREE_TYPE (rhs1); > - else > - { > - if (dump_enabled_p ()) > - dump_printf_loc (MSG_NOTE, vect_location, > - "pure bool operation.\n"); > - return opt_result::success (); > - } > - } > - > if (dump_enabled_p ()) > { > if (group_size) > @@ -12177,18 +12189,15 @@ vect_get_vector_types_for_stmt (stmt_vec > " unsupported data-type %T\n", > scalar_type); > > - if (!*stmt_vectype_out) > - *stmt_vectype_out = vectype; > - > if (dump_enabled_p ()) > dump_printf_loc (MSG_NOTE, vect_location, "vectype: %T\n", vectype); > } > + *stmt_vectype_out = vectype; > > /* Don't try to compute scalar types if the stmt produces a boolean > vector; use the existing vector type instead. */ > tree nunits_vectype = vectype; > - if (!VECTOR_BOOLEAN_TYPE_P (vectype) > - && *stmt_vectype_out != boolean_type_node) > + if (!VECTOR_BOOLEAN_TYPE_P (vectype)) > { > /* The number of units is set according to the smallest scalar > type (or the largest vector size, but we only support one > @@ -12213,9 +12222,8 @@ vect_get_vector_types_for_stmt (stmt_vec > } > } > > - gcc_assert (*stmt_vectype_out == boolean_type_node > - || multiple_p (TYPE_VECTOR_SUBPARTS (nunits_vectype), > - TYPE_VECTOR_SUBPARTS (*stmt_vectype_out))); > + gcc_assert (multiple_p (TYPE_VECTOR_SUBPARTS (nunits_vectype), > + TYPE_VECTOR_SUBPARTS (*stmt_vectype_out))); > > if (dump_enabled_p ()) > { > @@ -12227,84 +12235,3 @@ vect_get_vector_types_for_stmt (stmt_vec > *nunits_vectype_out = nunits_vectype; > return opt_result::success (); > } > - > -/* Try to determine the correct vector type for STMT_INFO, which is a > - statement that produces a scalar boolean result. Return the vector > - type on success, otherwise return NULL_TREE. If GROUP_SIZE is nonzero > - and we're performing BB vectorization, make sure that the number of > - elements in the vector is no bigger than GROUP_SIZE. */ > - > -opt_tree > -vect_get_mask_type_for_stmt (stmt_vec_info stmt_info, unsigned int > group_size) > -{ > - vec_info *vinfo = stmt_info->vinfo; > - gimple *stmt = stmt_info->stmt; > - tree mask_type = NULL; > - tree vectype, scalar_type; > - > - if (is_gimple_assign (stmt) > - && TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)) == tcc_comparison > - && !VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (gimple_assign_rhs1 (stmt)))) > - { > - scalar_type = TREE_TYPE (gimple_assign_rhs1 (stmt)); > - mask_type = get_mask_type_for_scalar_type (vinfo, scalar_type, > - group_size); > - > - if (!mask_type) > - return opt_tree::failure_at (stmt, > - "not vectorized: unsupported mask\n"); > - } > - else > - { > - tree rhs; > - ssa_op_iter iter; > - enum vect_def_type dt; > - > - FOR_EACH_SSA_TREE_OPERAND (rhs, stmt, iter, SSA_OP_USE) > - { > - if (!vect_is_simple_use (rhs, stmt_info->vinfo, &dt, &vectype)) > - return opt_tree::failure_at (stmt, > - "not vectorized:can't compute mask" > - " type for statement, %G", stmt); > - > - /* No vectype probably means external definition. > - Allow it in case there is another operand which > - allows to determine mask type. */ > - if (!vectype) > - continue; > - > - if (!mask_type) > - mask_type = vectype; > - else if (maybe_ne (TYPE_VECTOR_SUBPARTS (mask_type), > - TYPE_VECTOR_SUBPARTS (vectype))) > - return opt_tree::failure_at (stmt, > - "not vectorized: different sized > mask" > - " types in statement, %T and %T\n", > - mask_type, vectype); > - else if (VECTOR_BOOLEAN_TYPE_P (mask_type) > - != VECTOR_BOOLEAN_TYPE_P (vectype)) > - return opt_tree::failure_at (stmt, > - "not vectorized: mixed mask and " > - "nonmask vector types in statement, " > - "%T and %T\n", > - mask_type, vectype); > - } > - > - /* We may compare boolean value loaded as vector of integers. > - Fix mask_type in such case. */ > - if (mask_type > - && !VECTOR_BOOLEAN_TYPE_P (mask_type) > - && gimple_code (stmt) == GIMPLE_ASSIGN > - && TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)) == > tcc_comparison) > - mask_type = truth_type_for (mask_type); > - } > - > - /* No mask_type should mean loop invariant predicate. > - This is probably a subject for optimization in if-conversion. */ > - if (!mask_type) > - return opt_tree::failure_at (stmt, > - "not vectorized: can't compute mask type " > - "for statement: %G", stmt); > - > - return opt_tree::success (mask_type); > -} > Index: gcc/tree-vect-loop.c > =================================================================== > --- gcc/tree-vect-loop.c 2019-11-29 09:13:43.000000000 +0000 > +++ gcc/tree-vect-loop.c 2019-11-29 09:13:43.764143091 +0000 > @@ -163,8 +163,7 @@ static stmt_vec_info vect_is_simple_redu > static opt_result > vect_determine_vf_for_stmt_1 (stmt_vec_info stmt_info, > bool vectype_maybe_set_p, > - poly_uint64 *vf, > - vec<stmt_vec_info > *mask_producers) > + poly_uint64 *vf) > { > gimple *stmt = stmt_info->stmt; > > @@ -192,8 +191,6 @@ vect_determine_vf_for_stmt_1 (stmt_vec_i > gcc_assert ((STMT_VINFO_DATA_REF (stmt_info) > || vectype_maybe_set_p) > && STMT_VINFO_VECTYPE (stmt_info) == stmt_vectype); > - else if (stmt_vectype == boolean_type_node) > - mask_producers->safe_push (stmt_info); > else > STMT_VINFO_VECTYPE (stmt_info) = stmt_vectype; > } > @@ -206,21 +203,17 @@ vect_determine_vf_for_stmt_1 (stmt_vec_i > > /* Subroutine of vect_determine_vectorization_factor. Set the vector > types of STMT_INFO and all attached pattern statements and update > - the vectorization factor VF accordingly. If some of the statements > - produce a mask result whose vector type can only be calculated later, > - add them to MASK_PRODUCERS. Return true on success or false if > - something prevented vectorization. */ > + the vectorization factor VF accordingly. Return true on success > + or false if something prevented vectorization. */ > > static opt_result > -vect_determine_vf_for_stmt (stmt_vec_info stmt_info, poly_uint64 *vf, > - vec<stmt_vec_info > *mask_producers) > +vect_determine_vf_for_stmt (stmt_vec_info stmt_info, poly_uint64 *vf) > { > vec_info *vinfo = stmt_info->vinfo; > if (dump_enabled_p ()) > dump_printf_loc (MSG_NOTE, vect_location, "==> examining statement: %G", > stmt_info->stmt); > - opt_result res > - = vect_determine_vf_for_stmt_1 (stmt_info, false, vf, mask_producers); > + opt_result res = vect_determine_vf_for_stmt_1 (stmt_info, false, vf); > if (!res) > return res; > > @@ -239,10 +232,7 @@ vect_determine_vf_for_stmt (stmt_vec_inf > dump_printf_loc (MSG_NOTE, vect_location, > "==> examining pattern def stmt: %G", > def_stmt_info->stmt); > - if (!vect_determine_vf_for_stmt_1 (def_stmt_info, true, > - vf, mask_producers)) > - res = vect_determine_vf_for_stmt_1 (def_stmt_info, true, > - vf, mask_producers); > + res = vect_determine_vf_for_stmt_1 (def_stmt_info, true, vf); > if (!res) > return res; > } > @@ -251,7 +241,7 @@ vect_determine_vf_for_stmt (stmt_vec_inf > dump_printf_loc (MSG_NOTE, vect_location, > "==> examining pattern statement: %G", > stmt_info->stmt); > - res = vect_determine_vf_for_stmt_1 (stmt_info, true, vf, > mask_producers); > + res = vect_determine_vf_for_stmt_1 (stmt_info, true, vf); > if (!res) > return res; > } > @@ -296,7 +286,6 @@ vect_determine_vectorization_factor (loo > tree vectype; > stmt_vec_info stmt_info; > unsigned i; > - auto_vec<stmt_vec_info> mask_producers; > > DUMP_VECT_SCOPE ("vect_determine_vectorization_factor"); > > @@ -354,8 +343,7 @@ vect_determine_vectorization_factor (loo > { > stmt_info = loop_vinfo->lookup_stmt (gsi_stmt (si)); > opt_result res > - = vect_determine_vf_for_stmt (stmt_info, &vectorization_factor, > - &mask_producers); > + = vect_determine_vf_for_stmt (stmt_info, &vectorization_factor); > if (!res) > return res; > } > @@ -373,16 +361,6 @@ vect_determine_vectorization_factor (loo > return opt_result::failure_at (vect_location, > "not vectorized: unsupported data-type\n"); > LOOP_VINFO_VECT_FACTOR (loop_vinfo) = vectorization_factor; > - > - for (i = 0; i < mask_producers.length (); i++) > - { > - stmt_info = mask_producers[i]; > - opt_tree mask_type = vect_get_mask_type_for_stmt (stmt_info); > - if (!mask_type) > - return opt_result::propagate_failure (mask_type); > - STMT_VINFO_VECTYPE (stmt_info) = mask_type; > - } > - > return opt_result::success (); > } > > Index: gcc/tree-vect-slp.c > =================================================================== > --- gcc/tree-vect-slp.c 2019-11-29 09:13:43.000000000 +0000 > +++ gcc/tree-vect-slp.c 2019-11-29 09:13:43.764143091 +0000 > @@ -925,17 +925,6 @@ vect_build_slp_tree_1 (unsigned char *sw > || rhs_code == LROTATE_EXPR > || rhs_code == RROTATE_EXPR) > { > - if (vectype == boolean_type_node) > - { > - if (dump_enabled_p ()) > - dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > - "Build SLP failed: shift of a" > - " boolean.\n"); > - /* Fatal mismatch. */ > - matches[0] = false; > - return false; > - } > - > vec_mode = TYPE_MODE (vectype); > > /* First see if we have a vector/vector shift. */ > @@ -1157,9 +1146,8 @@ vect_build_slp_tree_1 (unsigned char *sw > if (alt_stmt_code != ERROR_MARK > && TREE_CODE_CLASS (alt_stmt_code) != tcc_reference) > { > - if (vectype == boolean_type_node > - || !vect_two_operations_perm_ok_p (stmts, group_size, > - vectype, alt_stmt_code)) > + if (!vect_two_operations_perm_ok_p (stmts, group_size, > + vectype, alt_stmt_code)) > { > for (i = 0; i < group_size; ++i) > if (gimple_assign_rhs_code (stmts[i]->stmt) == alt_stmt_code) > @@ -2751,25 +2739,6 @@ vect_slp_analyze_node_operations_1 (vec_ > stmt_vec_info stmt_info = SLP_TREE_SCALAR_STMTS (node)[0]; > gcc_assert (STMT_SLP_TYPE (stmt_info) != loop_vect); > > - /* For BB vectorization vector types are assigned here. > - Memory accesses already got their vector type assigned > - in vect_analyze_data_refs. */ > - bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_info); > - if (bb_vinfo && STMT_VINFO_VECTYPE (stmt_info) == boolean_type_node) > - { > - unsigned int group_size = SLP_TREE_SCALAR_STMTS (node).length (); > - tree vectype = vect_get_mask_type_for_stmt (stmt_info, group_size); > - if (!vectype) > - /* vect_get_mask_type_for_stmt has already explained the > - failure. */ > - return false; > - > - stmt_vec_info sstmt_info; > - unsigned int i; > - FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), i, sstmt_info) > - STMT_VINFO_VECTYPE (sstmt_info) = vectype; > - } > - > /* Calculate the number of vector statements to be created for the > scalar stmts in this node. For SLP reductions it is equal to the > number of vector statements in the children (which has already been > Index: gcc/testsuite/gcc.dg/vect/bb-slp-pr92596.c > =================================================================== > --- /dev/null 2019-09-17 11:41:18.176664108 +0100 > +++ gcc/testsuite/gcc.dg/vect/bb-slp-pr92596.c 2019-11-29 09:13:43.764143091 > +0000 > @@ -0,0 +1,14 @@ > +/* { dg-do compile } */ > +/* { dg-additional-options "-O3" } */ > + > +typedef struct { > + long n[5]; > +} secp256k1_fe; > + > +secp256k1_fe a; > + > +void fn1(int p1) { a.n[0] = a.n[1] = a.n[2] = p1; } > +void fn2() { > + int b; > + fn1(!b); > +} > Index: gcc/testsuite/gcc.dg/vect/bb-slp-43.c > =================================================================== > --- /dev/null 2019-09-17 11:41:18.176664108 +0100 > +++ gcc/testsuite/gcc.dg/vect/bb-slp-43.c 2019-11-29 09:13:43.752143175 > +0000 > @@ -0,0 +1,17 @@ > +/* { dg-do compile } */ > + > +void > +f (int *restrict x, short *restrict y) > +{ > + x[0] = x[0] == 1 & y[0] == 2; > + x[1] = x[1] == 1 & y[1] == 2; > + x[2] = x[2] == 1 & y[2] == 2; > + x[3] = x[3] == 1 & y[3] == 2; > + x[4] = x[4] == 1 & y[4] == 2; > + x[5] = x[5] == 1 & y[5] == 2; > + x[6] = x[6] == 1 & y[6] == 2; > + x[7] = x[7] == 1 & y[7] == 2; > +} > + > +/* { dg-final { scan-tree-dump-not "mixed mask and nonmask" "slp2" } } */ > +/* { dg-final { scan-tree-dump-not "vector operands from scalars" "slp2" { > target { { vect_int && vect_bool_cmp } && { vect_unpack && vect_hw_misalign } > } xfail vect_variable_length } } } */