https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101636
--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
So what happens is that we have a vector(16) <signed-boolean:1> constructor

  _151 = {_150, _149, _148, _147, _146, _145, _144, _143, _142, _141, _140,
          _139, _138, _137, _136, _135};

fed by a series of

  _150 = _75 ? -1 : 0;

stmts that compute a <signed-boolean:1> from a _Bool.  We're now trying to
vectorize that CTOR (I think that's good).  Now, bool pattern detection
doesn't consider a vector CTOR of <signed-boolean:1> to be a mask precision
"sink", which means we end up with

t.i:26:1: note:   using boolean precision 32 for _49 = _17 != 0;
t.i:26:1: note:   using boolean precision 32 for _74 = _1 != 0;
t.i:26:1: note:   using boolean precision 32 for _75 = _73 & _74;
t.i:26:1: note:   using boolean precision 32 for _70 = _4 != 0;
t.i:26:1: note:   using boolean precision 32 for _71 = _69 & _70;
...

because the compares are eventually fed by 'int' loads.

Now, there's of course the issue that the vectorizer produces this
inefficient code because of similar issues when analyzing the following
if-conversion result in BB vect mode from the loop vectorizer:

  _16 = MEM[(int *)a_81 + 60B];
  _47 = _16 != 0;
  _45 = _47 & _49;
  iftmp.0_43 = _45 ? _16 : 0;
  MEM[(int *)e_82 + 60B] = iftmp.0_43;

here we end up with the same precisions.  I'm actually unsure how things
should go here, vect_recog_bool_pattern seems to look at COND_EXPR
conditions, but then it does

  else if (rhs_code == COND_EXPR
           && TREE_CODE (var) == SSA_NAME)
    {
      vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (lhs));
      if (vectype == NULL_TREE)
        return NULL;

      /* Build a scalar type for the boolean result that when
         vectorized matches the vector type of the result in
         size and number of elements.  */
      unsigned prec
        = vector_element_size (tree_to_poly_uint64 (TYPE_SIZE (vectype)),
                               TYPE_VECTOR_SUBPARTS (vectype));

      tree type
        = build_nonstandard_integer_type (prec,
                                          TYPE_UNSIGNED (TREE_TYPE (var)));
      if (get_vectype_for_scalar_type (vinfo, type) == NULL_TREE)
        return NULL;

      if (!check_bool_pattern (var, vinfo, bool_stmts))
        return NULL;

going the classic way of using a non-mask type.  For the testcase in
question check_bool_pattern fails though.

But we fail in vectorizable_operation because for a mask AND we run into

  /* Worthwhile without SIMD support?  Check only during analysis.  */
  if (!VECTOR_MODE_P (vec_mode)
      && !vec_stmt
      && !vect_worthwhile_without_simd_p (vinfo, code))
    {
      if (dump_enabled_p ())
        dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                         "not worthwhile without SIMD support.\n");
      return false;
    }

that looks like an inefficiency (only triggering for low tripcount loops).
Also vect_worthwhile_without_simd_p looks at the VF only, which is
insufficient for SLP.

Even with that fixed, the BB vectorization triggered from loop vect does not
see the invariant compared defs of one arm of the bit-and, so we just create
another vector CTOR with <signed-boolean:1> and we repeat the same mistakes.
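
To make the precision mismatch concrete, here is a small standalone C model. It is only a sketch: it is not GCC code and not the testcase from this PR, and every name in it (select_lane, wide_mask, packed_mask, narrow_mask, LANES) is made up for illustration. It assumes that "boolean precision 32" means one int-sized 0/-1 value per lane and that vector(16) <signed-boolean:1> can be modelled as one bit per lane in a 16-bit integer.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 16 /* hypothetical: matches the vector(16) in the dump */

/* Scalar model of one lane of the if-converted group:
     _47 = _16 != 0;  _45 = _47 & _49;  iftmp = _45 ? _16 : 0;
   'inv' stands in for the invariant arm of the bit-and (_49).  */
int select_lane(int a, _Bool inv)
{
  return ((a != 0) & inv) ? a : 0;
}

/* "Boolean precision 32" modelled as one 32-bit lane per element,
   holding 0 (false) or -1 (true).  */
void wide_mask(const int *a, _Bool inv, int32_t *mask)
{
  for (int i = 0; i < LANES; i++)
    mask[i] = ((a[i] != 0) & inv) ? -1 : 0;
}

/* vector(16) <signed-boolean:1> modelled as one bit per lane.  */
uint16_t packed_mask(const int *a, _Bool inv)
{
  uint16_t m = 0;
  for (int i = 0; i < LANES; i++)
    if ((a[i] != 0) & inv)
      m |= (uint16_t)(1u << i);
  return m;
}

/* The extra conversion needed when 32-bit boolean lanes have to feed
   a one-bit-per-lane mask.  */
uint16_t narrow_mask(const int32_t *mask)
{
  uint16_t m = 0;
  for (int i = 0; i < LANES; i++)
    {
      assert(mask[i] == 0 || mask[i] == -1);
      if (mask[i])
        m |= (uint16_t)(1u << i);
    }
  return m;
}
```

In this model, computing the compares directly as a packed mask gives the same result as computing 32-bit lanes and narrowing them afterwards. The narrowing step is the part that would presumably be unnecessary if the CTOR were treated as a mask precision sink.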