https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101636

--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
So what happens is that we have a vector(16) <signed-boolean:1> constructor

  _151 = {_150, _149, _148, _147, _146, _145, _144, _143,
          _142, _141, _140, _139, _138, _137, _136, _135};

fed by a series of

  _150 = _75 ? -1 : 0;

stmts that compute a <signed-boolean:1> from a _Bool.  We're now trying
to vectorize that CTOR (I think that's good).  Now, bool pattern detection
doesn't consider a vector CTOR of <signed-boolean:1> to be a mask precision
"sink" which means we end up with

t.i:26:1: note:   using boolean precision 32 for _49 = _17 != 0;
t.i:26:1: note:   using boolean precision 32 for _74 = _1 != 0;
t.i:26:1: note:   using boolean precision 32 for _75 = _73 & _74;
t.i:26:1: note:   using boolean precision 32 for _70 = _4 != 0;
t.i:26:1: note:   using boolean precision 32 for _71 = _69 & _70;
...

because the compares are ultimately fed by 'int' loads.
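
As a rough illustration of what the vector(16) <signed-boolean:1> CTOR above
represents, here is a minimal C model.  It assumes a target where such a
vector lives in a 16-bit integer mask (AVX-512-style; the comment does not
name the target), and the helper names and the lane-to-bit order are
hypothetical, not GCC internals:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of `_150 = _75 ? -1 : 0;`: a 1-bit signed
   boolean can only hold the values 0 and -1.  */
static int signed_bool1_from_bool(bool b)
{
    return b ? -1 : 0;
}

/* Hypothetical model of the 16-element CTOR: one bit per lane,
   packed into a 16-bit integer mask.  Lane i maps to bit i here;
   that ordering is an assumption made for the sketch.  */
static uint16_t pack_mask16(const bool lanes[16])
{
    uint16_t mask = 0;
    for (int i = 0; i < 16; i++)
        if (signed_bool1_from_bool(lanes[i]) != 0)
            mask |= (uint16_t)(1u << i);
    return mask;
}
```

The point of the model is only that the CTOR consumes one bit per lane,
whereas the dump shows its inputs being assigned boolean precision 32.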

Now, there's of course the issue that the vectorizer produces this inefficient
code because of similar issues when analyzing the following if-conversion
result in BB vect mode from the loop vectorizer:

  _16 = MEM[(int *)a_81 + 60B];
  _47 = _16 != 0;
  _45 = _47 & _49;
  iftmp.0_43 = _45 ? _16 : 0;
  MEM[(int *)e_82 + 60B] = iftmp.0_43;

here we end up with the same precisions.  I'm actually unsure how
things should go here, vect_recog_bool_pattern seems to look at
COND_EXPR conditions, but then it does

  else if (rhs_code == COND_EXPR
           && TREE_CODE (var) == SSA_NAME)
    {
      vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (lhs));
      if (vectype == NULL_TREE)
        return NULL;

      /* Build a scalar type for the boolean result that when
         vectorized matches the vector type of the result in
         size and number of elements.  */
      unsigned prec
        = vector_element_size (tree_to_poly_uint64 (TYPE_SIZE (vectype)),
                               TYPE_VECTOR_SUBPARTS (vectype));

      tree type
        = build_nonstandard_integer_type (prec,
                                          TYPE_UNSIGNED (TREE_TYPE (var)));
      if (get_vectype_for_scalar_type (vinfo, type) == NULL_TREE)
        return NULL;

      if (!check_bool_pattern (var, vinfo, bool_stmts))
        return NULL;

going the classic way of using a non-mask type.  For the testcase
in question check_bool_pattern fails though.

But we fail in vectorizable_operation because, for a mask, we run into

  /* Worthwhile without SIMD support?  Check only during analysis.  */
  if (!VECTOR_MODE_P (vec_mode)
      && !vec_stmt
      && !vect_worthwhile_without_simd_p (vinfo, code))
    {
      if (dump_enabled_p ())
        dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                         "not worthwhile without SIMD support.\n");
      return false;
    }

that looks like an inefficiency (only triggering for low tripcount loops).
Also, vect_worthwhile_without_simd_p looks only at the VF, which is insufficient
for SLP.  Even with that fixed, the BB vectorization triggered from loop
vect does not see the invariant compared defs of one arm of the bit-and
so we just create another vector CTOR with <signed-boolean:1> and we
repeat the same mistakes.
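
For orientation, a loop of roughly the following shape would if-convert into
statements like the `_45 ? _16 : 0` sequence quoted earlier.  This is a
hypothetical reconstruction from the GIMPLE, not the testcase attached to the
bug; the function name, parameter names and the trip count of 16 are
assumptions:

```c
/* Hypothetical source shape: a per-element compare combined with a
   loop-invariant compare (the `_49 = _17 != 0` arm of the bit-and),
   selecting between the loaded value and 0.  */
void masked_copy(int *e, const int *a, int inv)
{
    for (int i = 0; i < 16; i++)
        e[i] = ((a[i] != 0) & (inv != 0)) ? a[i] : 0;
}
```

Only `a[i] != 0` varies per lane; `inv != 0` is the invariant operand that
the BB vectorization is described as not seeing.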
