On 09/12/2019 15:59, Richard Sandiford wrote:
> No, the assumption's correct even there. The assert usually triggers
> because something elsewhere is getting confused about the vector types.
>
>> The attached patch fixes the ICE in the testcase, but I suspect it does
>> not go far enough. Can it happen that NUNITS is greater than the
>> vectorization factor but not a multiple of it? Is this even a valid fix
>> in the first place? Must it be conditionalized on masking being available?
>> Is the exactness even worth checking, in the presence of exceptions?
>
> The vector types and VF aren't chosen based on whether masking is available.
> It happens the other way around: we first analyse the loop and pick the VF
> for an unmasked loop, but record as we go whether a masked implementation
> is also possible. Then we decide at the end whether to use a masked
> implementation instead of an unmasked one.
>
> So if this assert triggers for masked loops, it could trigger for unmasked
> loops too.
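To make that ordering concrete, here is a minimal C sketch of it. Everything in it is hypothetical: `struct stmt_model`, `struct loop_decision`, `analyse_loop` and `prefer_masking` are not GCC names, and computing the VF as the least common multiple of the per-statement lane counts is an assumed simplification. What it shows is that the VF is fixed before masking is considered at all, so any check on the VF behaves the same for masked and unmasked loops.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified statement record; not a GCC data structure.  */
struct stmt_model
{
  unsigned nunits;          /* lanes in the statement's vector type */
  bool supports_masking;    /* could this statement be fully masked? */
};

/* Hypothetical summary of the analysis.  */
struct loop_decision
{
  unsigned vf;              /* chosen as if the loop were unmasked */
  bool masked_possible;     /* recorded while analysing */
  bool use_masking;         /* decided only at the very end */
};

static unsigned
gcd_u (unsigned a, unsigned b)
{
  while (b != 0)
    {
      unsigned t = a % b;
      a = b;
      b = t;
    }
  return a;
}

static struct loop_decision
analyse_loop (const struct stmt_model *stmts, size_t n, bool prefer_masking)
{
  struct loop_decision d = { 1, true, false };
  for (size_t i = 0; i < n; i++)
    {
      assert (stmts[i].nunits != 0);
      /* Phase 1: the VF depends only on the vector types (here: the
         least common multiple of their lane counts), never on masking.  */
      d.vf = d.vf / gcd_u (d.vf, stmts[i].nunits) * stmts[i].nunits;
      /* Note masking support, but do not act on it yet.  */
      d.masked_possible = d.masked_possible && stmts[i].supports_masking;
    }
  /* Phase 2: choose masked vs. unmasked only after the VF is fixed.  */
  d.use_masking = d.masked_possible && prefer_masking;
  return d;
}
```

Two loops that differ only in whether every statement can be masked end up with the same VF; only the final `use_masking` flag differs.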
OK, I completely misunderstood what was happening here.
What actually happens is that it goes through and finds a vector type for
every statement, then says "Updating vectorization factor to 4.", but never
goes back to check that the types it already picked are still suitable for
that factor.
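The re-check that seems to be missing after the VF update can be sketched in plain C, assuming constant (non-poly) sizes. `struct access_model` and `vectypes_fit_vf` are hypothetical names for illustration, not GCC functions. The test only asks whether, for each access, VF * group size is still a multiple of the lane count of the vector type chosen earlier. A group size of 1 stands for accesses where the group size should not scale the count, as for non-SLP statements in the excerpt that follows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-access record, for illustration only.  */
struct access_model
{
  unsigned group_size;      /* interleaving group size (1 if ungrouped) */
  unsigned nunits;          /* lanes in the vector type picked earlier */
};

/* After the VF has been updated, check that every previously chosen
   vector type still evenly divides the number of scalars handled per
   vector iteration (VF * GROUP_SIZE).  Returns false on the first
   mismatch.  */
static bool
vectypes_fit_vf (const struct access_model *acc, size_t n, unsigned vf)
{
  for (size_t i = 0; i < n; i++)
    {
      assert (acc[i].nunits != 0);
      unsigned nscalars = vf * acc[i].group_size;
      if (nscalars % acc[i].nunits != 0)
        return false;
    }
  return true;
}
```

With the values from this testcase (VF 4, group size 2, V64SI) the check fails, since 8 is not a multiple of 64; a V8SI type, or a VF of 32, would pass.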
It then gets to this:
    if (unlimited_cost_model (LOOP_VINFO_LOOP (loop_vinfo)))
      {
        poly_uint64 nscalars = (STMT_SLP_TYPE (stmt_info)
                                ? vf * DR_GROUP_SIZE (stmt_info) : vf);
        possible_npeel_number
          = vect_get_num_vectors (nscalars, vectype);
        /* NPEEL_TMP is 0 when there is no misalignment, but also
           allow peeling NELEMENTS.  */
        if (DR_MISALIGNMENT (dr_info) == 0)
          possible_npeel_number++;
      }
where "vf" is now 4, the group size appears to be 2 and "vectype" is
V64SI, so nscalars is 8 and vect_get_num_vectors blows up trying to divide
8 by 64 exactly.
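The failing arithmetic can be reproduced in a few lines of plain C. This is a simplified model assuming constant sizes, not GCC code: `peel_nscalars` and `num_vectors` are hypothetical stand-ins for the nscalars expression above and for vect_get_num_vectors, which (as far as I can tell) divides with exact_div and therefore asserts when the division leaves a remainder.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the nscalars expression in the excerpt: the
   group size only scales the count for SLP statements.  */
static unsigned
peel_nscalars (unsigned vf, unsigned group_size, bool is_slp)
{
  return is_slp ? vf * group_size : vf;
}

/* Hypothetical stand-in for vect_get_num_vectors with constant sizes:
   how many vectors of NUNITS lanes hold NSCALARS elements.  The
   division is required to be exact, so a remainder indicates a bug in
   the caller rather than something to round away.  */
static unsigned
num_vectors (unsigned nscalars, unsigned nunits)
{
  assert (nunits != 0 && nscalars % nunits == 0);
  return nscalars / nunits;
}
```

Calling `num_vectors (peel_nscalars (4, 2, true), 64)` trips the assertion because 8 % 64 == 8; a VF of 32 would give 64 scalars and exactly one V64SI vector.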
If I switch back to the default cost model then the "vect" pass
completes successfully, although vectorization fails due to a missing
vector operator. The following "slp2" pass then switches to TImode fake
vectors and works fine.
Alternatively, if I add back my patch then the pass completes the same
way (without vectorization).
Any suggestions? I can't see how this is supposed to work.
Andrew