On Tue, 5 Dec 2023, Robin Dapp wrote:
> > But how do we know BI<N>mode fits in QImode?
>
> I was kind of hoping that a "bit" always fits in a "byte"/unit
> but yeah, I guess we don't always know :/

But the "bit" is of constant size, so we could choose a fitting mode?

> > I think the issue is more that we try to extract an element from
> > the mask vector? How is element extraction defined for VLA vectors
> > anyway? How can we be sure to not access out-of-bounds?
>
> The mask extraction I also found odd the last time we hit this. But
> on aarch64 the same pattern is generated (although not via the
> vec_extract path) therefore I assumed that it's not fundamentally
> the wrong way.
>
> For the case here we only extract the last element of the vector
> (nunits - 1) so out of bounds is not an issue. Regarding
> out of bounds in general I was hoping we only extract when we know
> that this is ok (e.g. the first or the last element).
>
> So supposing a mask extraction is generally ok, my main issue is
> that expmed tries a BImode extract and I'm not sure this can ever
> work? Can we even move into a BImode apart from comparison results?

Well, the question is what can the hardware do?

> I can circumvent the BImode target by going the vectorizer route and
> adding:
>
>   /* Wrong check obviously. */
>   else if (can_vec_extract_var_idx_p (TYPE_MODE (vectype),
>                                       TYPE_MODE (TREE_TYPE (vectype))))
>     {
>       tree n1 = bitsize_int (nunits - 1);
>       tree scalar_res
>         = gimple_build (&stmts, CFN_VEC_EXTRACT, TREE_TYPE (vectype),
>                         vec_lhs_phi, n1);
>
>       /* Convert the extracted vector element to the scalar type. */
>       new_tree = gimple_convert (&stmts, lhs_type, scalar_res);
>     }
>
> to vectorizable_live_operation.
>
> (similar to the length and mask way). As long as we can handle
> a poly_int in the extract that works as well and extracts a QImode.

But why does RTL expansion not use vec_extract? Because of that
BImode oddity?

So yes, I guess we need to answer BImode vs. QImode. I hope Richard
has a better idea here?

Thanks,
Richard.