https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101668
--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Hongtao.liu from comment #4)
> Guess we need to extend backend hook to handle different input and output
> modes.
Yes. Alternatively, as said, some special cases could be handled directly.
For example, v16si -> v8si could be handled by VEC_PERM <lowpart, highpart,
{..}> without any extra magic (but IIRC we don't have a way to query target
support for the specific BIT_FIELD_REFs we'd use to get at the lowpart or
highpart, and if those aren't available they would fall back to memory).
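To illustrate the v16si -> v8si case, here is a minimal scalar sketch in plain C, not vectorizer code. The names vec_perm and permute_v16si_to_v8si are hypothetical, the two pointer offsets merely stand in for the lowpart/highpart BIT_FIELD_REFs, and the selector is assumed to be in range rather than wrapped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { WIDE_LANES = 16, NARROW_LANES = 8 };

/* Hypothetical model of a two-input permute: selector value s picks
   a[s] when s < n and b[s - n] otherwise.  Out-of-range selectors are
   rejected here instead of being wrapped.  */
static void
vec_perm (const int32_t *a, const int32_t *b, const unsigned *sel,
          int32_t *out, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      assert (sel[i] < 2 * n);
      out[i] = sel[i] < n ? a[sel[i]] : b[sel[i] - n];
    }
}

/* v16si -> v8si: view the wide input as two v8si halves and feed both
   to a single two-input permute.  */
static void
permute_v16si_to_v8si (const int32_t in[WIDE_LANES],
                       const unsigned sel[NARROW_LANES],
                       int32_t out[NARROW_LANES])
{
  const int32_t *lowpart = in;                  /* lanes 0..7 */
  const int32_t *highpart = in + NARROW_LANES;  /* lanes 8..15 */

  vec_perm (lowpart, highpart, sel, out, NARROW_LANES);
}
```

Because the two halves concatenate back to the original vector, the selector can simply use the lane numbers of the wide input; for instance {0, 2, 4, ..., 14} picks the even lanes.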
And contiguous permutes could be directly emitted as BIT_FIELD_REFs
(in some cases).
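A rough sketch of what "contiguous" means here, again in plain C with a hypothetical helper name (contiguous_permute_p is not an existing GCC function). It only checks that the selector is a run of consecutive lanes; which of those cases can actually become a BIT_FIELD_REF is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true when sel[0..n) is start, start + 1, ..., start + n - 1,
   i.e. the permute just extracts n consecutive lanes.  On success the
   first lane is stored in *start.  */
static bool
contiguous_permute_p (const unsigned *sel, size_t n, unsigned *start)
{
  assert (sel != NULL && start != NULL);

  if (n == 0)
    return false;

  for (size_t i = 1; i < n; i++)
    if (sel[i] != sel[0] + i)
      return false;

  *start = sel[0];
  return true;
}
```

With 32-bit lanes, a successful result with start == 4 and n == 4 would correspond to extracting 128 bits at bit offset 128 of the input.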
I have a half-way patch that does the preparatory work but leaves
vectorizable_slp_permutation unchanged so we immediately fail there
due to
  FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child)
    {
      if (!vect_maybe_update_slp_op_vectype (child, vectype)
          || !types_compatible_p (SLP_TREE_VECTYPE (child), vectype))
        {
          if (dump_enabled_p ())
            dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                             "Unsupported lane permutation\n");
          return false;
the comment above that says
  /* ??? We currently only support all same vector input and output types
     while the SLP IL should really do a concat + select and thus accept
     arbitrary mismatches.  */
so it was designed to handle more; it just wasn't necessary to implement it ...
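For reference, a scalar sketch of the "concat + select" reading of that comment, where children may have different lane counts. This is an assumption-laden model in plain C: struct child_vec, struct lane_ref, MAX_CONCAT_LANES and concat_and_select are all hypothetical names, and the (child, lane) pairs are only loosely modelled on how an SLP lane permutation names its inputs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins: one input vector and one (child, lane) pick.  */
struct child_vec { const int32_t *lanes; size_t nlanes; };
struct lane_ref { size_t child; size_t lane; };

enum { MAX_CONCAT_LANES = 64 };

/* Concatenate all children, whatever their lane counts, then select
   the nout output lanes named by perm.  */
static void
concat_and_select (const struct child_vec *children, size_t nchildren,
                   const struct lane_ref *perm, int32_t *out, size_t nout)
{
  int32_t concat[MAX_CONCAT_LANES];
  size_t offset[MAX_CONCAT_LANES];
  size_t total = 0;

  assert (nchildren <= MAX_CONCAT_LANES);

  /* Concat: remember where each child starts in the flat buffer.  */
  for (size_t c = 0; c < nchildren; c++)
    {
      assert (total + children[c].nlanes <= MAX_CONCAT_LANES);
      offset[c] = total;
      for (size_t l = 0; l < children[c].nlanes; l++)
        concat[total++] = children[c].lanes[l];
    }

  /* Select: each output lane reads one lane of the concatenation.  */
  for (size_t i = 0; i < nout; i++)
    {
      assert (perm[i].child < nchildren);
      assert (perm[i].lane < children[perm[i].child].nlanes);
      out[i] = concat[offset[perm[i].child] + perm[i].lane];
    }
}
```

In this model nothing requires the children or the output to share a vector type, which is the "arbitrary mismatches" part; the check quoted above rejects exactly those inputs today.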