https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103905
--- Comment #6 from Uroš Bizjak <ubizjak at gmail dot com> ---
@Jakub: It looks like the problem is in expand_vec_perm_pshufb, where the
permutation vector is recalculated for partial vectors:
  if (vmode == V4QImode
      || vmode == V8QImode)
    {
      rtx m128 = GEN_INT (-128);

      /* Remap elements from the second operand, as we have to
         account for inactive top elements from the first operand.  */
      if (!d->one_operand_p)
        {
          int sz = GET_MODE_SIZE (vmode);

          for (i = 0; i < nelt; ++i)
            {
              int ival = INTVAL (rperm[i]);
              if (ival >= sz)
                ival += 16-sz;
              rperm[i] = GEN_INT (ival);
            }
        }

      /* V4QI/V8QI is emulated with V16QI instruction, fill inactive
         elements in the top positions with zeros.  */
      for (i = nelt; i < 16; ++i)
        rperm[i] = m128;

      vpmode = V16QImode;
    }
I must admit I only eyeballed the generated code, so perhaps there lies the
dragon.
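
To make the remapping easier to check by hand, here is a minimal standalone
sketch of the same index arithmetic on plain ints instead of RTL. The helper
name remap_partial_perm and its parameters are hypothetical and not part of
GCC; it only mirrors the two loops quoted above and makes no claim about
where the actual bug is.

```c
#include <assert.h>

/* Hypothetical helper, not GCC code.  Builds the 16-entry selector for a
   partial vector of SZ bytes (4 or 8) from an NELT-entry permutation,
   mirroring the two loops in the quoted snippet.  */
static void
remap_partial_perm (const int *perm, int nelt, int sz,
                    int one_operand_p, int out[16])
{
  int i;

  assert (sz == 4 || sz == 8);
  assert (nelt <= 16);

  for (i = 0; i < nelt; ++i)
    {
      int ival = perm[i];

      /* Indices that refer to the second operand are moved past the
         inactive top bytes of the first operand.  */
      if (!one_operand_p && ival >= sz)
        ival += 16 - sz;
      out[i] = ival;
    }

  /* The remaining positions get -128, as in the quoted code, so that
     the inactive top elements are filled with zeros.  */
  for (i = nelt; i < 16; ++i)
    out[i] = -128;
}
```

For example, a two-operand V4QI permutation {0, 4, 1, 5} (sz = 4) becomes
{0, 16, 1, 17} followed by twelve entries of -128, because indices 4 and 5
are shifted by 16 - 4 = 12.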