https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115304

--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
For GCN the issue is that with vector(64) unsigned short we fail the permute
(but we have { target vect64 } for this reason), but we then re-try with
the same mode but with SLP disabled and that succeeds.

The best strategy for GCN would be to gather V4QImode aka SImode into the
V64QImode (or V16SImode) vector.  For pix2 we have a gap of 28 elements, so
doing consecutive loads isn't a good strategy here.
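
For readers without the testcase at hand, a loop of roughly the following
shape shows the access pattern being discussed.  This is only a sketch, not
the actual test: the function name, the 4x4 shape, the element types and the
pix1 stride of 16 are assumptions for illustration.  Only the pix2 stride of
32 (4 used elements plus the gap of 28) follows from the numbers above.

```c
#include <stdint.h>

/* Hypothetical kernel shape; all names and constants here are assumptions
   except PIX2_STRIDE, which is 4 used elements + the gap of 28.  */
enum { ROWS = 4, USED = 4, PIX1_STRIDE = 16, PIX2_STRIDE = 32 };

void
pixel_sub_4x4 (int16_t *restrict diff, const uint8_t *pix1,
               const uint8_t *pix2)
{
  for (int y = 0; y < ROWS; y++)
    {
      /* Only USED consecutive bytes of each row are read; the rest of
         the row is the gap the vectorizer has to deal with.  */
      for (int x = 0; x < USED; x++)
        diff[x + y * USED] = (int16_t) (pix1[x] - pix2[x]);
      pix1 += PIX1_STRIDE;
      pix2 += PIX2_STRIDE;
    }
}
```

With a 64-lane byte vector only 4 out of every 32 pix2 bytes are useful,
which is why gathering SImode chunks looks more attractive than consecutive
loads on GCN.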

On x86 we can use a small vector and use half of it (gathers would be slow).

On sparc we start with V8QImode, which is great, but then sparc doesn't seem
able to build a V8QImode vector from two V4QImode vectors, or to use
V2SImode and build from two SImode values (loading SImode from pix1/pix2;
possibly due to alignment).  I do see a vec_initv2sisi though.  Ah,
so we verify we can do the load using a permutation: permute two V8QImode
vectors 'a' and 'b' to get a { a_low, b_low } V8QImode vector.  The other
part is the eliding of the gap, which ends up loading half of the vector
and padding it out as { a_low, 0 }, but then still invokes this unsupported
permutation to get { a_low, b_low }.  So in this case requiring vect_perm
would fix this.  There is sparc_vectorize_vec_perm_const and vec_perm<>
guarded with VIS2, though; with -mvis2 we get past this failure point and run into

missed:   not vectorized: relevant stmt not supported: _35 = (unsigned short) _34;

So there's no vec_upack_{hi,lo}_v4hi.  vect_unpack guards this.
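
To make the two operations sparc is missing concrete, here is a scalar model
in plain C.  It is a sketch only: the struct types and function names are
made up for illustration and are not GCC internals, and the model ignores
the endianness-dependent meaning of "hi" and "lo".  It shows the zero-padded
half loads { a_low, 0 } and { b_low, 0 }, the two-input permute that yields
{ a_low, b_low }, and the widening unpack from V8QImode to two V4HImode
halves that the (unsigned short) conversion needs.

```c
#include <stdint.h>
#include <string.h>

/* Scalar stand-ins for V8QImode and V4HImode (hypothetical types).  */
typedef struct { uint8_t e[8]; } v8qi;
typedef struct { uint16_t e[4]; } v4hi;

/* Load only the low 4 bytes and zero-pad the rest: { x_low, 0 }.  */
v8qi
load_low_zero_pad (const uint8_t *p)
{
  v8qi v = { { 0 } };
  memcpy (v.e, p, 4);
  return v;
}

/* Two-input permute: selector values 0..7 pick from a, 8..15 from b.  */
v8qi
vec_perm_model (v8qi a, v8qi b, const uint8_t sel[8])
{
  v8qi r;
  for (int i = 0; i < 8; i++)
    r.e[i] = sel[i] < 8 ? a.e[sel[i]] : b.e[sel[i] - 8];
  return r;
}

/* Build { a_low, b_low } from two zero-padded half loads.  */
v8qi
concat_low_halves (const uint8_t *pa, const uint8_t *pb)
{
  static const uint8_t sel[8] = { 0, 1, 2, 3, 8, 9, 10, 11 };
  return vec_perm_model (load_low_zero_pad (pa), load_low_zero_pad (pb), sel);
}

/* Widen elements 0..3 of a byte vector to unsigned short.  */
v4hi
unpack_lo (v8qi v)
{
  v4hi r;
  for (int i = 0; i < 4; i++)
    r.e[i] = v.e[i];
  return r;
}

/* Widen elements 4..7 of a byte vector to unsigned short.  */
v4hi
unpack_hi (v8qi v)
{
  v4hi r;
  for (int i = 0; i < 4; i++)
    r.e[i] = v.e[i + 4];
  return r;
}
```

A target needs the permute to be available for the first step (what
vect_perm checks) and the widening for the second (what vect_unpack checks).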

Maybe I should move the test to be x86 specific.

I'll add the two dg-effective targets to fix the Solaris fallout for now.
