https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115640

--- Comment #13 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #12)
> (In reply to Andrew Stubbs from comment #10)
> > GFX10 has more limited permutation capabilities than GFX9 because it 
> > only has 32-lane vectors natively, even though we're using the 64-lane 
> > "compatibility" mode.
> > 
> > However, in theory, the permutation capabilities on V32 and below should 
> > be the same, and some permutations on V64 are allowed, so I don't know 
> > why it doesn't use it. It's possible I broke the logic in 
> > gcn_vectorize_vec_perm_const:
> > 
> >    /* RDNA devices can only do permutations within each group of 32-lanes.
> >       Reject permutations that cross the boundary.  */
> >    if (TARGET_RDNA2_PLUS)
> >      for (unsigned int i = 0; i < nelt; i++)
> >        if (i < 31 ? perm[i] > 31 : perm[i] < 32)
> >          return false;
> > 
> > It looks right to me though?
> 
> nelt == 32 so I think the last element has the wrong check applied?
> 
> It should be
> 
> >        if (i < 32 ? perm[i] > 31 : perm[i] < 32)
> 
> I think.  With that the vectorization happens in a similar way but the
> failure still doesn't reproduce (without the patch, of course).

Btw, the above looks quite odd for nelt == 32 anyway.  We are permuting
two vectors, src0 and src1, into one 32-element dst vector (it is no
longer required that src0 and src1 match the dst vector size, btw; they
might have a different nelt).  So the loop would reject interleaving
the low parts of two 32-element vectors, a permute that would look like
{ 0, 32, 1, 33, 2, 34, ... }.  Does "within each group of 32-lanes"
mean you can never mix the two vector inputs?  Or does GCN not have
a two-to-one vector permute instruction?
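
To make the point concrete, here is a standalone model of the quoted
check.  It is only a sketch, not the code in gcn_vectorize_vec_perm_const:
the names perm_within_halves, make_identity and make_interleave_lo are
made up for illustration, and the boundary is a parameter so that the
"i < 31" form as quoted and the proposed "i < 32" form can be compared.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the quoted loop.  BOUNDARY is 31 for the check
   as quoted and 32 for the proposed fix.  Returns true if the
   permutation would be accepted, false if it would be rejected.  */
bool
perm_within_halves (const unsigned char *perm, unsigned int nelt,
                    unsigned int boundary)
{
  for (unsigned int i = 0; i < nelt; i++)
    if (i < boundary ? perm[i] > 31 : perm[i] < 32)
      return false;
  return true;
}

/* Identity selector { 0, 1, 2, ... }: every lane stays where it is.  */
void
make_identity (unsigned char *perm, unsigned int nelt)
{
  for (unsigned int i = 0; i < nelt; i++)
    perm[i] = i;
}

/* Low-part interleave of two NELT-element inputs:
   { 0, NELT, 1, NELT + 1, ... }.  Indices >= NELT select from src1.  */
void
make_interleave_lo (unsigned char *perm, unsigned int nelt)
{
  for (unsigned int i = 0; i < nelt; i++)
    perm[i] = (i % 2 ? nelt : 0) + i / 2;
}

/* Expected behaviour of the model for nelt == 32.  */
void
self_check (void)
{
  unsigned char perm[32];

  make_identity (perm, 32);
  assert (!perm_within_halves (perm, 32, 31)); /* i == 31 is rejected */
  assert (perm_within_halves (perm, 32, 32));

  make_interleave_lo (perm, 32);
  assert (!perm_within_halves (perm, 32, 31)); /* perm[1] == 32 */
  assert (!perm_within_halves (perm, 32, 32)); /* still rejected */
}
```

With nelt == 32 the "i < 31" form rejects even the identity selector,
because lane 31 takes the "perm[i] < 32" branch, while both forms reject
the interleave as soon as they see perm[1] == 32, an index into src1.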
