[Bug target/101579] Suboptimal codegen for __builtin_shufflevector

crazylht at gmail dot com via Gcc-bugs Wed, 28 Jul 2021 03:03:12 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101579


--- Comment #7 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Jakub Jelinek from comment #6)
> (In reply to Hongtao.liu from comment #4)
> > I looks to me that middle end should be able to transform 64-byte vector
> > shuffle to 32-byte vector shuffle when data flow analysis shows the upper
> > part of the vector is never used.
> 
> That is not needed in this case.  The permutation is such that all indices
> for the first half read from one half (in this case the first) and all
> indices for the second half read from one half (in this case the first
> again), so it is
> identical to a vector containing permutation of the first half with the
> first half of the indices and permutation of the first half again with the
> second half of the indices.

  U u = ((union { V a; U b; }) w).b + ((union { V a; U b; }) w).b[1];
  return u;

I mean the result only uses the first half, so we can just create a temporary
v1 with
v1 = __builtin_shufflevector (g, g,
                                 0, 1, 2, 0, 5, 1, 0, 1, 3, 2, 3, 0, 4, 3, 1, 2,
                                 2, 0, 4, 2, 3, 1, 1, 2, 3, 4, 1, 1, 0, 0, 5, 2)

I mean the result u only cares about the first half, so we can drop the second
half; it is redundant.
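
To illustrate the point, here is a minimal sketch in plain C that models the
two-input shuffle with scalar loops instead of calling the builtin, so it does
not depend on compiler support. The helper names (shuffle2,
narrowed_shuffle_matches), the unsigned char element type, and the upper_idx
parameter are assumptions for illustration and are not taken from the testcase;
only the 32 lower indices come from the shuffle above.

```c
#include <stddef.h>
#include <string.h>

enum { FULL = 64, HALF = 32 };

/* Scalar model of a two-input shuffle (hypothetical helper): an index
   below n_in selects a[index], any other index selects b[index - n_in].  */
static void shuffle2(const unsigned char *a, const unsigned char *b,
                     size_t n_in, const int *idx, size_t n_out,
                     unsigned char *out)
{
    for (size_t i = 0; i < n_out; i++)
        out[i] = (size_t)idx[i] < n_in ? a[idx[i]] : b[(size_t)idx[i] - n_in];
}

/* The 32 indices quoted above; all of them are below HALF.  */
static const int lower_idx[HALF] = {
    0, 1, 2, 0, 5, 1, 0, 1, 3, 2, 3, 0, 4, 3, 1, 2,
    2, 0, 4, 2, 3, 1, 1, 2, 3, 4, 1, 1, 0, 0, 5, 2
};

/* Returns 1 when the lower half of the 64-element shuffle equals the
   32-element shuffle applied to the lower half of g.  upper_idx stands in
   for the upper 32 indices of the wide shuffle (assumed to be in 0..127).  */
int narrowed_shuffle_matches(const unsigned char g[FULL],
                             const int upper_idx[HALF])
{
    int idx[FULL];
    unsigned char wide[FULL], narrow[HALF];

    for (int i = 0; i < HALF; i++) {
        idx[i] = lower_idx[i];
        idx[HALF + i] = upper_idx[i];
    }

    shuffle2(g, g, FULL, idx, FULL, wide);          /* 64-byte shuffle */
    shuffle2(g, g, HALF, lower_idx, HALF, narrow);  /* 32-byte shuffle */

    return memcmp(wide, narrow, HALF) == 0;
}
```

Whatever the upper indices are, the lower 32 result elements agree, because
every lower index is below 32 and so only reads the first half of g.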
