https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101846

--- Comment #10 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
For bar, the problem is that while vpmovdw is AVX512F, we actually recognize it
only at combine time as vpermw (with selected exact permutation) combined with
low part extraction.  And vpermw is only AVX512BW.
In order to optimize it, we'd need to implement what LLVM actually has support
for, namely the "I don't care" possibilities for the permutations.
So, instead of what we emit right now in GIMPLE:
  _1 = VEC_PERM_EXPR <x_2(D), x_2(D), { 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20,
22, 24, 26, 28, 30, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30,
31 }>;
  _3 = BIT_FIELD_REF <_1, 256, 0>;
we'd need to emit
  _1 = VEC_PERM_EXPR <x_2(D), x_2(D), { 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20,
22, 24, 26, 28, 30, ANY, ANY, ANY, ANY, ANY, ANY, ANY, ANY, ANY, ANY, ANY, ANY,
ANY, ANY, ANY, ANY }>;
(we'd need a special VEC_PERM_EXPR variant for that which would only accept
VECTOR_CSTs and reserve all ones for the "ANY" case in there).
And, the hard part: adjust the target const vec perm code to handle those
efficiently, treating each as a wildcard for any other element of the vector or
constant 0.  One part is the code which verifies the d->perm[?] values; it
would treat the wildcards as matching anything, but for a successful match we'd
actually need to compute which value is best based on the non-wildcard values
in the permutation.
Another part is the many cases where we construct RTL and try to recog it.
There we'd need some new RTL, standing for CONST_INT_WILDCARD, that would
compare equal to any int, plus some way for a matched pattern to tell us back
which number it wants to use.
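
To make the matching idea concrete, here is a minimal standalone sketch in
plain C.  It is not GCC code: PERM_ANY and perm_match_wildcards are
hypothetical names, and the wildcard is modelled as -1 rather than the
all-ones VECTOR_CST value suggested above.  It only illustrates "wildcard
compares equal to anything, and a successful match reports back the concrete
values":

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical marker for a "don't care" selector element.  */
#define PERM_ANY (-1)

/* Return true if the requested permutation WANT (which may contain
   PERM_ANY) is compatible with the fully specified permutation HAVE
   that some instruction pattern implements.  On success, store the
   resolved selector in RESOLVED: each wildcard is replaced by the
   value the pattern actually uses.  RESOLVED is untouched on failure.  */
static bool
perm_match_wildcards (const int *want, const int *have,
                      int *resolved, size_t nelt)
{
  assert (want != NULL && have != NULL && resolved != NULL);

  /* Every non-wildcard element must agree exactly.  */
  for (size_t i = 0; i < nelt; i++)
    if (want[i] != PERM_ANY && want[i] != have[i])
      return false;

  /* The pattern "tells us back" which numbers it wants to use.  */
  for (size_t i = 0; i < nelt; i++)
    resolved[i] = have[i];
  return true;
}
```

A real implementation would also have to let a wildcard stand for "constant
0", which this sketch does not model.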

With that support, we could recognize the { 0, 2, 4, 6, 8, 10, 12, 14, 16, 18,
20, 22, 24, 26, 28, 30, ANY, ANY, ANY, ANY, ANY, ANY, ANY, ANY, ANY, ANY, ANY,
ANY, ANY, ANY, ANY, ANY } V32HI permutation as matching the vpmovdw instruction
which puts 0s in the upper half of the vector.
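
As a sanity check on why that permutation corresponds to a dword-to-word
truncation, here is a small scalar model in plain C.  The function names are
made up for illustration, and the 512-bit vector is modelled as arrays with
word 2*i holding the low half of dword i (the little-endian layout):

```c
#include <assert.h>
#include <stdint.h>

/* The requested V32HI permutation with the wildcards resolved to zero:
   result element i is input element 2*i for the low 16 elements, and
   the upper 16 elements are 0.  */
static void
model_even_perm_zero_upper (const uint16_t in[32], uint16_t out[32])
{
  for (int i = 0; i < 16; i++)
    out[i] = in[2 * i];
  for (int i = 16; i < 32; i++)
    out[i] = 0;
}

/* Truncate each of 16 dwords to a word, zeroing the upper half of the
   result, which is the behaviour described above for vpmovdw.  */
static void
model_trunc_dw (const uint32_t in[16], uint16_t out[32])
{
  for (int i = 0; i < 16; i++)
    out[i] = (uint16_t) in[i];
  for (int i = 16; i < 32; i++)
    out[i] = 0;
}

/* Check that both models agree on one sample input.  */
static void
self_check (void)
{
  uint16_t w[32], a[32], b[32];
  uint32_t d[16];

  for (int i = 0; i < 32; i++)
    w[i] = (uint16_t) (1000 + i);
  /* Build the dword view of the same data: word 2*i is the low half.  */
  for (int i = 0; i < 16; i++)
    d[i] = (uint32_t) w[2 * i] | ((uint32_t) w[2 * i + 1] << 16);

  model_even_perm_zero_upper (w, a);
  model_trunc_dw (d, b);
  for (int i = 0; i < 32; i++)
    assert (a[i] == b[i]);
}
```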

The foo case is doable even without this, I think; the question is whether we
should try to split an arbitrary permutation of 64-byte vectors into
permutations of the two halves which are then merged together, if the
permutation allows that (the first half of the elements comes from the first
halves of the inputs and the second half of the elements comes from the second
halves of the inputs).
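
The condition in parentheses can be written down as a simple check.  This is
only a sketch in plain C with a made-up function name, not GCC's vec_perm
machinery; it assumes the usual two-input selector convention where indices
0..nelt-1 name the first input and nelt..2*nelt-1 the second:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if the NELT-element two-input permutation PERM can be
   done as two independent half-width permutations: every element in
   the low half of the result comes from the low half of either input,
   and every element in the high half of the result comes from the high
   half of either input.  */
static bool
perm_splits_into_halves_p (const unsigned *perm, size_t nelt)
{
  assert (nelt % 2 == 0);
  size_t half = nelt / 2;

  for (size_t i = 0; i < nelt; i++)
    {
      assert (perm[i] < 2 * nelt);
      /* Position of the selected element within its input vector.  */
      bool src_low = (perm[i] % nelt) < half;
      bool dst_low = i < half;
      if (src_low != dst_low)
        return false;
    }
  return true;
}
```

For example, with nelt == 4 the selector { 0, 5, 2, 7 } passes this check,
while { 2, 0, 1, 3 } does not, because result element 0 comes from the high
half of the first input.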
