Richard Biener <richard.guent...@gmail.com> writes:
> On Mon, Nov 20, 2017 at 12:56 AM, Jeff Law <l...@redhat.com> wrote:
>> On 11/09/2017 06:24 AM, Richard Sandiford wrote:
>>> ...so that we can use them for variable-length vectors.  For now
>>> constant-length vectors continue to use VEC_PERM_EXPR and the
>>> vec_perm_const optab even for cases that the new optabs could
>>> handle.
>>>
>>> The vector optabs are inconsistent about whether there should be
>>> an underscore before the mode part of the name, but the other lo/hi
>>> optabs have one.
>>>
>>> Doing this means that we're able to optimise some SLP tests using
>>> non-SLP (for now) on targets with variable-length vectors, so the
>>> patch needs to add a few XFAILs.  Most of these go away with later
>>> patches.
>>>
>>> Tested on aarch64-linux-gnu (with and without SVE), x86_64-linux-gnu
>>> and powerpc64le-linux-gnu.  OK to install?
>>>
>>> Richard
>>>
>>>
>>> 2017-11-09  Richard Sandiford  <richard.sandif...@linaro.org>
>>>           Alan Hayward  <alan.hayw...@arm.com>
>>>           David Sherwood  <david.sherw...@arm.com>
>>>
>>> gcc/
>>>       * doc/md.texi (vec_reverse, vec_interleave_lo, vec_interleave_hi)
>>>       (vec_extract_even, vec_extract_odd): Document new optabs.
>>>       * internal-fn.def (VEC_INTERLEAVE_LO, VEC_INTERLEAVE_HI)
>>>       (VEC_EXTRACT_EVEN, VEC_EXTRACT_ODD, VEC_REVERSE): New internal
>>>       functions.
>>>       * optabs.def (vec_interleave_lo_optab, vec_interleave_hi_optab)
>>>       (vec_extract_even_optab, vec_extract_odd_optab, vec_reverse_optab):
>>>       New optabs.
>>>       * tree-vect-data-refs.c: Include internal-fn.h.
>>>       (vect_grouped_store_supported): Try using IFN_VEC_INTERLEAVE_{LO,HI}.
>>>       (vect_permute_store_chain): Use them here too.
>>>       (vect_grouped_load_supported): Try using IFN_VEC_EXTRACT_{EVEN,ODD}.
>>>       (vect_permute_load_chain): Use them here too.
>>>       * tree-vect-stmts.c (can_reverse_vector_p): New function.
>>>       (get_negative_load_store_type): Use it.
>>>       (reverse_vector): New function.
>>>       (vectorizable_store, vectorizable_load): Use it.
>>>       * config/aarch64/iterators.md (perm_optab): New iterator.
>>>       * config/aarch64/aarch64-sve.md (<perm_optab>_<mode>): New expander.
>>>       (vec_reverse_<mode>): Likewise.
>>>
>>> gcc/testsuite/
>>>       * gcc.dg/vect/no-vfa-vect-depend-2.c: Remove XFAIL.
>>>       * gcc.dg/vect/no-vfa-vect-depend-3.c: Likewise.
>>>       * gcc.dg/vect/pr33953.c: XFAIL for vect_variable_length.
>>>       * gcc.dg/vect/pr68445.c: Likewise.
>>>       * gcc.dg/vect/slp-12a.c: Likewise.
>>>       * gcc.dg/vect/slp-13-big-array.c: Likewise.
>>>       * gcc.dg/vect/slp-13.c: Likewise.
>>>       * gcc.dg/vect/slp-14.c: Likewise.
>>>       * gcc.dg/vect/slp-15.c: Likewise.
>>>       * gcc.dg/vect/slp-42.c: Likewise.
>>>       * gcc.dg/vect/slp-multitypes-2.c: Likewise.
>>>       * gcc.dg/vect/slp-multitypes-4.c: Likewise.
>>>       * gcc.dg/vect/slp-multitypes-5.c: Likewise.
>>>       * gcc.dg/vect/slp-reduc-4.c: Likewise.
>>>       * gcc.dg/vect/slp-reduc-7.c: Likewise.
>>>       * gcc.target/aarch64/sve_vec_perm_2.c: New test.
>>>       * gcc.target/aarch64/sve_vec_perm_2_run.c: Likewise.
>>>       * gcc.target/aarch64/sve_vec_perm_3.c: New test.
>>>       * gcc.target/aarch64/sve_vec_perm_3_run.c: Likewise.
>>>       * gcc.target/aarch64/sve_vec_perm_4.c: New test.
>>>       * gcc.target/aarch64/sve_vec_perm_4_run.c: Likewise.
>> OK.
>
> It's really a step backwards - we had those optabs and a tree code in
> the past and
> canonicalizing things to VEC_PERM_EXPR made things simpler.
>
> Why doesn't VEC_PERM <v1, v2, that-constant-series-expr-thing> work?

The problems with that are:

- It doesn't work for vectors with 256 or more elements, because the
  selector elements are the same width as the data elements, so the
  indices wrap round.

- Supporting a fake VEC_PERM_EXPR <v256qi, v256qi, v256hi> for a few
  special cases would be hard, especially since v256hi isn't a normal
  vector mode.  I imagine everything dealing with VEC_PERM_EXPR would
  then have to worry about that special case.

- VEC_SERIES_CST only copes naturally with EXTRACT_EVEN, EXTRACT_ODD
  and REVERSE.  INTERLEAVE_LO is { 0, N/2, 1, N/2+1, ... }.
  I guess it's possible to represent that using a combination of
  shifts, masks, and additions, but then:

  1) when generating them, we'd need to make sure that we cost the
     operation as a single permute, rather than costing all the shifts,
     masks and additions

  2) we'd need to make sure that none of the gimple optimisations that
     run afterwards perturb the sequence; otherwise we'd end up
     with something that's very expensive.

  3) that sequence wouldn't be handled by existing VEC_PERM_EXPR
     optimisations, and it wouldn't be trivial to add it, since we'd
     need to re-recognise the sequence first.

  4) expand would need to re-recognise the sequence and use the
     optab anyway.

  Using an internal function seems much simpler :-)
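For illustration (my own sketch, not code from the patch), the
INTERLEAVE_LO selector above has a closed form per element, and that
closed form is exactly the shift/mask/addition combination that point
1) would have to cost as a single permute:

```c
/* Illustrative helper (hypothetical, not part of the patch): element I
   of the INTERLEAVE_LO selector { 0, N/2, 1, N/2+1, ... } for an
   N-element vector, expressed as a shift, a mask and an addition.  */
static unsigned
interleave_lo_index (unsigned i, unsigned n)
{
  /* (i >> 1) gives the series { 0, 0, 1, 1, ... }; the masked term
     adds n/2 at every odd position.  */
  return (i >> 1) + ((i & 1) ? n / 2 : 0);
}
```

For n == 8 this produces { 0, 4, 1, 5, 2, 6, 3, 7 }, i.e. the
interleave of the two low halves; recognising that three-operation
sequence as one permute is the re-recognition burden points 3) and 4)
describe.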

I think VEC_PERM_EXPR is useful because it represents the same
operation as __builtin_shuffle, and we want to optimise that as best
we can.  But these internal functions are only used by the vectoriser,
which always knows what the final form of the permute should be.
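To make the contrast concrete, here is a minimal sketch (my own
example, using the GNU vector extensions, so it needs GCC) of the kind
of user-visible permute that VEC_PERM_EXPR exists to optimise:

```c
typedef unsigned char v8qi __attribute__ ((vector_size (8)));

/* A user-written reverse: GCC lowers __builtin_shuffle to a
   VEC_PERM_EXPR in gimple, which is where the generic permute
   optimisations apply.  */
static v8qi
reverse_v8qi (v8qi x)
{
  const v8qi sel = { 7, 6, 5, 4, 3, 2, 1, 0 };
  return __builtin_shuffle (x, sel);
}
```

The vectoriser, by contrast, generates its permutes internally and can
go straight to the dedicated internal functions.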

Thanks,
Richard
