Pengxuan Zheng <quic_pzh...@quicinc.com> writes:
> We can still use SVE's INDEX instruction to construct vectors even if not all
> elements are constants. For example, { 0, x, 2, 3 } can be constructed by
> first using "INDEX #0, #1" to generate { 0, 1, 2, 3 }, and then set the
> elements which are non-constants separately.
>
>       PR target/113328
>
> gcc/ChangeLog:
>
>       * config/aarch64/aarch64.cc (aarch64_expand_vector_init_fallback):
>       Improve part-variable vector generation with SVE's INDEX if TARGET_SVE
>       is available.
>
> gcc/testsuite/ChangeLog:
>
>       * gcc.target/aarch64/sve/acle/general/dupq_1.c: Update test to use
>       check-function-bodies.
>       * gcc.target/aarch64/sve/acle/general/dupq_2.c: Likewise.
>       * gcc.target/aarch64/sve/acle/general/dupq_3.c: Likewise.
>       * gcc.target/aarch64/sve/acle/general/dupq_4.c: Likewise.
>       * gcc.target/aarch64/sve/vec_init_4.c: New test.
>       * gcc.target/aarch64/sve/vec_init_5.c: New test.
>
> Signed-off-by: Pengxuan Zheng <quic_pzh...@quicinc.com>
> ---
>  gcc/config/aarch64/aarch64.cc                 | 81 ++++++++++++++++++-
>  .../aarch64/sve/acle/general/dupq_1.c         | 18 ++++-
>  .../aarch64/sve/acle/general/dupq_2.c         | 18 ++++-
>  .../aarch64/sve/acle/general/dupq_3.c         | 18 ++++-
>  .../aarch64/sve/acle/general/dupq_4.c         | 18 ++++-
>  .../gcc.target/aarch64/sve/vec_init_4.c       | 47 +++++++++++
>  .../gcc.target/aarch64/sve/vec_init_5.c       | 12 +++
>  7 files changed, 199 insertions(+), 13 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/vec_init_4.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/vec_init_5.c
>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 6b3ca57d0eb..7305a5c6375 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -23942,12 +23942,91 @@ aarch64_expand_vector_init_fallback (rtx target, rtx vals)
>    if (n_var != n_elts)
>      {
>        rtx copy = copy_rtx (vals);
> +      bool is_index_seq = false;
> +
> +      /* If at least half of the elements of the vector are constants and all
> +      these constant elements form a linear sequence of the form { B, B + S,
> +      B + 2 * S, B + 3 * S, ... }, we can generate the vector with SVE's
> +      INDEX instruction if SVE is available and then set the elements which
> +      are not constant separately.  More precisely, each constant element I
> +      has to be B + I * S where B and S must be valid immediate operand for
> +      an SVE INDEX instruction.
> +
> +      For example, { X, 1, 2, 3} is a vector satisfying these conditions and
> +      we can generate a vector of all constants (i.e., { 0, 1, 2, 3 }) first
> +      and then set the first element of the vector to X.  */
> +
> +      if (TARGET_SVE && GET_MODE_CLASS (mode) == MODE_VECTOR_INT
> +       && n_var <= n_elts / 2)
> +     {
> +       int const_idx = -1;
> +       HOST_WIDE_INT const_val = 0;
> +       int base = 16;
> +       int step = 16;
> +
> +       for (int i = 0; i < n_elts; ++i)
> +         {
> +           rtx x = XVECEXP (vals, 0, i);
> +
> +           if (!CONST_INT_P (x))
> +             continue;
> +
> +           if (const_idx == -1)
> +             {
> +               const_idx = i;
> +               const_val = INTVAL (x);
> +             }
> +           else
> +             {
> +               if ((INTVAL (x) - const_val) % (i - const_idx) == 0)
> +                 {
> +                   HOST_WIDE_INT s
> +                       = (INTVAL (x) - const_val) / (i - const_idx);
> +                   if (s >= -16 && s <= 15)
> +                     {
> +                       int b = const_val - s * const_idx;
> +                       if (b >= -16 && b <= 15)
> +                         {
> +                           base = b;
> +                           step = s;
> +                         }
> +                     }
> +                 }
> +               break;
> +             }
> +         }
> +
> +       if (base != 16
> +           && (!CONST_INT_P (v0)
> +               || (CONST_INT_P (v0) && INTVAL (v0) == base)))
> +         {
> +           if (!CONST_INT_P (v0))
> +             XVECEXP (copy, 0, 0) = GEN_INT (base);
> +
> +           is_index_seq = true;
> +           for (int i = 1; i < n_elts; ++i)
> +             {
> +               rtx x = XVECEXP (copy, 0, i);
> +
> +               if (CONST_INT_P (x))
> +                 {
> +                   if (INTVAL (x) != base + i * step)
> +                     {
> +                       is_index_seq = false;
> +                       break;
> +                     }
> +                 }
> +               else
> +                 XVECEXP (copy, 0, i) = GEN_INT (base + i * step);
> +             }
> +         }
> +     }

This seems a bit more complex than I was hoping for, although the
complexity is probably justified.
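
For reference, the core of the check in the quoted hunk can be sketched
outside GCC roughly as follows.  This is only an illustration under
stated assumptions: Elt, IndexSeq, is_index_imm and find_index_seq are
hypothetical names rather than GCC interfaces, variable elements are
modelled as std::nullopt, and the patch's "at least half constant"
condition is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// One vector element: a known constant, or std::nullopt for a variable.
// (Hypothetical model; GCC itself works on rtx values.)
using Elt = std::optional<int64_t>;

struct IndexSeq { int64_t base; int64_t step; };

// The quoted patch accepts base and step values in [-16, 15].
static bool is_index_imm(int64_t v) { return v >= -16 && v <= 15; }

// Return {base, step} if every constant element I equals base + I * step
// and both values are in range; otherwise return std::nullopt.
std::optional<IndexSeq> find_index_seq(const std::vector<Elt> &elts)
{
  assert(!elts.empty());

  // Locate the first two constant elements; they determine the step.
  int first = -1, second = -1;
  for (int i = 0; i < (int) elts.size(); ++i)
    if (elts[i]) {
      if (first < 0)
        first = i;
      else {
        second = i;
        break;
      }
    }
  if (second < 0)
    return std::nullopt;  // Fewer than two constants: no step to infer.

  int64_t diff = *elts[second] - *elts[first];
  int64_t dist = second - first;
  if (diff % dist != 0)
    return std::nullopt;  // The step would not be an integer.

  int64_t step = diff / dist;
  int64_t base = *elts[first] - step * first;
  if (!is_index_imm(step) || !is_index_imm(base))
    return std::nullopt;

  // Every remaining constant must lie on the same series.
  for (std::size_t i = 0; i < elts.size(); ++i)
    if (elts[i] && *elts[i] != base + (int64_t) i * step)
      return std::nullopt;

  return IndexSeq{base, step};
}
```

With this model, { X, 1, 2, 3 } gives base 0 and step 1, while
{ 1, X, 4, 5 } is rejected because the implied step is not an integer.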

Seeing how awkward it is to do this using current interfaces, I think
I'd instead prefer to do something that I'd been vaguely hoping to do
for a while: extend vector-builder.h to accept wildcard/don't care values.
finalize () could then replace the wildcards with whatever gives the
"nicest" encoding.

That's also going to be relatively complex, but I think it'd be more
general, and might help with the existing vec_init code as well.
It would also be a step towards optimising -1 indices for
__builtin_shufflevector.  It might be a few weeks before I can post
something though.
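
To make the wildcard idea a bit more concrete, here is a minimal,
standalone sketch of a builder that accepts don't-care elements and
fills them in when finalized.  It is not the vector-builder.h
interface: wildcard_builder, push_wildcard and the fill policy (prefer
a duplicate or linear series, otherwise use zero) are assumptions made
purely for illustration, and a real version would need to take the
target's notion of a "nice" encoding into account.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for a builder that accepts don't-care elements.
// This is NOT the vector-builder.h interface.
class wildcard_builder
{
public:
  void quick_push(int64_t v) { m_elts.push_back(v); }
  void push_wildcard() { m_elts.push_back(std::nullopt); }

  // Replace each wildcard with a value that keeps the result a linear
  // series (a duplicate is a series with step 0) when the fixed elements
  // allow it; otherwise fill the wildcards with zero.
  std::vector<int64_t> finalize() const
  {
    // The first two fixed elements, if any, determine base and step.
    int first = -1, second = -1;
    for (int i = 0; i < (int) m_elts.size(); ++i)
      if (m_elts[i]) {
        if (first < 0)
          first = i;
        else {
          second = i;
          break;
        }
      }

    int64_t base = 0, step = 0;
    bool series = false;
    if (first >= 0) {
      int64_t dist = second >= 0 ? second - first : 1;
      int64_t diff = second >= 0 ? *m_elts[second] - *m_elts[first] : 0;
      assert(dist > 0);
      if (diff % dist == 0) {
        step = diff / dist;
        base = *m_elts[first] - step * first;
        series = fits(base, step);
      }
    }

    std::vector<int64_t> out(m_elts.size(), 0);
    for (std::size_t i = 0; i < m_elts.size(); ++i) {
      if (m_elts[i])
        out[i] = *m_elts[i];
      else if (series)
        out[i] = base + (int64_t) i * step;
    }
    return out;
  }

private:
  // True if every fixed element I equals base + I * step.
  bool fits(int64_t base, int64_t step) const
  {
    for (std::size_t i = 0; i < m_elts.size(); ++i)
      if (m_elts[i] && *m_elts[i] != base + (int64_t) i * step)
        return false;
    return true;
  }

  std::vector<std::optional<int64_t>> m_elts;
};
```

Pushing 0, a wildcard, 2 and 3 then yields { 0, 1, 2, 3 }, so a caller
could emit the series first and overwrite the wildcard lane afterwards.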

Pushing 1/2 without 2/2 has meant that the dupq tests will fail in the
meantime, but that's ok.  In general, though, it's better not to push
individual patches from a series unless they've been tested in isolation
and are known to give clean test results.

Thanks,
Richard
