RE: [PATCH v2 2/2] aarch64: Improve part-variable vector initialization with SVE INDEX instruction [PR113328]

Pengxuan Zheng (QUIC) Wed, 18 Sep 2024 12:48:43 -0700

> > Pengxuan Zheng <[email protected]> writes:
> > > We can still use SVE's INDEX instruction to construct vectors even
> > > if not all elements are constants. For example, { 0, x, 2, 3 } can
> > > be constructed by first using "INDEX #0, #1" to generate { 0, 1, 2,
> > > 3 }, and then set the elements which are non-constants separately.
> > >
> > >   PR target/113328
> > >
> > > gcc/ChangeLog:
> > >
> > >   * config/aarch64/aarch64.cc (aarch64_expand_vector_init_fallback):
> > >   Improve part-variable vector generation with SVE's INDEX if
> > >   TARGET_SVE is available.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >   * gcc.target/aarch64/sve/acle/general/dupq_1.c: Update test to use
> > >   check-function-bodies.
> > >   * gcc.target/aarch64/sve/acle/general/dupq_2.c: Likewise.
> > >   * gcc.target/aarch64/sve/acle/general/dupq_3.c: Likewise.
> > >   * gcc.target/aarch64/sve/acle/general/dupq_4.c: Likewise.
> > >   * gcc.target/aarch64/sve/vec_init_4.c: New test.
> > >   * gcc.target/aarch64/sve/vec_init_5.c: New test.
> > >
> > > Signed-off-by: Pengxuan Zheng <[email protected]>
> > > ---
> > >  gcc/config/aarch64/aarch64.cc                 | 81 ++++++++++++++++++-
> > >  .../aarch64/sve/acle/general/dupq_1.c         | 18 ++++-
> > >  .../aarch64/sve/acle/general/dupq_2.c         | 18 ++++-
> > >  .../aarch64/sve/acle/general/dupq_3.c         | 18 ++++-
> > >  .../aarch64/sve/acle/general/dupq_4.c         | 18 ++++-
> > >  .../gcc.target/aarch64/sve/vec_init_4.c       | 47 +++++++++++
> > >  .../gcc.target/aarch64/sve/vec_init_5.c       | 12 +++
> > >  7 files changed, 199 insertions(+), 13 deletions(-)
> > >  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/vec_init_4.c
> > >  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/vec_init_5.c
> > >
> > > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> > > index 6b3ca57d0eb..7305a5c6375 100644
> > > --- a/gcc/config/aarch64/aarch64.cc
> > > +++ b/gcc/config/aarch64/aarch64.cc
> > > @@ -23942,12 +23942,91 @@ aarch64_expand_vector_init_fallback (rtx target, rtx vals)
> > >    if (n_var != n_elts)
> > >      {
> > >        rtx copy = copy_rtx (vals);
> > > +      bool is_index_seq = false;
> > > +
> > > +      /* If at least half of the elements of the vector are constants and all
> > > +  these constant elements form a linear sequence of the form { B, B + S,
> > > +  B + 2 * S, B + 3 * S, ... }, we can generate the vector with SVE's
> > > +  INDEX instruction if SVE is available and then set the elements which
> > > +  are not constant separately.  More precisely, each constant element I
> > > +  has to be B + I * S where B and S must be valid immediate operand for
> > > +  an SVE INDEX instruction.
> > > +
> > > +  For example, { X, 1, 2, 3} is a vector satisfying these conditions and
> > > +  we can generate a vector of all constants (i.e., { 0, 1, 2, 3 }) first
> > > +  and then set the first element of the vector to X.  */
> > > +
> > > +      if (TARGET_SVE && GET_MODE_CLASS (mode) == MODE_VECTOR_INT
> > > +   && n_var <= n_elts / 2)
> > > + {
> > > +   int const_idx = -1;
> > > +   HOST_WIDE_INT const_val = 0;
> > > +   int base = 16;
> > > +   int step = 16;
> > > +
> > > +   for (int i = 0; i < n_elts; ++i)
> > > +     {
> > > +       rtx x = XVECEXP (vals, 0, i);
> > > +
> > > +       if (!CONST_INT_P (x))
> > > +         continue;
> > > +
> > > +       if (const_idx == -1)
> > > +         {
> > > +           const_idx = i;
> > > +           const_val = INTVAL (x);
> > > +         }
> > > +       else
> > > +         {
> > > +           if ((INTVAL (x) - const_val) % (i - const_idx) == 0)
> > > +             {
> > > +               HOST_WIDE_INT s
> > > +                   = (INTVAL (x) - const_val) / (i - const_idx);
> > > +               if (s >= -16 && s <= 15)
> > > +                 {
> > > +                   int b = const_val - s * const_idx;
> > > +                   if (b >= -16 && b <= 15)
> > > +                     {
> > > +                       base = b;
> > > +                       step = s;
> > > +                     }
> > > +                 }
> > > +             }
> > > +           break;
> > > +         }
> > > +     }
> > > +
> > > +   if (base != 16
> > > +       && (!CONST_INT_P (v0)
> > > +           || (CONST_INT_P (v0) && INTVAL (v0) == base)))
> > > +     {
> > > +       if (!CONST_INT_P (v0))
> > > +         XVECEXP (copy, 0, 0) = GEN_INT (base);
> > > +
> > > +       is_index_seq = true;
> > > +       for (int i = 1; i < n_elts; ++i)
> > > +         {
> > > +           rtx x = XVECEXP (copy, 0, i);
> > > +
> > > +           if (CONST_INT_P (x))
> > > +             {
> > > +               if (INTVAL (x) != base + i * step)
> > > +                 {
> > > +                   is_index_seq = false;
> > > +                   break;
> > > +                 }
> > > +             }
> > > +           else
> > > +             XVECEXP (copy, 0, i) = GEN_INT (base + i * step);
> > > +         }
> > > +     }
> > > + }
> >
> > This seems a bit more complex than I was hoping for, although the
> > complexity is probably justified.
> >
> > Seeing how awkward it is to do this using current interfaces, I think
> > I'd instead prefer to do something that I'd been vaguely hoping to do
> > for a while: extend vector-builder.h to accept wildcard/don't care values.
> > finalize () could then replace the wildcards with whatever gives the
> > "nicest" encoding.
> >
> > That's also going to be relatively complex, but I think it'd be more
> > general, and might help with the existing vec_init code as well.
> > It would also be a step towards optimising -1 indices for
> > __builtin_shufflevector.  It might be a few weeks before I can post
> > something though.
> 
> No problem, Richard.
> 
> I am also curious to see what this alternative implementation looks like.
> Please kindly keep me posted when your patch is ready. Thank you!
> 
> >
> > Pushing 1/2 without 2/2 has meant that the dupq tests will fail in the
> > meantime, but that's ok.  In general, though, it's better not to push
> > individual patches from a series unless they've been tested in
> > isolation and are known to give clean test results.
> 
> In fact, the dupq tests were not affected. Patch 1/2 already adjusted the
> "scan-assembler" checks of the dupq tests based on the output of 1/2 alone.
> Patch 2/2 just replaces the "scan-assembler" checks with
> "check-function-bodies." So, the dupq tests still pass without 2/2.


I just realized that I was confused about what 1/2 does. You are right: the
dupq tests will fail for now.

Again, sorry for the confusion. 😊

Thanks,
Pengxuan
> 
> Thanks,
> Pengxuan
> >
> > Thanks,
> > Richard
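
For readers following the thread, the base/step detection described in the
quoted patch comment can be sketched outside of GCC as a small standalone C
function. This is only an illustration under stated assumptions, not the
patch's code: struct elt, valid_index_imm and find_index_seq are hypothetical
names, the [-16, 15] immediate range is copied from the checks in the quoted
hunk, and element values are assumed small enough that the arithmetic does
not overflow.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lane description for this sketch: either a known
   constant or a variable (non-constant) element.  */
struct elt
{
  bool is_const;
  long long value;   /* Only meaningful when is_const is true.  */
};

/* Immediate range assumed for INDEX, taken from the quoted patch.  */
static bool
valid_index_imm (long long v)
{
  return v >= -16 && v <= 15;
}

/* Return true if at most half of ELTS[0..N-1] are variable and every
   constant lane I equals BASE + I * STEP for valid immediates BASE and
   STEP.  On success, store the sequence in *BASE and *STEP.  */
static bool
find_index_seq (const struct elt *elts, size_t n,
                long long *base, long long *step)
{
  size_t n_var = 0;
  ptrdiff_t first = -1, second = -1;

  /* Count variable lanes and remember the first two constant lanes.  */
  for (size_t i = 0; i < n; ++i)
    {
      if (!elts[i].is_const)
        {
          ++n_var;
          continue;
        }
      if (first < 0)
        first = (ptrdiff_t) i;
      else if (second < 0)
        second = (ptrdiff_t) i;
    }

  /* Two constants are needed to pin down both BASE and STEP.  */
  if (n_var > n / 2 || second < 0)
    return false;

  /* Derive STEP from the first two constants; it must be an integer.  */
  long long diff = elts[second].value - elts[first].value;
  long long dist = second - first;
  if (diff % dist != 0)
    return false;

  long long s = diff / dist;
  long long b = elts[first].value - s * first;
  if (!valid_index_imm (s) || !valid_index_imm (b))
    return false;

  /* Every remaining constant lane must lie on the same sequence.  */
  for (size_t i = 0; i < n; ++i)
    if (elts[i].is_const && elts[i].value != b + s * (long long) i)
      return false;

  *base = b;
  *step = s;
  return true;
}
```

With the example from the patch comment, { X, 1, 2, 3 }, the sketch reports
base 0 and step 1, so the whole vector could be produced by one INDEX and
lane 0 overwritten with X afterwards. The variable lanes behave like the
"don't care" values Richard mentions: they are filled with whatever keeps the
sequence linear.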
