RE: [PATCH] aarch64: Improve vector constant generation using SVE INDEX instruction [PR113328]

Pengxuan Zheng (QUIC) Wed, 11 Sep 2024 18:01:21 -0700

> Pengxuan Zheng <quic_pzh...@quicinc.com> writes:
> > SVE's INDEX instruction can be used to populate vectors by values
> > starting from "base" and incremented by "step" for each subsequent
> > value. We can take advantage of it to generate vector constants if
> > TARGET_SVE is available and the base and step values are within [-16, 15].
> >
> > For example, with the following function:
> >
> > typedef int v4si __attribute__ ((vector_size (16)));
> > v4si
> > f_v4si (void)
> > {
> >   return (v4si){ 0, 1, 2, 3 };
> > }
> >
> > GCC currently generates:
> >
> > f_v4si:
> >     adrp    x0, .LC4
> >     ldr     q0, [x0, #:lo12:.LC4]
> >     ret
> >
> > .LC4:
> >     .word   0
> >     .word   1
> >     .word   2
> >     .word   3
> >
> > With this patch, we generate an INDEX instruction instead if
> > TARGET_SVE is available.
> >
> > f_v4si:
> >     index   z0.s, #0, #1
> >     ret
> >
> > [...]
> > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> > index 9e12bd9711c..01bfb8c52e4 100644
> > --- a/gcc/config/aarch64/aarch64.cc
> > +++ b/gcc/config/aarch64/aarch64.cc
> > @@ -22960,8 +22960,7 @@ aarch64_simd_valid_immediate (rtx op, simd_immediate_info *info,
> >    if (CONST_VECTOR_P (op)
> >        && CONST_VECTOR_DUPLICATE_P (op))
> >      n_elts = CONST_VECTOR_NPATTERNS (op);
> > -  else if ((vec_flags & VEC_SVE_DATA)
> > -      && const_vec_series_p (op, &base, &step))
> > +  else if (TARGET_SVE && const_vec_series_p (op, &base, &step))
> 
> I think we need to check which == AARCH64_CHECK_MOV too.  (Previously
> that wasn't necessary, because native SVE only uses this routine for moves.)
> 
> FTR: I was initially a bit nervous about testing TARGET_SVE without looking at
> vec_flags at all.  But looking at the previous handling of predicates and
> structures, I agree it looks like the correct thing to do.
> 
> >      {
> >        gcc_assert (GET_MODE_CLASS (mode) == MODE_VECTOR_INT);
> >        if (!aarch64_sve_index_immediate_p (base)
> > [...]
> > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_1.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_1.c
> > index 216699b0536..3d6a0160f95 100644
> > --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_1.c
> > +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_1.c
> > @@ -10,7 +10,6 @@ dupq (int x)
> >    return svdupq_s32 (x, 1, 2, 3);
> >  }
> >
> > -/* { dg-final { scan-assembler {\tldr\tq[0-9]+,} } } */
> > +/* { dg-final { scan-assembler {\tindex\tz[0-9]+\.s, #1, #2} } } */
> >  /* { dg-final { scan-assembler {\tins\tv[0-9]+\.s\[0\], w0\n} } } */
> >  /* { dg-final { scan-assembler {\tdup\tz[0-9]+\.q, z[0-9]+\.q\[0\]\n} } } */
> > -/* { dg-final { scan-assembler {\t\.word\t1\n\t\.word\t2\n\t\.word\t3\n} } } */
> 
> This seems to be a regression of sorts.  Previously we had:
> 
>         adrp    x1, .LC0
>         ldr     q0, [x1, #:lo12:.LC0]
>         ins     v0.s[0], w0
>         dup     z0.q, z0.q[0]
> 
> whereas now we have:
> 
>         movi    v0.2s, 0x2
>         index   z31.s, #1, #2
>         ins     v0.s[0], w0
>         zip1    v0.4s, v0.4s, v31.4s
>         dup     z0.q, z0.q[0]
> 
> I think we should try to aim for:
> 
>         index   z0.s, #0, #1
>         ins     v0.s[0], w0
>         dup     z0.q, z0.q[0]
> 
> instead.


Thanks for the feedback, Richard!

I've added support to handle vectors with non-constant elements. I've split 
that change into a separate patch. Please let me know if you have any comments.

[PATCH 1/2] aarch64: Improve vector constant generation using SVE INDEX 
instruction [PR113328]
https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662842.html

[PATCH 2/2] aarch64: Improve part-variable vector initialization with SVE INDEX 
instruction [PR113328]
https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662843.html

Thanks,
Pengxuan
> 
> > [...]
> > +/*
> > +** g_v4si:
> > +** index   z0\.s, #3, #\-4
> 
> The backslash looks redundant here.
> 
> Thanks,
> Richard
> 
> > +** ret
> > +*/
> > +v4si
> > +g_v4si (void)
> > +{
> > +  return (v4si){ 3, -1, -5, -9 };
> > +}
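
For readers following the thread, the condition being discussed (a constant
vector that is a linear series whose base and step both fit the INDEX
immediate range [-16, 15]) can be sketched in plain C. This is only an
illustration under stated assumptions: series_fits_index is a hypothetical
helper, not a GCC function, and it does not reflect how const_vec_series_p
or aarch64_sve_index_immediate_p are actually implemented.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not part of GCC): return true if the N elements
   of VALS form the series BASE, BASE + STEP, BASE + 2 * STEP, ... and
   both BASE and STEP lie in the INDEX immediate range [-16, 15].
   On success, store the base and step through the output pointers.  */
static bool
series_fits_index (const long *vals, size_t n, long *base, long *step)
{
  /* A series needs at least two elements to define a step.  */
  if (n < 2)
    return false;

  long b = vals[0];
  long s = vals[1] - vals[0];

  /* Every adjacent pair must differ by the same step.  */
  for (size_t i = 2; i < n; i++)
    if (vals[i] - vals[i - 1] != s)
      return false;

  /* Both immediates must be representable by INDEX.  */
  if (b < -16 || b > 15 || s < -16 || s > 15)
    return false;

  *base = b;
  *step = s;
  return true;
}
```

With the g_v4si constant above, { 3, -1, -5, -9 }, this sketch yields base 3
and step -4, matching the "index z0.s, #3, #-4" expected by the test. A
vector such as { 0, 16, 32, 48 } is rejected because the step is out of
range.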
