Pengxuan Zheng <quic_pzh...@quicinc.com> writes:
> We can still use SVE's INDEX instruction to construct vectors even if not all
> elements are constants. For example, { 0, x, 2, 3 } can be constructed by first
> using "INDEX #0, #1" to generate { 0, 1, 2, 3 }, and then set the elements which
> are non-constants separately.
>
>       PR target/113328
>
> gcc/ChangeLog:
>
>       * config/aarch64/aarch64.cc (aarch64_expand_vector_init_fallback):
>       Improve part-variable vector generation with SVE's INDEX if TARGET_SVE
>       is available.
>
> gcc/testsuite/ChangeLog:
>
>       * gcc.target/aarch64/sve/acle/general/dupq_1.c: Update test to use
>       check-function-bodies.
>       * gcc.target/aarch64/sve/acle/general/dupq_2.c: Likewise.
>       * gcc.target/aarch64/sve/acle/general/dupq_3.c: Likewise.
>       * gcc.target/aarch64/sve/acle/general/dupq_4.c: Likewise.
>       * gcc.target/aarch64/sve/vec_init_4.c: New test.
>       * gcc.target/aarch64/sve/vec_init_5.c: New test.
>
> Signed-off-by: Pengxuan Zheng <quic_pzh...@quicinc.com>
> ---
>  gcc/config/aarch64/aarch64.cc                 | 81 ++++++++++++++++++-
>  .../aarch64/sve/acle/general/dupq_1.c         | 18 ++++-
>  .../aarch64/sve/acle/general/dupq_2.c         | 18 ++++-
>  .../aarch64/sve/acle/general/dupq_3.c         | 18 ++++-
>  .../aarch64/sve/acle/general/dupq_4.c         | 18 ++++-
>  .../gcc.target/aarch64/sve/vec_init_4.c       | 47 +++++++++++
>  .../gcc.target/aarch64/sve/vec_init_5.c       | 12 +++
>  7 files changed, 199 insertions(+), 13 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/vec_init_4.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/vec_init_5.c
>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 6b3ca57d0eb..7305a5c6375 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -23942,12 +23942,91 @@ aarch64_expand_vector_init_fallback (rtx target, rtx vals)
>    if (n_var != n_elts)
>      {
>        rtx copy = copy_rtx (vals);
> +      bool is_index_seq = false;
> +
> +      /* If at least half of the elements of the vector are constants and all
> +         these constant elements form a linear sequence of the form { B, B + S,
> +         B + 2 * S, B + 3 * S, ... }, we can generate the vector with SVE's
> +         INDEX instruction if SVE is available and then set the elements which
> +         are not constant separately. More precisely, each constant element I
> +         has to be B + I * S where B and S must be valid immediate operand for
> +         an SVE INDEX instruction.
> +
> +         For example, { X, 1, 2, 3} is a vector satisfying these conditions and
> +         we can generate a vector of all constants (i.e., { 0, 1, 2, 3 }) first
> +         and then set the first element of the vector to X. */
> +
> +      if (TARGET_SVE && GET_MODE_CLASS (mode) == MODE_VECTOR_INT
> +          && n_var <= n_elts / 2)
> +        {
> +          int const_idx = -1;
> +          HOST_WIDE_INT const_val = 0;
> +          int base = 16;
> +          int step = 16;
> +
> +          for (int i = 0; i < n_elts; ++i)
> +            {
> +              rtx x = XVECEXP (vals, 0, i);
> +
> +              if (!CONST_INT_P (x))
> +                continue;
> +
> +              if (const_idx == -1)
> +                {
> +                  const_idx = i;
> +                  const_val = INTVAL (x);
> +                }
> +              else
> +                {
> +                  if ((INTVAL (x) - const_val) % (i - const_idx) == 0)
> +                    {
> +                      HOST_WIDE_INT s
> +                        = (INTVAL (x) - const_val) / (i - const_idx);
> +                      if (s >= -16 && s <= 15)
> +                        {
> +                          int b = const_val - s * const_idx;
> +                          if (b >= -16 && b <= 15)
> +                            {
> +                              base = b;
> +                              step = s;
> +                            }
> +                        }
> +                    }
> +                  break;
> +                }
> +            }
> +
> +          if (base != 16
> +              && (!CONST_INT_P (v0)
> +                  || (CONST_INT_P (v0) && INTVAL (v0) == base)))
> +            {
> +              if (!CONST_INT_P (v0))
> +                XVECEXP (copy, 0, 0) = GEN_INT (base);
> +
> +              is_index_seq = true;
> +              for (int i = 1; i < n_elts; ++i)
> +                {
> +                  rtx x = XVECEXP (copy, 0, i);
> +
> +                  if (CONST_INT_P (x))
> +                    {
> +                      if (INTVAL (x) != base + i * step)
> +                        {
> +                          is_index_seq = false;
> +                          break;
> +                        }
> +                    }
> +                  else
> +                    XVECEXP (copy, 0, i) = GEN_INT (base + i * step);
> +                }
> +            }
> +        }

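
For reference, the check in the quoted hunk can be approximated outside GCC
roughly as below. This is only a standalone sketch, not the patch itself:
match_index_seq, IndexSeq and fits_index_imm are hypothetical names, and
std::nullopt stands in for a non-constant element. The [-16, 15] immediate
range is taken from the bounds the hunk tests.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical result type: the base and step an INDEX-style sequence
// would need in order to agree with every constant element.
struct IndexSeq {
  int64_t base;
  int64_t step;
};

// Immediate range mirrored from the quoted hunk's bounds checks.
static bool fits_index_imm(int64_t v) { return v >= -16 && v <= 15; }

// Hypothetical helper: elts[i] holds a constant, or std::nullopt for a
// non-constant element.  Returns base/step if at least half the elements
// are constants and every constant equals base + i * step.
std::optional<IndexSeq>
match_index_seq(const std::vector<std::optional<int64_t>> &elts) {
  const int n = static_cast<int>(elts.size());
  int n_var = 0;
  for (const auto &e : elts)
    if (!e)
      ++n_var;
  if (n_var > n / 2)
    return std::nullopt;

  // Derive base and step from the first two constants only, as the hunk does.
  int first = -1, second = -1;
  for (int i = 0; i < n; ++i) {
    if (!elts[i])
      continue;
    if (first < 0)
      first = i;
    else {
      second = i;
      break;
    }
  }
  if (second < 0)
    return std::nullopt;

  const int64_t diff = *elts[second] - *elts[first];
  const int64_t dist = second - first;
  if (diff % dist != 0)
    return std::nullopt;
  const int64_t step = diff / dist;
  const int64_t base = *elts[first] - step * first;
  if (!fits_index_imm(step) || !fits_index_imm(base))
    return std::nullopt;

  // Every remaining constant has to lie on the same line.
  for (int i = 0; i < n; ++i)
    if (elts[i] && *elts[i] != base + i * step)
      return std::nullopt;
  return IndexSeq{base, step};
}
```

With this sketch, { X, 1, 2, 3 } matches with base 0 and step 1, while
{ 0, X, 2, 5 } is rejected because the last constant is off the line.
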
This seems a bit more complex than I was hoping for, although the
complexity is probably justified.

Seeing how awkward it is to do this using current interfaces, I think
I'd instead prefer to do something that I'd been vaguely hoping to do
for a while: extend vector-builder.h to accept wildcard/don't care
values.  finalize () could then replace the wildcards with whatever
gives the "nicest" encoding.

That's also going to be relatively complex, but I think it'd be more
general, and might help with the existing vec_init code as well.
It would also be a step towards optimising -1 indices for
__builtin_shufflevector.

It might be a few weeks before I can post something though.

Pushing 1/2 without 2/2 has meant that the dupq tests will fail in the
meantime, but that's ok.  In general, though, it's better not to push
individual patches from a series unless they've been tested in isolation
and are known to give clean test results.

Thanks,
Richard