Jennifer Schmitz <jschm...@nvidia.com> writes:
> If -msve-vector-bits=128, SVE loads and stores (LD1 and ST1) with a
> ptrue predicate can be replaced by neon instructions (LDR and STR),
> thus avoiding the predicate altogether. This also enables formation of
> LDP/STP pairs.
>
> For example, the test cases
>
> svfloat64_t
> ptrue_load (float64_t *x)
> {
>   svbool_t pg = svptrue_b64 ();
>   return svld1_f64 (pg, x);
> }
> void
> ptrue_store (float64_t *x, svfloat64_t data)
> {
>   svbool_t pg = svptrue_b64 ();
>   return svst1_f64 (pg, x, data);
> }
>
> were previously compiled to
> (with -O2 -march=armv8.2-a+sve -msve-vector-bits=128):
>
> ptrue_load:
>         ptrue   p3.b, vl16
>         ld1d    z0.d, p3/z, [x0]
>         ret
> ptrue_store:
>         ptrue   p3.b, vl16
>         st1d    z0.d, p3, [x0]
>         ret
>
> Now they are compiled to:
>
> ptrue_load:
>         ldr     q0, [x0]
>         ret
> ptrue_store:
>         str     q0, [x0]
>         ret
>
> The implementation includes the if-statement
>   if (known_eq (GET_MODE_SIZE (mode), 16)
>       && aarch64_classify_vector_mode (mode) == VEC_SVE_DATA)
> which checks for 128-bit VLS and excludes partial modes with a
> mode size < 128 (e.g. VNx2QI).
>
> The patch was bootstrapped and tested on aarch64-linux-gnu, no regression.
> OK for mainline?
>
> Signed-off-by: Jennifer Schmitz <jschm...@nvidia.com>
>
> gcc/
>         * config/aarch64/aarch64.cc (aarch64_emit_sve_pred_move):
>         Fold LD1/ST1 with ptrue to LDR/STR for 128-bit VLS.
>
> gcc/testsuite/
>         * gcc.target/aarch64/sve/ldst_ptrue_128_to_neon.c: New test.
>         * gcc.target/aarch64/sve/cond_arith_6.c: Adjust expected outcome.
>         * gcc.target/aarch64/sve/pcs/return_4_128.c: Likewise.
>         * gcc.target/aarch64/sve/pcs/return_5_128.c: Likewise.
>         * gcc.target/aarch64/sve/pcs/struct_3_128.c: Likewise.
> ---
>  gcc/config/aarch64/aarch64.cc                 | 29 ++++++++--
>  .../gcc.target/aarch64/sve/cond_arith_6.c     |  3 +-
>  .../aarch64/sve/ldst_ptrue_128_to_neon.c      | 48 ++++++++++++++++
>  .../gcc.target/aarch64/sve/pcs/return_4_128.c | 39 +++++--------
>  .../gcc.target/aarch64/sve/pcs/return_5_128.c | 39 +++++--------
>  .../gcc.target/aarch64/sve/pcs/struct_3_128.c | 56 +++++++------------
>  6 files changed, 118 insertions(+), 96 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/ldst_ptrue_128_to_neon.c

OK, thanks.

Richard
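
For anyone reading the thread without the GCC sources at hand, the guard quoted in the patch description can be modelled roughly as below. This is only a portable sketch, not the code in aarch64.cc: the `toy_*` names, the `TOY_VEC_*` flag values and the table entries are all hypothetical stand-ins. It assumes -msve-vector-bits=128 and assumes that a partial mode is classified with an extra "partial" flag alongside the SVE data flag, so that the exact-equality test rejects it.

```c
#include <stdbool.h>

/* Toy flag bits: hypothetical stand-ins, not GCC's real VEC_* values.  */
enum {
  TOY_VEC_ADVSIMD  = 1 << 0,
  TOY_VEC_SVE_DATA = 1 << 1,
  TOY_VEC_PARTIAL  = 1 << 2   /* assumed to accompany partial SVE modes */
};

/* Hypothetical description of a vector mode at -msve-vector-bits=128.  */
struct toy_mode {
  const char *name;
  unsigned size_bytes;  /* models GET_MODE_SIZE (mode) */
  unsigned flags;       /* models aarch64_classify_vector_mode (mode) */
};

/* Same shape as the quoted condition: the mode must be exactly 16 bytes
   and must classify as plain SVE data with no other flag set.  */
static bool
toy_can_use_ldr_str (const struct toy_mode *m)
{
  return m->size_bytes == 16 && m->flags == TOY_VEC_SVE_DATA;
}

/* Illustrative entries; sizes assume a 128-bit vector length.  */
static const struct toy_mode toy_vnx2df
  = { "VNx2DF", 16, TOY_VEC_SVE_DATA };
static const struct toy_mode toy_vnx2qi
  = { "VNx2QI", 2, TOY_VEC_SVE_DATA | TOY_VEC_PARTIAL };
static const struct toy_mode toy_v2df
  = { "V2DF", 16, TOY_VEC_ADVSIMD };
```

In this model only `toy_vnx2df` passes the check. `toy_vnx2qi` fails on both the size and the flag comparison, and `toy_v2df` fails because it is not an SVE data mode in the first place.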