Jennifer Schmitz <jschm...@nvidia.com> writes:
> If -msve-vector-bits=128, SVE loads and stores (LD1 and ST1) with a
> ptrue predicate can be replaced by Neon instructions (LDR and STR),
> thus avoiding the predicate altogether. This also enables formation of
> LDP/STP pairs.
>
> For example, the test cases
>
> svfloat64_t
> ptrue_load (float64_t *x)
> {
>   svbool_t pg = svptrue_b64 ();
>   return svld1_f64 (pg, x);
> }
> void
> ptrue_store (float64_t *x, svfloat64_t data)
> {
>   svbool_t pg = svptrue_b64 ();
>   svst1_f64 (pg, x, data);
> }
>
> were previously compiled to
> (with -O2 -march=armv8.2-a+sve -msve-vector-bits=128):
>
> ptrue_load:
>         ptrue   p3.b, vl16
>         ld1d    z0.d, p3/z, [x0]
>         ret
> ptrue_store:
>         ptrue   p3.b, vl16
>         st1d    z0.d, p3, [x0]
>         ret
>
> Now they are compiled to:
>
> ptrue_load:
>         ldr     q0, [x0]
>         ret
> ptrue_store:
>         str     q0, [x0]
>         ret
>
> The implementation includes the if-statement
> if (known_eq (GET_MODE_SIZE (mode), 16)
>     && aarch64_classify_vector_mode (mode) == VEC_SVE_DATA)
> which checks for 128-bit VLS and excludes partial modes with a
> mode size < 128 bits (e.g. VNx2QI).
>
> The patch was bootstrapped and tested on aarch64-linux-gnu, no regression.
> OK for mainline?
>
> Signed-off-by: Jennifer Schmitz <jschm...@nvidia.com>
>
> gcc/
>       * config/aarch64/aarch64.cc (aarch64_emit_sve_pred_move):
>       Fold LD1/ST1 with ptrue to LDR/STR for 128-bit VLS.
>
> gcc/testsuite/
>       * gcc.target/aarch64/sve/ldst_ptrue_128_to_neon.c: New test.
>       * gcc.target/aarch64/sve/cond_arith_6.c: Adjust expected outcome.
>       * gcc.target/aarch64/sve/pcs/return_4_128.c: Likewise.
>       * gcc.target/aarch64/sve/pcs/return_5_128.c: Likewise.
>       * gcc.target/aarch64/sve/pcs/struct_3_128.c: Likewise.
> ---
>  gcc/config/aarch64/aarch64.cc                 | 29 ++++++++--
>  .../gcc.target/aarch64/sve/cond_arith_6.c     |  3 +-
>  .../aarch64/sve/ldst_ptrue_128_to_neon.c      | 48 ++++++++++++++++
>  .../gcc.target/aarch64/sve/pcs/return_4_128.c | 39 +++++--------
>  .../gcc.target/aarch64/sve/pcs/return_5_128.c | 39 +++++--------
>  .../gcc.target/aarch64/sve/pcs/struct_3_128.c | 56 +++++++------------
>  6 files changed, 118 insertions(+), 96 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/ldst_ptrue_128_to_neon.c

OK, thanks.

Richard
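
For readers following the thread, the equivalence the fold relies on can be
sketched in portable C. This is only a scalar model, not code from the patch:
the helper names model_ld1d, model_ldr_q and loads_agree_for_ptrue are
hypothetical, and the model assumes a 128-bit vector of two 64-bit lanes
(sizeof (double) == 8). With an all-true predicate, a zeroing predicated load
touches exactly the same 16 bytes as an unconditional 128-bit load, which is
why the predicate can be dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Assumption: 128-bit vector length, 64-bit elements, so two lanes.  */
#define LANES 2

/* Hypothetical scalar model of a zeroing predicated load (LD1D with p/z):
   active lanes are loaded from memory, inactive lanes become zero.  */
static void
model_ld1d (const bool pg[LANES], const double *x, double out[LANES])
{
  for (int i = 0; i < LANES; i++)
    out[i] = pg[i] ? x[i] : 0.0;
}

/* Hypothetical scalar model of LDR q0, [x0]: an unconditional copy of
   all LANES elements (16 bytes under the assumption above).  */
static void
model_ldr_q (const double *x, double out[LANES])
{
  memcpy (out, x, LANES * sizeof (double));
}

/* Return 1 if both models produce identical bytes for an all-true
   predicate, 0 otherwise.  */
static int
loads_agree_for_ptrue (const double *x)
{
  const bool ptrue[LANES] = { true, true };
  double a[LANES], b[LANES];

  model_ld1d (ptrue, x, a);
  model_ldr_q (x, b);
  return memcmp (a, b, sizeof a) == 0;
}
```

The model also shows why the fold is restricted to ptrue: with any inactive
lane, model_ld1d writes zero where model_ldr_q would copy memory, so the two
are no longer interchangeable.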
