> On 28 Apr 2025, at 15:35, Richard Sandiford <richard.sandif...@arm.com> wrote:
>
> External email: Use caution opening links or attachments
>
>
> Kyrylo Tkachov <ktkac...@nvidia.com> writes:
>>> On 25 Apr 2025, at 19:55, Richard Sandiford <richard.sandif...@arm.com>
>>> wrote:
>>>
>>> Jennifer Schmitz <jschm...@nvidia.com> writes:
>>>> If -msve-vector-bits=128, SVE loads and stores (LD1 and ST1) with a
>>>> ptrue predicate can be replaced by neon instructions (LDR and STR),
>>>> thus avoiding the predicate altogether. This also enables formation of
>>>> LDP/STP pairs.
>>>>
>>>> For example, the test cases
>>>>
>>>> svfloat64_t
>>>> ptrue_load (float64_t *x)
>>>> {
>>>>   svbool_t pg = svptrue_b64 ();
>>>>   return svld1_f64 (pg, x);
>>>> }
>>>> void
>>>> ptrue_store (float64_t *x, svfloat64_t data)
>>>> {
>>>>   svbool_t pg = svptrue_b64 ();
>>>>   return svst1_f64 (pg, x, data);
>>>> }
>>>>
>>>> were previously compiled to
>>>> (with -O2 -march=armv8.2-a+sve -msve-vector-bits=128):
>>>>
>>>> ptrue_load:
>>>>         ptrue   p3.b, vl16
>>>>         ld1d    z0.d, p3/z, [x0]
>>>>         ret
>>>> ptrue_store:
>>>>         ptrue   p3.b, vl16
>>>>         st1d    z0.d, p3, [x0]
>>>>         ret
>>>>
>>>> Now the are compiled to:
>>>>
>>>> ptrue_load:
>>>>         ldr     q0, [x0]
>>>>         ret
>>>> ptrue_store:
>>>>         str     q0, [x0]
>>>>         ret
>>>>
>>>> The implementation includes the if-statement
>>>>   if (known_eq (BYTES_PER_SVE_VECTOR, 16)
>>>>       && known_eq (GET_MODE_SIZE (mode), 16))
>>>>
>>>> which checks for 128-bit VLS and excludes partial modes with a
>>>> mode size < 128 (e.g. VNx2QI).
>>>
>>> I think it would be better to use:
>>>
>>>   if (known_eq (GET_MODE_SIZE (mode), 16)
>>>       && aarch64_classify_vector_mode (mode) == VEC_SVE_DATA
>>>
>>> to defend against any partial structure modes that might be added in future.

Done.

>>>
>>>>
>>>> The patch was bootstrapped and tested on aarch64-linux-gnu, no regression.
>>>> OK for mainline?
>>>>
>>>> Signed-off-by: Jennifer Schmitz <jschm...@nvidia.com>
>>>>
>>>> gcc/
>>>>         * config/aarch64/aarch64.cc (aarch64_emit_sve_pred_move):
>>>>         Fold LD1/ST1 with ptrue to LDR/STR for 128-bit VLS.
>>>>
>>>> gcc/testsuite/
>>>>         * gcc.target/aarch64/sve/ldst_ptrue_128_to_neon.c: New test.
>>>>         * gcc.target/aarch64/sve/cond_arith_6.c: Adjust expected outcome.
>>>>         * gcc.target/aarch64/sve/pst/return_4_128.c: Likewise.
>>>>         * gcc.target/aarch64/sve/pst/return_5_128.c: Likewise.
>>>>         * gcc.target/aarch64/sve/pst/struct_3_128.c: Likewise.
>>>> ---
>>>> gcc/config/aarch64/aarch64.cc | 27 ++++++--
>>>> .../gcc.target/aarch64/sve/cond_arith_6.c | 3 +-
>>>> .../aarch64/sve/ldst_ptrue_128_to_neon.c | 36 +++++++++++
>>>> .../gcc.target/aarch64/sve/pcs/return_4_128.c | 39 ++++-------
>>>> .../gcc.target/aarch64/sve/pcs/return_5_128.c | 39 ++++-------
>>>> .../gcc.target/aarch64/sve/pcs/struct_3_128.c | 64 +++++--------------
>>>> 6 files changed, 102 insertions(+), 106 deletions(-)
>>>> create mode 100644
>>>> gcc/testsuite/gcc.target/aarch64/sve/ldst_ptrue_128_to_neon.c
>>>>
>>>> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
>>>> index f7bccf532f8..ac01149276b 100644
>>>> --- a/gcc/config/aarch64/aarch64.cc
>>>> +++ b/gcc/config/aarch64/aarch64.cc
>>>> @@ -6416,13 +6416,28 @@ aarch64_stack_protect_canary_mem (machine_mode
>>>> mode, rtx decl_rtl,
>>>>  void
>>>>  aarch64_emit_sve_pred_move (rtx dest, rtx pred, rtx src)
>>>>  {
>>>> -  expand_operand ops[3];
>>>>    machine_mode mode = GET_MODE (dest);
>>>> -  create_output_operand (&ops[0], dest, mode);
>>>> -  create_input_operand (&ops[1], pred, GET_MODE(pred));
>>>> -  create_input_operand (&ops[2], src, mode);
>>>> -  temporary_volatile_ok v (true);
>>>> -  expand_insn (code_for_aarch64_pred_mov (mode), 3, ops);
>>>> +  if ((MEM_P (dest) || MEM_P (src))
>>>> +      && known_eq (BYTES_PER_SVE_VECTOR, 16)
>>>> +      && known_eq (GET_MODE_SIZE (mode), 16)
>>>> +      && !BYTES_BIG_ENDIAN)
>>>> +    {
>>>> +      rtx tmp = gen_reg_rtx (V16QImode);
>>>> +      emit_move_insn (tmp, lowpart_subreg (V16QImode, src, mode));
>>>> +      if (MEM_P (src))
>>>> +        emit_move_insn (dest, lowpart_subreg (mode, tmp, V16QImode));
>>>> +      else
>>>> +        emit_move_insn (adjust_address (dest, V16QImode, 0), tmp);
>>>
>>> We shouldn't usually need a temporary register for the store case.
>>> Also, using lowpart_subreg for a source memory leads to the best-avoided
>>> subregs of mems when the mem is volatile, due to:
>>>
>>>   /* Allow splitting of volatile memory references in case we don't
>>>      have instruction to move the whole thing. */
>>>   && (! MEM_VOLATILE_P (op)
>>>       || ! have_insn_for (SET, innermode))
>>>
>>> in simplify_subreg. So how about:
>>>
>>>   if (MEM_P (src))
>>>     {
>>>       rtx tmp = force_reg (V16QImode, adjust_address (src, V16QImode, 0));
>>>       emit_move_insn (dest, lowpart_subreg (mode, tmp, V16QImode));
>>>     }
>>>   else
>>>     emit_move_insn (adjust_address (dest, V16QImode, 0),
>>>                     force_lowpart_subreg (V16QImode, src, mode));

Thanks, done.

>>>
>>> It might be good to test the volatile case too. That case does work
>>> with your patch, since the subreg gets ironed out later. It's just for
>>> completeness.

Added tests with volatile memory references.
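Concretely, the volatile variants use plain pointer dereferences rather than intrinsic calls. A minimal sketch of the two functions that the TEST macro in the new test below instantiates per element type (shown here for int8_t, using the ld1_vol_/st1_vol_ names the macro generates):

#include <arm_sve.h>

/* Volatile SVE vector load and store through ordinary dereferences;
   with -msve-vector-bits=128 these are expected to become unpredicated
   ldr/str of a q register, just like the non-volatile variants.  */
svint8_t
ld1_vol_s8 (volatile svint8_t *ptr)
{
  return *ptr;
}

void
st1_vol_s8 (volatile svint8_t *ptr, svint8_t x)
{
  *ptr = x;
}

The scan-assembler-times counts of 24 in the new test cover both the non-volatile and the volatile functions.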
Below is the updated patch; I bootstrapped and tested the updated version.
Thanks,
Jennifer

If -msve-vector-bits=128, SVE loads and stores (LD1 and ST1) with a
ptrue predicate can be replaced by Neon instructions (LDR and STR),
thus avoiding the predicate altogether. This also enables formation of
LDP/STP pairs.

For example, the test cases

svfloat64_t
ptrue_load (float64_t *x)
{
  svbool_t pg = svptrue_b64 ();
  return svld1_f64 (pg, x);
}
void
ptrue_store (float64_t *x, svfloat64_t data)
{
  svbool_t pg = svptrue_b64 ();
  svst1_f64 (pg, x, data);
}

were previously compiled to
(with -O2 -march=armv8.2-a+sve -msve-vector-bits=128):

ptrue_load:
        ptrue   p3.b, vl16
        ld1d    z0.d, p3/z, [x0]
        ret
ptrue_store:
        ptrue   p3.b, vl16
        st1d    z0.d, p3, [x0]
        ret

Now they are compiled to:

ptrue_load:
        ldr     q0, [x0]
        ret
ptrue_store:
        str     q0, [x0]
        ret

The implementation includes the if-statement

  if (known_eq (BYTES_PER_SVE_VECTOR, 16)
      && aarch64_classify_vector_mode (mode) == VEC_SVE_DATA)

which checks for 128-bit VLS and excludes partial modes with a
mode size < 128 (e.g. VNx2QI).

The patch was bootstrapped and tested on aarch64-linux-gnu, no regression.
OK for mainline?

Signed-off-by: Jennifer Schmitz <jschm...@nvidia.com>

gcc/
        * config/aarch64/aarch64.cc (aarch64_emit_sve_pred_move):
        Fold LD1/ST1 with ptrue to LDR/STR for 128-bit VLS.

gcc/testsuite/
        * gcc.target/aarch64/sve/ldst_ptrue_128_to_neon.c: New test.
        * gcc.target/aarch64/sve/cond_arith_6.c: Adjust expected outcome.
        * gcc.target/aarch64/sve/pcs/return_4_128.c: Likewise.
        * gcc.target/aarch64/sve/pcs/return_5_128.c: Likewise.
        * gcc.target/aarch64/sve/pcs/struct_3_128.c: Likewise.
---
 gcc/config/aarch64/aarch64.cc | 29 +++++++--
 .../gcc.target/aarch64/sve/cond_arith_6.c | 3 +-
 .../aarch64/sve/ldst_ptrue_128_to_neon.c | 48 ++++++++++++++
 .../gcc.target/aarch64/sve/pcs/return_4_128.c | 39 ++++-------
 .../gcc.target/aarch64/sve/pcs/return_5_128.c | 39 ++++-------
 .../gcc.target/aarch64/sve/pcs/struct_3_128.c | 64 +++++--------------
 6 files changed, 116 insertions(+), 106 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/ldst_ptrue_128_to_neon.c

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index f7bccf532f8..1c06b8528e9 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -6416,13 +6416,30 @@ aarch64_stack_protect_canary_mem (machine_mode mode, rtx decl_rtl,
 void
 aarch64_emit_sve_pred_move (rtx dest, rtx pred, rtx src)
 {
-  expand_operand ops[3];
   machine_mode mode = GET_MODE (dest);
-  create_output_operand (&ops[0], dest, mode);
-  create_input_operand (&ops[1], pred, GET_MODE(pred));
-  create_input_operand (&ops[2], src, mode);
-  temporary_volatile_ok v (true);
-  expand_insn (code_for_aarch64_pred_mov (mode), 3, ops);
+  if ((MEM_P (dest) || MEM_P (src))
+      && known_eq (BYTES_PER_SVE_VECTOR, 16)
+      && aarch64_classify_vector_mode (mode) == VEC_SVE_DATA
+      && !BYTES_BIG_ENDIAN)
+    {
+      if (MEM_P (src))
+        {
+          rtx tmp = force_reg (V16QImode, adjust_address (src, V16QImode, 0));
+          emit_move_insn (dest, lowpart_subreg (mode, tmp, V16QImode));
+        }
+      else
+        emit_move_insn (adjust_address (dest, V16QImode, 0),
+                        force_lowpart_subreg (V16QImode, src, mode));
+    }
+  else
+    {
+      expand_operand ops[3];
+      create_output_operand (&ops[0], dest, mode);
+      create_input_operand (&ops[1], pred, GET_MODE(pred));
+      create_input_operand (&ops[2], src, mode);
+      temporary_volatile_ok v (true);
+      expand_insn (code_for_aarch64_pred_mov (mode), 3, ops);
+    }
 }
 
 /* Expand a pre-RA SVE data move from SRC to DEST in which at least one
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_arith_6.c b/gcc/testsuite/gcc.target/aarch64/sve/cond_arith_6.c
index 4085ab12444..d5a12f1df07 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/cond_arith_6.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_arith_6.c
@@ -8,7 +8,8 @@ f (float *x)
       x[i] -= 1.0f;
 }
 
-/* { dg-final { scan-assembler {\tld1w\tz} } } */
+/* { dg-final { scan-assembler {\tld1w\tz} { target aarch64_big_endian } } } */
+/* { dg-final { scan-assembler {\tldr\tq} { target aarch64_little_endian } } } */
 /* { dg-final { scan-assembler {\tfcmgt\tp} } } */
 /* { dg-final { scan-assembler {\tfsub\tz} } } */
 /* { dg-final { scan-assembler {\tst1w\tz} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/ldst_ptrue_128_to_neon.c b/gcc/testsuite/gcc.target/aarch64/sve/ldst_ptrue_128_to_neon.c
new file mode 100644
index 00000000000..43d36e86ad9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/ldst_ptrue_128_to_neon.c
@@ -0,0 +1,48 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msve-vector-bits=128" } */
+/* { dg-require-effective-target aarch64_little_endian } */
+
+#include <arm_sve.h>
+
+#define TEST(TYPE, TY, B) \
+  sv##TYPE \
+  ld1_##TY##B (TYPE *x) \
+  { \
+    svbool_t pg = svptrue_b##B (); \
+    return svld1_##TY##B (pg, x); \
+  } \
+ \
+  void \
+  st1_##TY##B (TYPE *x, sv##TYPE data) \
+  { \
+    svbool_t pg = svptrue_b##B (); \
+    svst1_##TY##B (pg, x, data); \
+  } \
+ \
+  sv##TYPE \
+  ld1_vol_##TY##B (volatile sv##TYPE *ptr) \
+  { \
+    return *ptr; \
+  } \
+ \
+  void \
+  st1_vol_##TY##B (volatile sv##TYPE *ptr, sv##TYPE x) \
+  { \
+    *ptr = x; \
+  }
+
+TEST (bfloat16_t, bf, 16)
+TEST (float16_t, f, 16)
+TEST (float32_t, f, 32)
+TEST (float64_t, f, 64)
+TEST (int8_t, s, 8)
+TEST (int16_t, s, 16)
+TEST (int32_t, s, 32)
+TEST (int64_t, s, 64)
+TEST (uint8_t, u, 8)
+TEST (uint16_t, u, 16)
+TEST (uint32_t, u, 32)
+TEST (uint64_t, u, 64)
+
+/* { dg-final { scan-assembler-times {\tldr\tq0, \[x0\]} 24 } } */
+/* { dg-final { scan-assembler-times {\tstr\tq0, \[x0\]} 24 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_4_128.c b/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_4_128.c
index 87d528c84cd..ac5f981490a 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_4_128.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_4_128.c
@@ -11,104 +11,91 @@
 /*
 ** callee_s8:
-** ptrue (p[0-7])\.b, vl16
-** ld1b z0\.b, \1/z, \[x0\]
+** ldr q0, \[x0\]
 ** ret
 */
 CALLEE (s8, __SVInt8_t)
 
 /*
 ** callee_u8:
-** ptrue (p[0-7])\.b, vl16
-** ld1b z0\.b, \1/z, \[x0\]
+** ldr q0, \[x0\]
 ** ret
 */
 CALLEE (u8, __SVUint8_t)
 
 /*
 ** callee_mf8:
-** ptrue (p[0-7])\.b, vl16
-** ld1b z0\.b, \1/z, \[x0\]
+** ldr q0, \[x0\]
 ** ret
 */
 CALLEE (mf8, __SVMfloat8_t)
 
 /*
 ** callee_s16:
-** ptrue (p[0-7])\.b, vl16
-** ld1h z0\.h, \1/z, \[x0\]
+** ldr q0, \[x0\]
 ** ret
 */
 CALLEE (s16, __SVInt16_t)
 
 /*
 ** callee_u16:
-** ptrue (p[0-7])\.b, vl16
-** ld1h z0\.h, \1/z, \[x0\]
+** ldr q0, \[x0\]
 ** ret
 */
 CALLEE (u16, __SVUint16_t)
 
 /*
 ** callee_f16:
-** ptrue (p[0-7])\.b, vl16
-** ld1h z0\.h, \1/z, \[x0\]
+** ldr q0, \[x0\]
 ** ret
 */
 CALLEE (f16, __SVFloat16_t)
 
 /*
 ** callee_bf16:
-** ptrue (p[0-7])\.b, vl16
-** ld1h z0\.h, \1/z, \[x0\]
+** ldr q0, \[x0\]
 ** ret
 */
 CALLEE (bf16, __SVBfloat16_t)
 
 /*
 ** callee_s32:
-** ptrue (p[0-7])\.b, vl16
-** ld1w z0\.s, \1/z, \[x0\]
+** ldr q0, \[x0\]
 ** ret
 */
 CALLEE (s32, __SVInt32_t)
 
 /*
 ** callee_u32:
-** ptrue (p[0-7])\.b, vl16
-** ld1w z0\.s, \1/z, \[x0\]
+** ldr q0, \[x0\]
 ** ret
 */
 CALLEE (u32, __SVUint32_t)
 
 /*
 ** callee_f32:
-** ptrue (p[0-7])\.b, vl16
-** ld1w z0\.s, \1/z, \[x0\]
+** ldr q0, \[x0\]
 ** ret
 */
 CALLEE (f32, __SVFloat32_t)
 
 /*
 ** callee_s64:
-** ptrue (p[0-7])\.b, vl16
-** ld1d z0\.d, \1/z, \[x0\]
+** ldr q0, \[x0\]
 ** ret
 */
 CALLEE (s64, __SVInt64_t)
 
 /*
 ** callee_u64:
-** ptrue (p[0-7])\.b, vl16
-** ld1d z0\.d, \1/z, \[x0\]
+** ldr q0, \[x0\]
 ** ret
 */
 CALLEE (u64, __SVUint64_t)
 
 /*
 ** callee_f64:
-** ptrue (p[0-7])\.b, vl16
-** ld1d z0\.d, \1/z, \[x0\]
+** ldr q0, \[x0\]
 ** ret
 */
 CALLEE (f64, __SVFloat64_t)
 
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_5_128.c b/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_5_128.c
index 347a16c1367..2fab6feb41c 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_5_128.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pcs/return_5_128.c
@@ -13,104 +13,91 @@
 /*
 ** callee_s8:
-** ptrue (p[0-7])\.b, vl16
-** ld1b z0\.b, \1/z, \[x0\]
+** ldr q0, \[x0\]
 ** ret
 */
 CALLEE (s8, svint8_t)
 
 /*
 ** callee_u8:
-** ptrue (p[0-7])\.b, vl16
-** ld1b z0\.b, \1/z, \[x0\]
+** ldr q0, \[x0\]
 ** ret
 */
 CALLEE (u8, svuint8_t)
 
 /*
 ** callee_mf8:
-** ptrue (p[0-7])\.b, vl16
-** ld1b z0\.b, \1/z, \[x0\]
+** ldr q0, \[x0\]
 ** ret
 */
 CALLEE (mf8, svmfloat8_t)
 
 /*
 ** callee_s16:
-** ptrue (p[0-7])\.b, vl16
-** ld1h z0\.h, \1/z, \[x0\]
+** ldr q0, \[x0\]
 ** ret
 */
 CALLEE (s16, svint16_t)
 
 /*
 ** callee_u16:
-** ptrue (p[0-7])\.b, vl16
-** ld1h z0\.h, \1/z, \[x0\]
+** ldr q0, \[x0\]
 ** ret
 */
 CALLEE (u16, svuint16_t)
 
 /*
 ** callee_f16:
-** ptrue (p[0-7])\.b, vl16
-** ld1h z0\.h, \1/z, \[x0\]
+** ldr q0, \[x0\]
 ** ret
 */
 CALLEE (f16, svfloat16_t)
 
 /*
 ** callee_bf16:
-** ptrue (p[0-7])\.b, vl16
-** ld1h z0\.h, \1/z, \[x0\]
+** ldr q0, \[x0\]
 ** ret
 */
 CALLEE (bf16, svbfloat16_t)
 
 /*
 ** callee_s32:
-** ptrue (p[0-7])\.b, vl16
-** ld1w z0\.s, \1/z, \[x0\]
+** ldr q0, \[x0\]
 ** ret
 */
 CALLEE (s32, svint32_t)
 
 /*
 ** callee_u32:
-** ptrue (p[0-7])\.b, vl16
-** ld1w z0\.s, \1/z, \[x0\]
+** ldr q0, \[x0\]
 ** ret
 */
 CALLEE (u32, svuint32_t)
 
 /*
 ** callee_f32:
-** ptrue (p[0-7])\.b, vl16
-** ld1w z0\.s, \1/z, \[x0\]
+** ldr q0, \[x0\]
 ** ret
 */
 CALLEE (f32, svfloat32_t)
 
 /*
 ** callee_s64:
-** ptrue (p[0-7])\.b, vl16
-** ld1d z0\.d, \1/z, \[x0\]
+** ldr q0, \[x0\]
 ** ret
 */
 CALLEE (s64, svint64_t)
 
 /*
 ** callee_u64:
-** ptrue (p[0-7])\.b, vl16
-** ld1d z0\.d, \1/z, \[x0\]
+** ldr q0, \[x0\]
 ** ret
 */
 CALLEE (u64, svuint64_t)
 
 /*
 ** callee_f64:
-** ptrue (p[0-7])\.b, vl16
-** ld1d z0\.d, \1/z, \[x0\]
+** ldr q0, \[x0\]
 ** ret
 */
 CALLEE (f64, svfloat64_t)
 
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pcs/struct_3_128.c b/gcc/testsuite/gcc.target/aarch64/sve/pcs/struct_3_128.c
index d99ce1202a9..370bd9e3bfe 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/pcs/struct_3_128.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pcs/struct_3_128.c
@@ -473,17 +473,11 @@ SEL2 (struct, pst_uniform4)
 ** sub sp, sp, #144
 ** add (x[0-9]+), sp, #?31
 ** and x7, \1, #?(?:-32|4294967264)
-** ptrue (p[0-7])\.b, vl16
-** st1w z0\.s, \2, \[x7\]
-** add (x[0-9]+), x7, #?32
-** (
-** str z1, \[\3\]
-** str z2, \[\3, #1, mul vl\]
-** |
-** stp q1, q2, \[\3\]
-** )
-** str z3, \[\3, #2, mul vl\]
-** st1w z4\.s, \2, \[x7, #6, mul vl\]
+** mov x0, x7
+** str q0, \[x0\], 32
+** stp q1, q2, \[x0\]
+** str z3, \[x0, #2, mul vl\]
+** str q4, \[x7, 96\]
 ** add sp, sp, #?144
 ** ret
 */
@@ -516,20 +510,12 @@ SEL2 (struct, pst_mixed1)
 ** test_pst_mixed1:
 ** sub sp, sp, #176
 ** str p0, \[sp\]
-** ptrue p0\.b, vl16
-** st1h z0\.h, p0, \[sp, #1, mul vl\]
-** st1h z1\.h, p0, \[sp, #2, mul vl\]
-** st1w z2\.s, p0, \[sp, #3, mul vl\]
-** st1d z3\.d, p0, \[sp, #4, mul vl\]
+** stp q0, q1, \[sp, 16\]
+** stp q2, q3, \[sp, 48\]
 ** str p1, \[sp, #40, mul vl\]
 ** str p2, \[sp, #41, mul vl\]
-** st1b z4\.b, p0, \[sp, #6, mul vl\]
-** st1h z5\.h, p0, \[sp, #7, mul vl\]
-** ...
-** st1w z6\.s, p0, [^\n]*
-** ...
-** st1d z7\.d, p0, [^\n]*
-** ...
+** stp q4, q5, \[sp, 96\]
+** stp q6, q7, \[sp, 128\]
 ** str p3, \[sp, #80, mul vl\]
 ** mov (x7, sp|w7, wsp)
 ** add sp, sp, #?176
@@ -557,24 +543,13 @@ SEL2 (struct, pst_mixed2)
 ** test_pst_mixed2:
 ** sub sp, sp, #128
 ** str p0, \[sp\]
-** ptrue (p[03])\.b, vl16
-** add (x[0-9]+), sp, #?2
-** st1b z0\.b, \1, \[\2\]
+** str q0, \[sp, 2\]
 ** str p1, \[sp, #9, mul vl\]
-** add (x[0-9]+), sp, #?20
-** st1b z1\.b, \1, \[\3\]
+** str q1, \[sp, 20\]
 ** str p2, \[sp, #18, mul vl\]
-** add (x[0-9]+), sp, #?38
-** st1b z2\.b, \1, \[\4\]
-** (
-** str z3, \[sp, #4, mul vl\]
-** str z4, \[sp, #5, mul vl\]
-** str z5, \[sp, #6, mul vl\]
-** str z6, \[sp, #7, mul vl\]
-** |
+** str q2, \[sp, 38\]
 ** stp q3, q4, \[sp, 64\]
 ** stp q5, q6, \[sp, 96\]
-** )
 ** mov (x7, sp|w7, wsp)
 ** add sp, sp, #?128
 ** ret
 */
@@ -595,8 +570,7 @@ SEL2 (struct, pst_big1)
 
 /*
 ** test_pst_big1_a: { target lp64 }
-** ptrue (p[0-7])\.b, vl16
-** ld1b z0\.b, \1/z, \[x0\]
+** ldr q0, \[x0\]
 ** ret
 */
 /*
@@ -760,8 +734,7 @@ test_pst_big3_d (struct pst_big3 x)
 
 /*
 ** test_pst_big3_e: { target lp64 }
-** ptrue (p[0-7])\.b, vl16
-** ld1b z0\.b, \1/z, \[x0, #1, mul vl\]
+** ldr q0, \[x0, 16\]
 ** ret
 */
 /*
@@ -780,8 +753,7 @@ test_pst_big3_e (struct pst_big3 x)
 
 /*
 ** test_pst_big3_f: { target lp64 }
-** ptrue (p[0-7])\.b, vl16
-** ld1b z0\.b, \1/z, \[x0, #5, mul vl\]
+** ldr q0, \[x0, 80\]
 ** ret
 */
 /*
@@ -1035,8 +1007,7 @@ SEL2 (struct, nonpst6)
 
 /*
 ** test_nonpst6: { target lp64 }
-** ptrue (p[0-3])\.b, vl16
-** ld1d z0\.d, \1/z, \[x0\]
+** ldr q0, \[x0\]
 ** ret
 */
 /*
@@ -1063,8 +1034,7 @@ SEL2 (struct, nonpst7)
 
 /*
 ** test_nonpst7: { target lp64 }
-** ptrue (p[0-3])\.b, vl16
-** ld1d z0\.d, \1/z, \[x0\]
+** ldr q0, \[x0\]
 ** ret
 */
 /*
-- 
2.34.1

>>
>> I don’t disagree with the suggestion here, just an observation on testing.
>>
>> Out of interest I tried adding volatile in godbolt and got a warning about
>> discarded volatile qualifiers in C:
>> https://godbolt.org/z/9bj4W39MP
>> And an error in C++ about invalid conversion
>> https://godbolt.org/z/7vf3r58ja
>>
>> How would you recommend a test is written here?
>
> When doing the review, I just used pointer dereferencing:
>
> svint8_t f1(volatile svint8_t *ptr) { return *ptr; }
> void f2(volatile svint8_t *ptr, svint8_t x) { *ptr = x; }
>
> Thanks,
> Richard