> On 28 Apr 2025, at 09:59, Richard Sandiford <richard.sandif...@arm.com> wrote: > > External email: Use caution opening links or attachments > > > Tamar Christina <tamar.christ...@arm.com> writes: >>> -----Original Message----- >>> From: Richard Sandiford <richard.sandif...@arm.com> >>> Sent: Friday, April 25, 2025 6:55 PM >>> To: Jennifer Schmitz <jschm...@nvidia.com> >>> Cc: gcc-patches@gcc.gnu.org >>> Subject: Re: [PATCH] AArch64: Fold LD1/ST1 with ptrue to LDR/STR for >>> 128-bit VLS >>> >>> Jennifer Schmitz <jschm...@nvidia.com> writes: >>>> If -msve-vector-bits=128, SVE loads and stores (LD1 and ST1) with a >>>> ptrue predicate can be replaced by neon instructions (LDR and STR), >>>> thus avoiding the predicate altogether. This also enables formation of >>>> LDP/STP pairs. >>>> >>>> For example, the test cases >>>> >>>> svfloat64_t >>>> ptrue_load (float64_t *x) >>>> { >>>> svbool_t pg = svptrue_b64 (); >>>> return svld1_f64 (pg, x); >>>> } >>>> void >>>> ptrue_store (float64_t *x, svfloat64_t data) >>>> { >>>> svbool_t pg = svptrue_b64 (); >>>> return svst1_f64 (pg, x, data); >>>> } >>>> >>>> were previously compiled to >>>> (with -O2 -march=armv8.2-a+sve -msve-vector-bits=128): >>>> >>>> ptrue_load: >>>> ptrue p3.b, vl16 >>>> ld1d z0.d, p3/z, [x0] >>>> ret >>>> ptrue_store: >>>> ptrue p3.b, vl16 >>>> st1d z0.d, p3, [x0] >>>> ret >>>> >>>> Now the are compiled to: >>>> >>>> ptrue_load: >>>> ldr q0, [x0] >>>> ret >>>> ptrue_store: >>>> str q0, [x0] >>>> ret >>>> >>>> The implementation includes the if-statement >>>> if (known_eq (BYTES_PER_SVE_VECTOR, 16) >>>> && known_eq (GET_MODE_SIZE (mode), 16)) >>>> >>>> which checks for 128-bit VLS and excludes partial modes with a >>>> mode size < 128 (e.g. VNx2QI). >>> >>> I think it would be better to use: >>> >>> if (known_eq (GET_MODE_SIZE (mode), 16) >>> && aarch64_classify_vector_mode (mode) == VEC_SVE_DATA >>> >>> to defend against any partial structure modes that might be added in future. >>> >> >> Hi Both, >> >> Just a suggestion so feel free to ignore, but I do wonder if this >> optimization shouldn't >> look at the predicate bits rather than the mode size. Since this is valid >> for any load >> where the predicate uses the lower N bits where n corresponds to an Adv. >> SIMD register >> size. >> >> e.g. it should be valid for: >> >> #include <arm_sve.h> >> >> svfloat64_t >> ptrue_load (float64_t *x) >> { >> svbool_t pg = svptrue_pat_b8 (SV_VL16); >> return svld1_f64 (pg, x); >> } >> >> void >> ptrue_store (float64_t *x, svfloat64_t data) >> { >> svbool_t pg = svptrue_pat_b8 (SV_VL16); >> return svst1_f64 (pg, x, data); >> } >> >> In general, along with >> >> #include <arm_sve.h> >> >> svfloat64_t >> ptrue_load (float64_t *x) >> { >> svbool_t pg = svptrue_pat_b8 (SV_VL8); >> return svld1_f64 (pg, x); >> } >> >> void >> ptrue_store (float64_t *x, svfloat64_t data) >> { >> svbool_t pg = svptrue_pat_b8 (SV_VL8); >> return svst1_f64 (pg, x, data); >> } >> >> It just so happens that at VL128 the SV_VL16 == SV_ALL. Looking at the >> predicate bits >> instead would help optimize all codegen. > > Yeah, I agree we should do that at some point too.
Hello Richard and Tamar, thanks for reviewing my patch. I think Tamar’s suggestion goes in the direction of https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117978, doesn’t it? I attached a patch proposal a few weeks ago that I was planning to submit after this patch was reviewed. So, my current plan was to make the changes to this patch as proposed above by Richard and then submit the follow-up patch soon. Is that alright? Thanks, Jennifer > But > aarch64_emit_sve_pred_move isn't as general as its name suggests, > in that the provided predicate is always a ptrue: > > /* Emit an SVE predicated move from SRC to DEST. PRED is a predicate > that is known to contain PTRUE. */ > > That is, this function is used by code that wants to emit a full vector > move. The reason that the predicate is passed in rather than being > generated internally is because of aarch64_sve_reload_mem and the > post-reload splitters. > > Thanks, > Richard
smime.p7s
Description: S/MIME cryptographic signature