Hi Richard, I want to follow up on this and see if you have a fix for this.
Thanks, Kugan > On 29 Oct 2024, at 9:41 pm, Richard Sandiford <richard.sandif...@arm.com> > wrote: > > External email: Use caution opening links or attachments > > > Kugan Vivekanandarajah <kvivekana...@nvidia.com> writes: >> Hi, >> >> Fix for PR115258 cases a performance regression in some of the TSVC kernels >> by adding additional mov instructions. >> This patch fixes this. >> >> i.e., When operands are equal, it is likely that all of them get the same >> register similar to: >> (insn 19 15 20 3 (set (reg:V2x16QI 62 v30 [117]) >> (unspec:V2x16QI [ >> (reg:V16QI 62 v30 [orig:102 vect__1.7 ] [102]) >> (reg:V16QI 62 v30 [orig:102 vect__1.7 ] [102]) >> ] UNSPEC_CONCAT)) "tsvc.c":11:12 4871 {aarch64_combinev16qi} >> (nil)) >> >> In this case, aarch64_split_combinev16qi would split it with one insm. >> Hence, when the operands are equal, split after reload. >> >> Bootstrapped and recession tested on aarch64-linux-gnu, Is this ok for trunk? > > Thanks for the patch. I'm not sure this is the right fix though. > I'm planning to have a look at the PR once stage 1 closes. > > Richard > >> >> Thanks, >> Kugan >> >> From ace50a5eb5d459901325ff17ada83791cef0a354 Mon Sep 17 00:00:00 2001 >> From: Kugan <kvivekan...@nvidia.com> >> Date: Wed, 23 Oct 2024 05:03:02 +0530 >> Subject: [PATCH] [PATCH][AARCH64][PR115258]Fix excess moves >> >> When operands are equal, it is likely that all of them get the same register >> similar to: >> (insn 19 15 20 3 (set (reg:V2x16QI 62 v30 [117]) >> (unspec:V2x16QI [ >> (reg:V16QI 62 v30 [orig:102 vect__1.7 ] [102]) >> (reg:V16QI 62 v30 [orig:102 vect__1.7 ] [102]) >> ] UNSPEC_CONCAT)) "tsvc.c":11:12 4871 {aarch64_combinev16qi} >> (nil)) >> >> In this case, aarch64_split_combinev16qi would split it with one insn. Hence, >> when the operands are equal, prefer splitting after reload. >> >> PR target/115258 >> >> gcc/ChangeLog: >> >> PR target/115258 >> * config/aarch64/aarch64-simd.md (aarch64_combinev16qi): Restrict >> the split before reload if operands are equal. >> >> gcc/testsuite/ChangeLog: >> >> * gcc.target/aarch64/pr115258-2.c: New test. >> >> Signed-off-by: Kugan Vivekanandarajah <kvivekana...@nvidia.com> >> --- >> gcc/config/aarch64/aarch64-simd.md | 2 +- >> gcc/testsuite/gcc.target/aarch64/pr115258-2.c | 18 ++++++++++++++++++ >> 2 files changed, 19 insertions(+), 1 deletion(-) >> create mode 100644 gcc/testsuite/gcc.target/aarch64/pr115258-2.c >> >> diff --git a/gcc/config/aarch64/aarch64-simd.md >> b/gcc/config/aarch64/aarch64-simd.md >> index 2a44aa3fcc3..e56100b3766 100644 >> --- a/gcc/config/aarch64/aarch64-simd.md >> +++ b/gcc/config/aarch64/aarch64-simd.md >> @@ -8525,7 +8525,7 @@ >> UNSPEC_CONCAT))] >> "TARGET_SIMD" >> "#" >> - "&& 1" >> + "&& reload_completed || !rtx_equal_p (operands[1], operands[2])" >> [(const_int 0)] >> { >> aarch64_split_combinev16qi (operands); >> diff --git a/gcc/testsuite/gcc.target/aarch64/pr115258-2.c >> b/gcc/testsuite/gcc.target/aarch64/pr115258-2.c >> new file mode 100644 >> index 00000000000..f28190cef32 >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/aarch64/pr115258-2.c >> @@ -0,0 +1,18 @@ >> + >> +/* { dg-do compile } */ >> +/* { dg-options "-Ofast -mcpu=neoverse-v2" } */ >> + >> +extern __attribute__((aligned(64))) float a[32000], b[32000]; >> +int dummy(float[32000], float[32000], float); >> + >> +void s1112() { >> + >> + for (int nl = 0; nl < 100000 * 3; nl++) { >> + for (int i = 32000 - 1; i >= 0; i--) { >> + a[i] = b[i] + (float)1.; >> + } >> + dummy(a, b, 0.); >> + } >> +} >> + >> +/* { dg-final { scan-assembler-times "mov\\tv\[0-9\]+\.16b, v\[0-9\]+\.16b" >> 2 } } */