Kugan Vivekanandarajah <kvivekana...@nvidia.com> writes:
> Hi,
>
> Fix for PR115258 causes a performance regression in some of the TSVC kernels
> by adding additional mov instructions.
> This patch fixes this.
>
> i.e., When operands are equal, it is likely that all of them get the same
> register similar to:
> (insn 19 15 20 3 (set (reg:V2x16QI 62 v30 [117])
>         (unspec:V2x16QI [
>                 (reg:V16QI 62 v30 [orig:102 vect__1.7 ] [102])
>                 (reg:V16QI 62 v30 [orig:102 vect__1.7 ] [102])
>             ] UNSPEC_CONCAT)) "tsvc.c":11:12 4871 {aarch64_combinev16qi}
>      (nil))
>
> In this case, aarch64_split_combinev16qi would split it with one insn. Hence,
> when the operands are equal, split after reload.
>
> Bootstrapped and regression tested on aarch64-linux-gnu. Is this ok for trunk?

Thanks for the patch.  I'm not sure this is the right fix though.
I'm planning to have a look at the PR once stage 1 closes.

Richard

>
> Thanks,
> Kugan
>
> From ace50a5eb5d459901325ff17ada83791cef0a354 Mon Sep 17 00:00:00 2001
> From: Kugan <kvivekan...@nvidia.com>
> Date: Wed, 23 Oct 2024 05:03:02 +0530
> Subject: [PATCH] [PATCH][AARCH64][PR115258]Fix excess moves
>
> When operands are equal, it is likely that all of them get the same register
> similar to:
> (insn 19 15 20 3 (set (reg:V2x16QI 62 v30 [117])
>         (unspec:V2x16QI [
>                 (reg:V16QI 62 v30 [orig:102 vect__1.7 ] [102])
>                 (reg:V16QI 62 v30 [orig:102 vect__1.7 ] [102])
>             ] UNSPEC_CONCAT)) "tsvc.c":11:12 4871 {aarch64_combinev16qi}
>      (nil))
>
> In this case, aarch64_split_combinev16qi would split it with one insn. Hence,
> when the operands are equal, prefer splitting after reload.
>
>       PR target/115258
>
> gcc/ChangeLog:
>
>       PR target/115258
>       * config/aarch64/aarch64-simd.md (aarch64_combinev16qi): Restrict
>       the split before reload if operands are equal.
>
> gcc/testsuite/ChangeLog:
>
>       * gcc.target/aarch64/pr115258-2.c: New test.
>
> Signed-off-by: Kugan Vivekanandarajah <kvivekana...@nvidia.com>
> ---
>  gcc/config/aarch64/aarch64-simd.md            |  2 +-
>  gcc/testsuite/gcc.target/aarch64/pr115258-2.c | 18 ++++++++++++++++++
>  2 files changed, 19 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/pr115258-2.c
>
> diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
> index 2a44aa3fcc3..e56100b3766 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -8525,7 +8525,7 @@
>                     UNSPEC_CONCAT))]
>    "TARGET_SIMD"
>    "#"
> -  "&& 1"
> +  "&& reload_completed || !rtx_equal_p (operands[1], operands[2])"
>    [(const_int 0)]
>  {
>    aarch64_split_combinev16qi (operands);
> diff --git a/gcc/testsuite/gcc.target/aarch64/pr115258-2.c b/gcc/testsuite/gcc.target/aarch64/pr115258-2.c
> new file mode 100644
> index 00000000000..f28190cef32
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/pr115258-2.c
> @@ -0,0 +1,18 @@
> +
> +/* { dg-do compile } */
> +/* { dg-options "-Ofast -mcpu=neoverse-v2" } */
> +
> +extern __attribute__((aligned(64))) float a[32000], b[32000];
> +int dummy(float[32000], float[32000], float);
> +
> +void s1112() {
> +
> +  for (int nl = 0; nl < 100000 * 3; nl++) {
> +    for (int i = 32000 - 1; i >= 0; i--) {
> +      a[i] = b[i] + (float)1.;
> +    }
> +    dummy(a, b, 0.);
> +  }
> +}
> +
> +/* { dg-final { scan-assembler-times "mov\\tv\[0-9\]+\.16b, v\[0-9\]+\.16b" 2 } } */
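
For readers less familiar with define_insn_and_split, the intended effect of
the proposed split condition can be modelled in plain C.  This is only an
illustrative sketch, not GCC code: should_split and its boolean parameters
are hypothetical stand-ins for TARGET_SIMD, reload_completed and
rtx_equal_p (operands[1], operands[2]).  It assumes that a split condition
beginning with "&&" is ANDed as a whole with the insn condition.

```c
#include <stdbool.h>

/* Hypothetical model of the proposed split condition (not GCC code).
   target_simd      stands in for TARGET_SIMD,
   reload_completed stands in for the GCC global of the same name,
   operands_equal   stands in for rtx_equal_p (operands[1], operands[2]).
   Assumption: the "&&"-prefixed split condition is combined with the
   insn condition as  insn_cond && (split_cond).  */
static bool
should_split (bool target_simd, bool reload_completed, bool operands_equal)
{
  return target_simd && (reload_completed || !operands_equal);
}
```

Under this model the only case in which the split is deferred (with
TARGET_SIMD enabled) is before reload with equal operands; in every other
combination the pattern is split as it was with the old "&& 1" condition.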