Kugan Vivekanandarajah <kvivekana...@nvidia.com> writes:
> Hi,
>
> Fix for PR115258 cases a performance regression in some of the TSVC kernels 
> by adding additional mov instructions.
> This patch fixes this. 
>
> i.e., When operands are equal, it is likely that all of them get the same 
> register similar to:
> (insn 19 15 20 3 (set (reg:V2x16QI 62 v30 [117])
>         (unspec:V2x16QI [
>                 (reg:V16QI 62 v30 [orig:102 vect__1.7 ] [102])
>                 (reg:V16QI 62 v30 [orig:102 vect__1.7 ] [102])
>             ] UNSPEC_CONCAT)) "tsvc.c":11:12 4871 {aarch64_combinev16qi}
>      (nil))
>
> In this case, aarch64_split_combinev16qi would split it with one insm. Hence, 
> when the operands are equal, split after reload.
>
> Bootstrapped and recession tested on aarch64-linux-gnu, Is this ok for trunk?

Thanks for the patch.  I'm not sure this is the right fix though.
I'm planning to have a look at the PR once stage 1 closes.

Richard

>
> Thanks,
> Kugan
>
> From ace50a5eb5d459901325ff17ada83791cef0a354 Mon Sep 17 00:00:00 2001
> From: Kugan <kvivekan...@nvidia.com>
> Date: Wed, 23 Oct 2024 05:03:02 +0530
> Subject: [PATCH] [PATCH][AARCH64][PR115258]Fix excess moves
>
> When operands are equal, it is likely that all of them get the same register
> similar to:
> (insn 19 15 20 3 (set (reg:V2x16QI 62 v30 [117])
>         (unspec:V2x16QI [
>                 (reg:V16QI 62 v30 [orig:102 vect__1.7 ] [102])
>                 (reg:V16QI 62 v30 [orig:102 vect__1.7 ] [102])
>             ] UNSPEC_CONCAT)) "tsvc.c":11:12 4871 {aarch64_combinev16qi}
>      (nil))
>
> In this case, aarch64_split_combinev16qi would split it with one insn. Hence,
> when the operands are equal, prefer splitting after reload.
>
>       PR target/115258
>
> gcc/ChangeLog:
>
>       PR target/115258
>       * config/aarch64/aarch64-simd.md (aarch64_combinev16qi): Restrict
>       the split before reload if operands are equal.
>
> gcc/testsuite/ChangeLog:
>
>       * gcc.target/aarch64/pr115258-2.c: New test.
>
> Signed-off-by: Kugan Vivekanandarajah <kvivekana...@nvidia.com>
> ---
>  gcc/config/aarch64/aarch64-simd.md            |  2 +-
>  gcc/testsuite/gcc.target/aarch64/pr115258-2.c | 18 ++++++++++++++++++
>  2 files changed, 19 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/pr115258-2.c
>
> diff --git a/gcc/config/aarch64/aarch64-simd.md 
> b/gcc/config/aarch64/aarch64-simd.md
> index 2a44aa3fcc3..e56100b3766 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -8525,7 +8525,7 @@
>                       UNSPEC_CONCAT))]
>    "TARGET_SIMD"
>    "#"
> -  "&& 1"
> +  "&& reload_completed || !rtx_equal_p (operands[1], operands[2])"
>    [(const_int 0)]
>  {
>    aarch64_split_combinev16qi (operands);
> diff --git a/gcc/testsuite/gcc.target/aarch64/pr115258-2.c 
> b/gcc/testsuite/gcc.target/aarch64/pr115258-2.c
> new file mode 100644
> index 00000000000..f28190cef32
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/pr115258-2.c
> @@ -0,0 +1,18 @@
> +
> +/* { dg-do compile } */
> +/* { dg-options "-Ofast -mcpu=neoverse-v2" } */
> +
> +extern __attribute__((aligned(64))) float a[32000], b[32000];
> +int dummy(float[32000], float[32000], float);
> +
> +void s1112() {
> +
> +  for (int nl = 0; nl < 100000 * 3; nl++) {
> +    for (int i = 32000 - 1; i >= 0; i--) {
> +      a[i] = b[i] + (float)1.;
> +    }
> +    dummy(a, b, 0.);
> +  }
> +}
> +
> +/* { dg-final { scan-assembler-times "mov\\tv\[0-9\]+\.16b, v\[0-9\]+\.16b" 
> 2 } } */

Reply via email to