Hi Richard,

I want to follow up on this and see if you have a fix for this.

Thanks,
Kugan

> On 29 Oct 2024, at 9:41 pm, Richard Sandiford <richard.sandif...@arm.com> 
> wrote:
> 
> External email: Use caution opening links or attachments
> 
> 
> Kugan Vivekanandarajah <kvivekana...@nvidia.com> writes:
>> Hi,
>> 
>> Fix for PR115258 cases a performance regression in some of the TSVC kernels 
>> by adding additional mov instructions.
>> This patch fixes this.
>> 
>> i.e., When operands are equal, it is likely that all of them get the same 
>> register similar to:
>> (insn 19 15 20 3 (set (reg:V2x16QI 62 v30 [117])
>>        (unspec:V2x16QI [
>>                (reg:V16QI 62 v30 [orig:102 vect__1.7 ] [102])
>>                (reg:V16QI 62 v30 [orig:102 vect__1.7 ] [102])
>>            ] UNSPEC_CONCAT)) "tsvc.c":11:12 4871 {aarch64_combinev16qi}
>>     (nil))
>> 
>> In this case, aarch64_split_combinev16qi would split it with one insm. 
>> Hence, when the operands are equal, split after reload.
>> 
>> Bootstrapped and recession tested on aarch64-linux-gnu, Is this ok for trunk?
> 
> Thanks for the patch.  I'm not sure this is the right fix though.
> I'm planning to have a look at the PR once stage 1 closes.
> 
> Richard
> 
>> 
>> Thanks,
>> Kugan
>> 
>> From ace50a5eb5d459901325ff17ada83791cef0a354 Mon Sep 17 00:00:00 2001
>> From: Kugan <kvivekan...@nvidia.com>
>> Date: Wed, 23 Oct 2024 05:03:02 +0530
>> Subject: [PATCH] [PATCH][AARCH64][PR115258]Fix excess moves
>> 
>> When operands are equal, it is likely that all of them get the same register
>> similar to:
>> (insn 19 15 20 3 (set (reg:V2x16QI 62 v30 [117])
>>        (unspec:V2x16QI [
>>                (reg:V16QI 62 v30 [orig:102 vect__1.7 ] [102])
>>                (reg:V16QI 62 v30 [orig:102 vect__1.7 ] [102])
>>            ] UNSPEC_CONCAT)) "tsvc.c":11:12 4871 {aarch64_combinev16qi}
>>     (nil))
>> 
>> In this case, aarch64_split_combinev16qi would split it with one insn. Hence,
>> when the operands are equal, prefer splitting after reload.
>> 
>>      PR target/115258
>> 
>> gcc/ChangeLog:
>> 
>>      PR target/115258
>>      * config/aarch64/aarch64-simd.md (aarch64_combinev16qi): Restrict
>>      the split before reload if operands are equal.
>> 
>> gcc/testsuite/ChangeLog:
>> 
>>      * gcc.target/aarch64/pr115258-2.c: New test.
>> 
>> Signed-off-by: Kugan Vivekanandarajah <kvivekana...@nvidia.com>
>> ---
>> gcc/config/aarch64/aarch64-simd.md            |  2 +-
>> gcc/testsuite/gcc.target/aarch64/pr115258-2.c | 18 ++++++++++++++++++
>> 2 files changed, 19 insertions(+), 1 deletion(-)
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/pr115258-2.c
>> 
>> diff --git a/gcc/config/aarch64/aarch64-simd.md 
>> b/gcc/config/aarch64/aarch64-simd.md
>> index 2a44aa3fcc3..e56100b3766 100644
>> --- a/gcc/config/aarch64/aarch64-simd.md
>> +++ b/gcc/config/aarch64/aarch64-simd.md
>> @@ -8525,7 +8525,7 @@
>>                      UNSPEC_CONCAT))]
>>   "TARGET_SIMD"
>>   "#"
>> -  "&& 1"
>> +  "&& reload_completed || !rtx_equal_p (operands[1], operands[2])"
>>   [(const_int 0)]
>> {
>>   aarch64_split_combinev16qi (operands);
>> diff --git a/gcc/testsuite/gcc.target/aarch64/pr115258-2.c 
>> b/gcc/testsuite/gcc.target/aarch64/pr115258-2.c
>> new file mode 100644
>> index 00000000000..f28190cef32
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/pr115258-2.c
>> @@ -0,0 +1,18 @@
>> +
>> +/* { dg-do compile } */
>> +/* { dg-options "-Ofast -mcpu=neoverse-v2" } */
>> +
>> +extern __attribute__((aligned(64))) float a[32000], b[32000];
>> +int dummy(float[32000], float[32000], float);
>> +
>> +void s1112() {
>> +
>> +  for (int nl = 0; nl < 100000 * 3; nl++) {
>> +    for (int i = 32000 - 1; i >= 0; i--) {
>> +      a[i] = b[i] + (float)1.;
>> +    }
>> +    dummy(a, b, 0.);
>> +  }
>> +}
>> +
>> +/* { dg-final { scan-assembler-times "mov\\tv\[0-9\]+\.16b, v\[0-9\]+\.16b" 
>> 2 } } */


Reply via email to