[PATCH] rs6000: Add new pass for replacement of contiguous addresses vector load lxv with lxvp

2023-09-29 Thread Ajit Agarwal
Hello All: This patch adds a new pass to replace contiguous-address vector loads (lxv) with the MMA instruction lxvp. Bootstrapped and regtested on powerpc64-linux-gnu. Thanks & Regards Ajit rs6000: Add new pass for replacement of contiguous lxv with lxvp New pass to replace contiguous addresses vec

[PATCH v1] rs6000: Add new pass for replacement of contiguous addresses vector load lxv with lxvp

2023-10-06 Thread Ajit Agarwal
Hello All: This patch adds a new pass to replace contiguous-address vector loads (lxv) with the MMA instruction lxvp. Bootstrapped and regtested on powerpc64-linux-gnu. Thanks & Regards Ajit rs6000: Add new pass for replacement of contiguous lxv with lxvp. New pass to replace contiguous addresses l

[PATCH v2] rs6000: Add new pass for replacement of contiguous addresses vector load lxv with lxvp

2023-10-07 Thread Ajit Agarwal
Hello All: This patch adds a new pass to replace contiguous-address vector loads (lxv) with the MMA instruction lxvp. This patch addresses one regression failure on the ARM architecture. Bootstrapped and regtested on powerpc64-linux-gnu. Thanks & Regards Ajit rs6000: Add new pass for replacement of con

[PATCH v3] rs6000: fmr gets used instead of faster xxlor [PR93571]

2023-10-10 Thread Ajit Agarwal
Hello Segher: Here is the patch that uses xxlor instead of fmr where possible. Performance results show that fmr is better on power9 and power10, whereas xxlor is better on power7 and power8. fmr is the only option before p7. Incorporated review comments. Bootstrapp

[PATCH v8] tree-ssa-sink: Improve code sinking pass

2023-10-12 Thread Ajit Agarwal
This patch improves the code sinking pass to sink statements before a call to reduce register pressure. Review comments are incorporated. Synced with and modified for the latest trunk sources. For example: void bar(); int j; void foo(int a, int b, int c, int d, int e, int f) { int l; l = a + b + c + d +e

[PATCH v9] Improve code sinking pass

2023-10-12 Thread Ajit Agarwal
This patch improves the code sinking pass to sink statements before a call to reduce register pressure. Review comments are incorporated. Synced with the latest sources and modified the code changes accordingly. For example: void bar(); int j; void foo(int a, int b, int c, int d, int e, int f) { int l;

[PATCH v9] tree-ssa-sink: Improve code sinking pass

2023-10-12 Thread Ajit Agarwal
This patch improves the code sinking pass to sink statements before a call to reduce register pressure. Review comments are incorporated. Synced with the latest trunk sources and modified the sinking pass accordingly. For example: void bar(); int j; void foo(int a, int b, int c, int d, int e, int f) { int

[PING ^0][PATCH v3] rs6000: fmr gets used instead of faster xxlor [PR93571]

2023-10-15 Thread Ajit Agarwal
Hello Segher: Please review. Thanks & Regards Ajit Forwarded Message Subject: [PATCH v3] rs6000: fmr gets used instead of faster xxlor [PR93571] Date: Tue, 10 Oct 2023 18:14:00 +0530 From: Ajit Agarwal To: gcc-patches CC: Segher Boessenkool, Peter Bergner, Kewen

[PING ^0][PATCH v2] rs6000: Add new pass for replacement of contiguous addresses vector load lxv with lxvp

2023-10-15 Thread Ajit Agarwal
Hello All: Please review. Thanks & Regards Ajit Forwarded Message Subject: [PATCH v2] rs6000: Add new pass for replacement of contiguous addresses vector load lxv with lxvp Date: Sun, 8 Oct 2023 00:34:27 +0530 From: Ajit Agarwal To: gcc-patches CC: Segher Boessen

[PING ^0] [PATCH v8 4/4] ree: Improve ree pass for rs6000 target using defined ABI interfaces

2023-10-15 Thread Ajit Agarwal
0530 From: Ajit Agarwal To: gcc-patches CC: Jeff Law, Vineet Gupta, Richard Biener, Segher Boessenkool, Peter Bergner Hello All: Version 8 of the patch uses ABI interfaces to eliminate redundant zero and sign extensions. Bootstrapped and regtested on powerpc-linux-gnu. Incorporated all

[PING ^0] [PATCH v2 3/4] Improve functionality of ree pass with various constants with AND operation.

2023-10-15 Thread Ajit Agarwal
sage Subject: [PATCH v2 3/4] Improve functionality of ree pass with various constants with AND operation. Date: Tue, 19 Sep 2023 14:51:16 +0530 From: Ajit Agarwal To: gcc-patches CC: Jeff Law, Vineet Gupta, Richard Biener, Peter Bergner, Segher Boessenkool Hello Jeff: This patch elimin

Re: [PATCH v8] tree-ssa-sink: Improve code sinking pass

2023-10-17 Thread Ajit Agarwal
Hello Richard: On 17/10/23 2:03 pm, Richard Biener wrote: > On Thu, Oct 12, 2023 at 10:42 AM Ajit Agarwal wrote: >> >> This patch improves code sinking pass to sink statements before call to >> reduce >> register pressure. >> Review comments are incorporated

[PATCH v10] tree-ssa-sink: Improve code sinking pass

2023-10-17 Thread Ajit Agarwal
Currently, code sinking will sink code at the use points with loops having the same nesting depth. The following patch improves code sinking by placing the sunk code in the immediate dominator with the same loop nest depth. Review comments are incorporated. For example: void bar(); int j; void foo(int a, in

Re: [PATCH v8] tree-ssa-sink: Improve code sinking pass

2023-10-17 Thread Ajit Agarwal
Hello Richard: The review comments below are incorporated in version 10 of the patch. Please review and let me know if it's okay for trunk. Thanks & Regards Ajit On 17/10/23 2:47 pm, Richard Biener wrote: > On Tue, Oct 17, 2023 at 10:53 AM Ajit Agarwal wrote: >> >> Hello Richard

[PATCH v9 4/4] ree: Improve ree pass for rs6000 target using defined ABI interfaces

2023-10-19 Thread Ajit Agarwal
Hello All: Version 9 of the patch uses ABI interfaces to eliminate redundant zero and sign extensions. Bootstrapped and regtested on powerpc-linux-gnu. In this version (version 9) of the patch, the following review comments are incorporated: a) Removal of hard-coded zero_extend and sign_extend i

Re: [PATCH v9 4/4] ree: Improve ree pass for rs6000 target using defined ABI interfaces

2023-10-21 Thread Ajit Agarwal
Hello Vineet: Thanks for your time and valuable comments. On 21/10/23 5:26 am, Vineet Gupta wrote: > On 10/19/23 23:50, Ajit Agarwal wrote: >> Hello All: >> >> This version 9 of the patch uses abi interfaces to remove zero and sign >> extension elimination. >

[PATCH v10 4/4] ree: Improve ree pass for rs6000 target using defined ABI interfaces

2023-10-21 Thread Ajit Agarwal
Hello Vineet and Jeff: Version 10 of the patch uses ABI interfaces to eliminate redundant zero and sign extensions. Bootstrapped and regtested on powerpc-linux-gnu. In this version (version 10) of the patch, the following review comments are incorporated: a) Removal of hard-coded zero_extend and s

[PATCH V11] ree: Improve ree pass using defined abi interfaces

2023-10-22 Thread Ajit Agarwal
Hello Vineet, Jeff and Bernhard: Version 11 of the patch uses ABI interfaces to eliminate redundant zero and sign extensions. Bootstrapped and regtested on powerpc-linux-gnu. In this version (version 11) of the patch, the following review comments are incorporated: a) Removal of hard-coded zero_e

Re: [PATCH v9 4/4] ree: Improve ree pass for rs6000 target using defined ABI interfaces

2023-10-22 Thread Ajit Agarwal
Hello All: Addressed the review comments below in version 11 of the patch. Please review and let me know if it's OK for trunk. Thanks & Regards Ajit On 22/10/23 12:56 am, rep.dot@gmail.com wrote: > On 21 October 2023 01:56:16 CEST, Vineet Gupta wrote: >> On 10/19/2

[PING ^1][PATCH v2] rs6000: Add new pass for replacement of contiguous addresses vector load lxv with lxvp

2023-10-23 Thread Ajit Agarwal
Ping ^1. Forwarded Message Subject: [PING ^0][PATCH v2] rs6000: Add new pass for replacement of contiguous addresses vector load lxv with lxvp Date: Sun, 15 Oct 2023 17:43:24 +0530 From: Ajit Agarwal To: gcc-patches CC: Segher Boessenkool, Kewen.Lin, Peter Bergner

[PATCH V12 4/4] ree: Improve ree pass using defined abi interfaces

2023-10-23 Thread Ajit Agarwal
Hello Vineet, Jeff and Bernhard: Version 11 of the patch uses ABI interfaces to eliminate redundant zero and sign extensions. Bootstrapped and regtested on powerpc-linux-gnu. In this version (version 11) of the patch, the following review comments are incorporated: a) Removal of hard-coded zero_e

Re: [PATCH v9 4/4] ree: Improve ree pass for rs6000 target using defined ABI interfaces

2023-10-24 Thread Ajit Agarwal
Hello Bernhard: On 23/10/23 7:40 pm, Bernhard Reutner-Fischer wrote: > On Mon, 23 Oct 2023 12:16:18 +0530 > Ajit Agarwal wrote: > >> Hello All: >> >> Addressed below review comments in the version 11 of the patch. >> Please review and please let me know if it

Re: [PATCH v9 4/4] ree: Improve ree pass for rs6000 target using defined ABI interfaces

2023-10-24 Thread Ajit Agarwal
Hello Vineet: On 24/10/23 12:02 am, Vineet Gupta wrote: > > > On 10/22/23 23:46, Ajit Agarwal wrote: >> Hello All: >> >> Addressed below review comments in the version 11 of the patch. >> Please review and please let me know if its ok for trunk. >> >>

[PATCH V13] ree: Improve ree pass using defined abi interfaces

2023-10-24 Thread Ajit Agarwal
Hello Vineet, Jeff and Bernhard: Version 13 of the patch uses ABI interfaces to eliminate redundant zero and sign extensions. Bootstrapped and regtested on powerpc-linux-gnu. In this version (version 13) of the patch, the following review comments are incorporated: a) Removal of hard-coded zero_e

Re: [PATCH v9 4/4] ree: Improve ree pass for rs6000 target using defined ABI interfaces

2023-10-24 Thread Ajit Agarwal
On 24/10/23 1:10 pm, Ajit Agarwal wrote: > Hello Vineet: > > On 24/10/23 12:02 am, Vineet Gupta wrote: >> >> >> On 10/22/23 23:46, Ajit Agarwal wrote: >>> Hello All: >>> >>> Addressed below review comments in the version 11 of the patch.

Re: [PATCH v6 4/4] ree: Improve ree pass for rs6000 target using defined ABI interfaces.

2023-10-24 Thread Ajit Agarwal
On 19/09/23 1:57 am, Vineet Gupta wrote: > Hi Ajit, > > On 9/17/23 22:59, Ajit Agarwal wrote: >> This new version of patch 6 use improve ree pass for rs6000 target using >> defined ABI interfaces. >> Bootstrapped and regtested on power64-linux-gnu. >>

[PATCH V14 4/4] ree: Improve ree pass using defined abi interfaces

2023-10-24 Thread Ajit Agarwal
Hello Vineet, Jeff and Bernhard: Version 14 of the patch uses ABI interfaces to eliminate redundant zero and sign extensions. This fixes aarch64 regression failures with aggressive CSE. Bootstrapped and regtested on powerpc-linux-gnu. In this version (version 14) of the patch, the following revi

Re: [PATCH v9 4/4] ree: Improve ree pass for rs6000 target using defined ABI interfaces

2023-10-25 Thread Ajit Agarwal
On 25/10/23 2:19 am, Vineet Gupta wrote: > On 10/24/23 13:36, rep.dot@gmail.com wrote: >> As said, I don't see why the below was not cleaned up before the V1 >> submission. >> Iff it breaks when manually CSEing, I'm curious why? The function below looks identical in v12 of

Re: [PATCH v9 4/4] ree: Improve ree pass for rs6000 target using defined ABI interfaces

2023-10-25 Thread Ajit Agarwal
On 25/10/23 2:06 am, rep.dot@gmail.com wrote: > On 24 October 2023 09:36:22 CEST, Ajit Agarwal wrote: >> Hello Bernhard: >> >> On 23/10/23 7:40 pm, Bernhard Reutner-Fischer wrote: >>> On Mon, 23 Oct 2023 12:16:18 +0530 >>> Ajit Agarwal wrote: >

Re: [PATCH V14 4/4] ree: Improve ree pass using defined abi interfaces

2023-10-25 Thread Ajit Agarwal
On 24/10/23 11:47 pm, Vineet Gupta wrote: > > > On 10/24/23 10:03, Ajit Agarwal wrote: >> Hello Vineet, Jeff and Bernhard: >> >> This version 14 of the patch uses abi interfaces to remove zero and sign >> extension elimination. >> This fixes aarch64 r

[PATCH V15 4/4] ree: Improve ree pass using defined abi interfaces

2023-10-28 Thread Ajit Agarwal
Hello Vineet, Jeff and Bernhard: Version 15 of the patch uses ABI interfaces to eliminate redundant zero and sign extensions. Bootstrapped and regtested on powerpc-linux-gnu. In this version (version 15) of the patch, the following review comments are incorporated: a) Removal of hard-coded zero_e

Re: [PATCH v9 4/4] ree: Improve ree pass for rs6000 target using defined ABI interfaces

2023-10-28 Thread Ajit Agarwal
On 27/10/23 10:46 pm, Bernhard Reutner-Fischer wrote: > On Wed, 25 Oct 2023 16:41:07 +0530 > Ajit Agarwal wrote: > >> On 25/10/23 2:19 am, Vineet Gupta wrote: >>> On 10/24/23 13:36, rep.dot@gmail.com wrote: >>>>>>>> As said, I don'

Re: [PATCH v9 4/4] ree: Improve ree pass for rs6000 target using defined ABI interfaces

2023-10-28 Thread Ajit Agarwal
On 28/10/23 4:09 am, Vineet Gupta wrote: > > > On 10/27/23 10:16, Bernhard Reutner-Fischer wrote: >> On Wed, 25 Oct 2023 16:41:07 +0530 >> Ajit Agarwal wrote: >> >>> On 25/10/23 2:19 am, Vineet Gupta wrote: >>>> On 10/24/23 13:36, rep.dot

[PATCH V15 4/4] ree: Improve ree pass using defined abi interfaces

2023-10-29 Thread Ajit Agarwal
Hello Vineet, Jeff and Bernhard: Version 15 of the patch uses ABI interfaces to eliminate redundant zero and sign extensions. Bootstrapped and regtested on powerpc-linux-gnu. In this version (version 15) of the patch, the following review comments are incorporated: a) Removal of hard-coded zero_e

Re: [PATCH v9 4/4] ree: Improve ree pass for rs6000 target using defined ABI interfaces

2023-10-29 Thread Ajit Agarwal
On 28/10/23 3:55 pm, Ajit Agarwal wrote: > > > On 27/10/23 10:46 pm, Bernhard Reutner-Fischer wrote: >> On Wed, 25 Oct 2023 16:41:07 +0530 >> Ajit Agarwal wrote: >> >>> On 25/10/23 2:19 am, Vineet Gupta wrote: >>>> On 10/24/23 13:36, rep.dot..

Re: [PATCH v9 4/4] ree: Improve ree pass for rs6000 target using defined ABI interfaces

2023-10-29 Thread Ajit Agarwal
On 28/10/23 3:56 pm, Ajit Agarwal wrote: > > > On 28/10/23 4:09 am, Vineet Gupta wrote: >> >> >> On 10/27/23 10:16, Bernhard Reutner-Fischer wrote: >>> On Wed, 25 Oct 2023 16:41:07 +0530 >>> Ajit Agarwal wrote: >>> >>>>

[PATCH V11] tree-ssa-sink: Improve code sinking pass

2023-10-30 Thread Ajit Agarwal
Hello Richard: Currently, code sinking will sink code at the use points with loops having the same nesting depth. The following patch improves code sinking by placing the sunk code in the immediate dominator with the same loop nest depth. Review comments are incorporated. For example: void bar(); int j; vo

Re: [PATCH v8] tree-ssa-sink: Improve code sinking pass

2023-10-30 Thread Ajit Agarwal
Hello Richard: On 17/10/23 2:47 pm, Richard Biener wrote: > On Tue, Oct 17, 2023 at 10:53 AM Ajit Agarwal wrote: >> >> Hello Richard: >> >> On 17/10/23 2:03 pm, Richard Biener wrote: >>> On Thu, Oct 12, 2023 at 10:42 AM Ajit Agarwal >>> wrote: >

Re: [PATCH v8] tree-ssa-sink: Improve code sinking pass

2023-10-30 Thread Ajit Agarwal
On 30/10/23 5:51 pm, Ajit Agarwal wrote: > Hello Richard: > > On 17/10/23 2:47 pm, Richard Biener wrote: >> On Tue, Oct 17, 2023 at 10:53 AM Ajit Agarwal wrote: >>> >>> Hello Richard: >>> >>> On 17/10/23 2:03 pm, Richard Biener wrote:

[PATCH] tree-optimization: Add register pressure heuristics

2023-11-02 Thread Ajit Agarwal
Hello All: Currently, code sinking heuristics are based on profile data such as basic block count and sink frequency threshold. We have removed these heuristics and added register pressure heuristics based on the live-in and live-out of early blocks and the immediate dominator of use blocks of the same loop ne

Re: [PATCH] tree-optimization: Add register pressure heuristics

2023-11-03 Thread Ajit Agarwal
Hello Richard: On 03/11/23 12:51 pm, Richard Biener wrote: > On Thu, Nov 2, 2023 at 9:50 PM Ajit Agarwal wrote: >> >> Hello All: >> >> Currently code sinking heuristics are based on profile data like >> basic block count and sink frequency threshold. We have rem

Re: [PATCH] tree-optimization: Add register pressure heuristics

2023-11-03 Thread Ajit Agarwal
Hello Richard: On 03/11/23 7:06 pm, Richard Biener wrote: > On Fri, Nov 3, 2023 at 11:20 AM Ajit Agarwal wrote: >> >> Hello Richard: >> >> On 03/11/23 12:51 pm, Richard Biener wrote: >>> On Thu, Nov 2, 2023 at 9:50 PM Ajit Agarwal wrote: >>>

Re: [PATCH] tree-optimization: Add register pressure heuristics

2023-11-04 Thread Ajit Agarwal
1 235 16.6 * 554.roms_r 1 268 5.92 * Est. SPECrate(R)2017_fp_base 8.00 Thanks & Regards Ajit On 03/11/23 8:24 pm, Ajit Agarwal wrote: > Hello Richard: > > > On 03/11/23 7:06 pm, Richard Biener wrote: >> On Fri, Nov 3,

Re: [PATCH V3 2/2] rs6000: Load store fusion for rs6000 target using common infrastructure

2024-03-07 Thread Ajit Agarwal
Hello Segher: On 01/03/24 3:02 am, Segher Boessenkool wrote: > Hi! > > On Mon, Feb 19, 2024 at 04:24:37PM +0530, Ajit Agarwal wrote: >> --- a/gcc/config.gcc >> +++ b/gcc/config.gcc >> @@ -518,7 +518,7 @@ or1k*-*-*) >> ;; >> powerpc*-*-*) >>

[PATCH V12]: Improve code sinking pass

2024-03-13 Thread Ajit Agarwal
Hello All: Currently, code sinking will sink code at the use points with loops having the same nesting depth. The following patch improves code sinking by placing the sunk code in the immediate dominator with the same loop nest depth. Changes since v11: Reorganization of the code. For example: void bar();

[PATCH V3 3/4] ree: Improve ree pass.

2024-03-13 Thread Ajit Agarwal
Hello All: For the rs6000 target we see redundant zero and sign extensions; this patch improves the ree pass to eliminate them. It adds support for zero_extend/sign_extend/AND, and for AND with extension using constants other than 1, such as 0x7/0x7F/0x7. Changes s

[PATCH] tree-ssa-sink: Improve code sinking pass

2024-03-13 Thread Ajit Agarwal
Hello Richard: Currently, code sinking will sink code at the use points with loops having the same nesting depth. The following patch improves code sinking by placing the sunk code at the beginning of the block after the labels. For example: void bar(); int j; void foo(int a, int b, int c, int d, int e,

[PATCH V1 0/1] rs6000: Load store fusion for rs6000 target using common infrastructure

2024-03-13 Thread Ajit Agarwal
Hello All: Common infrastructure using generic code for load store fusion of the rs6000 target. This is split patch 0, in which generic code is implemented and defined that can be used in target-specific code for the aarch64 and rs6000 targets. Generic code is implemented in gcc/pair-fusion-bas

[PATCH V1 1/1] rs6000: Load store fusion for rs6000 target using common infrastructure

2024-03-13 Thread Ajit Agarwal
Hello All: Common infrastructure using generic code for load store fusion of the rs6000 target. Generic code is implemented and defined that can be used in target-specific code for the aarch64 and rs6000 targets. Generic code is implemented in gcc/pair-fusion-base.h, gcc/pair-fusion-common.cc and gc

[PATCH V2 0/1] rs6000: Load store fusion for rs6000 target using common infrastructure

2024-03-13 Thread Ajit Agarwal
Hello All: Common infrastructure using generic code for load store fusion of the rs6000 target. This is split patch 0, in which generic code is implemented and defined that can be used in target-specific code for the aarch64 and rs6000 targets. Generic code is implemented in gcc/pair-fusion-b

[PATCH V2 1/1] rs6000: Load store fusion for rs6000 target using common infrastructure

2024-03-13 Thread Ajit Agarwal
Hello All: Common infrastructure using generic code for load store fusion of the rs6000 target. Generic code is implemented and defined that can be used in target-specific code for the aarch64 and rs6000 targets. Generic code is implemented in gcc/pair-fusion-base.h, gcc/pair-fusion-common.cc and gc

[PING^0][PATCH V3 0/2] aarch64: Place target independent and dependent changed and unchanged code in one file.

2024-03-18 Thread Ajit Agarwal
Hello Richard/Alex: Ping! Please reply. Thanks & Regards Ajit On 27/02/24 12:33 pm, Ajit Agarwal wrote: > Hello Richard/Alex: > > This patch has better diff with changed and unchanged code. > Unchanged code and some of the changed code will be extracted > into target inde

[PATCH] rs6000: Stackoverflow in optimized code on PPC (PR100799)

2024-03-22 Thread Ajit Agarwal
Hello All: When using FlexiBLAS with OpenBLAS we noticed corruption of the parameters passed to OpenBLAS functions. FlexiBLAS basically provides a BLAS interface where each function is a stub that forwards the arguments to a real BLAS lib, like OpenBLAS. Fixes the corruption of caller frame chec

[PATCH v1] rs6000: Stackoverflow in optimized code on PPC [PR100799]

2024-03-22 Thread Ajit Agarwal
Hello Jakub: When using FlexiBLAS with OpenBLAS we noticed corruption of the parameters passed to OpenBLAS functions. FlexiBLAS basically provides a BLAS interface where each function is a stub that forwards the arguments to a real BLAS lib, like OpenBLAS. Fixes the corruption of caller frame che

Re: [PATCH] rs6000: Stackoverflow in optimized code on PPC (PR100799)

2024-03-22 Thread Ajit Agarwal
Hello Jakub: Addressed the below comments and sent version 1 of the patch for review. Thanks & Regards Ajit On 22/03/24 1:15 pm, Jakub Jelinek wrote: > On Fri, Mar 22, 2024 at 01:00:21PM +0530, Ajit Agarwal wrote: >> When using FlexiBLAS with OpenBLAS we noticed corruption of >

[PATCH v2] rs6000: Stackoverflow in optimized code on PPC [PR100799]

2024-03-22 Thread Ajit Agarwal
Hello All: This is version-2 of the patch with review comments addressed. When using FlexiBLAS with OpenBLAS we noticed corruption of the parameters passed to OpenBLAS functions. FlexiBLAS basically provides a BLAS interface where each function is a stub that forwards the arguments to a real BLAS

Re: [PATCH v1] rs6000: Stackoverflow in optimized code on PPC [PR100799]

2024-03-22 Thread Ajit Agarwal
Hello Jakub: Thanks for review. Addressed below review comments and sent version 2 of the patch for review. Thanks & Regards Ajit On 22/03/24 3:06 pm, Jakub Jelinek wrote: > On Fri, Mar 22, 2024 at 02:55:43PM +0530, Ajit Agarwal wrote: >> rs6000: Stackoverflow in optimized code on

Re: [PATCH v2] rs6000: Stackoverflow in optimized code on PPC [PR100799]

2024-03-23 Thread Ajit Agarwal
Hello Peter: On 23/03/24 10:07 am, Peter Bergner wrote: > On 3/22/24 5:15 AM, Ajit Agarwal wrote: >> When using FlexiBLAS with OpenBLAS we noticed corruption of >> the parameters passed to OpenBLAS functions. FlexiBLAS >> basically provides a BLAS interface where each funct

[PATCH v3] rs6000: Stackoverflow in optimized code on PPC [PR100799]

2024-03-23 Thread Ajit Agarwal
Hello All: When using FlexiBLAS with OpenBLAS, we noticed corruption of the caller stack frame when calling OpenBLAS functions. This was caused by the FlexiBLAS C/C++ caller and OpenBLAS Fortran callee disagreeing on the number of function parameters in the callee due to hidden Fortran parameters

Re: [PATCH v2] rs6000: Stackoverflow in optimized code on PPC [PR100799]

2024-03-23 Thread Ajit Agarwal
Hello Peter: Sent version-3 of the patch addressing below review comments. Thanks & Regards Ajit On 23/03/24 3:03 pm, Ajit Agarwal wrote: > Hello Peter: > > On 23/03/24 10:07 am, Peter Bergner wrote: >> On 3/22/24 5:15 AM, Ajit Agarwal wrote: >>> When using FlexiB

Re: [PATCH v2] rs6000: Stackoverflow in optimized code on PPC [PR100799]

2024-03-23 Thread Ajit Agarwal
On 23/03/24 9:33 pm, Peter Bergner wrote: > On 3/23/24 4:33 AM, Ajit Agarwal wrote: >>>> - else if (align_words < GP_ARG_NUM_REG) >>>> + else if (align_words < GP_ARG_NUM_REG >>>> + || (cum->hidden_string_length >>>

[PATCH V4 1/3] aarch64: Place target independent and dependent changed code in one file

2024-04-05 Thread Ajit Agarwal
Hello Alex/Richard: All review comments are incorporated. The common infrastructure of load store pair fusion is divided into target-independent and target-dependent changed code. Target-independent code is the generic code with pure virtual functions to interface between target independent and depen

Re: [PATCH V3 0/2] aarch64: Place target independent and dependent changed code in one file.

2024-04-05 Thread Ajit Agarwal
Hello Alex: On 03/04/24 8:51 pm, Alex Coplan wrote: > On 23/02/2024 16:41, Ajit Agarwal wrote: >> Hello Richard/Alex/Segher: > > Hi Ajit, > > Sorry for the delay and thanks for working on this. > > Generally this looks like the right sort of approach (IMO) but I

[PATCH] aarch64: Preparatory patch to place target independent and dependent changed code in one file

2024-04-09 Thread Ajit Agarwal
Hello Alex/Richard: All review comments are addressed. The common infrastructure of load store pair fusion is divided into target-independent and target-dependent changed code. Target-independent code is the generic code with pure virtual functions to interface between target independent and dependen

Re: [PATCH V4 1/3] aarch64: Place target independent and dependent changed code in one file

2024-04-09 Thread Ajit Agarwal
On 05/04/24 10:03 pm, Alex Coplan wrote: > On 05/04/2024 13:53, Ajit Agarwal wrote: >> Hello Alex/Richard: >> >> All review comments are incorporated. > > Thanks, I was kind-of expecting you to also send the renaming patch as a > preparatory patch as we discusse

Re: [PATCH V4 1/3] aarch64: Place target independent and dependent changed code in one file

2024-04-09 Thread Ajit Agarwal
Hello Alex: On 09/04/24 7:29 pm, Alex Coplan wrote: > On 09/04/2024 17:30, Ajit Agarwal wrote: >> >> >> On 05/04/24 10:03 pm, Alex Coplan wrote: >>> On 05/04/2024 13:53, Ajit Agarwal wrote: >>>> Hello Alex/Richard: >>>> >>>> A

Re: [PATCH V4 1/3] aarch64: Place target independent and dependent changed code in one file

2024-04-09 Thread Ajit Agarwal
Hello Alex: On 09/04/24 8:39 pm, Alex Coplan wrote: > On 09/04/2024 20:01, Ajit Agarwal wrote: >> Hello Alex: >> >> On 09/04/24 7:29 pm, Alex Coplan wrote: >>> On 09/04/2024 17:30, Ajit Agarwal wrote: >>>> >>>> >>>> On 05/04/24 10

Re: [PATCH V4 1/3] aarch64: Place target independent and dependent changed code in one file

2024-04-10 Thread Ajit Agarwal
Hello Alex: On 10/04/24 1:42 pm, Alex Coplan wrote: > Hi Ajit, > > On 09/04/2024 20:59, Ajit Agarwal wrote: >> Hello Alex: >> >> On 09/04/24 8:39 pm, Alex Coplan wrote: >>> On 09/04/2024 20:01, Ajit Agarwal wrote: >>>> Hello Alex: >>>>

[PATCH v1] aarch64: Preparatory Patch to place target independent and dependent changed code in one file

2024-04-10 Thread Ajit Agarwal
Hello Alex/Richard: All comments are addressed in this version-1 of the patch. The common infrastructure of load store pair fusion is divided into target-independent and target-dependent changed code. Target-independent code is the generic code with pure virtual functions to interface between target i

Re: [PATCH V4 1/3] aarch64: Place target independent and dependent changed code in one file

2024-04-10 Thread Ajit Agarwal
Hello Alex: On 10/04/24 7:52 pm, Alex Coplan wrote: > Hi Ajit, > > On 10/04/2024 15:31, Ajit Agarwal wrote: >> Hello Alex: >> >> On 10/04/24 1:42 pm, Alex Coplan wrote: >>> Hi Ajit, >>> >>> On 09/04/2024 20:59, Ajit Agarwal wrote: >>>

Re: [PATCH V2] rs6000: New pass for replacement of adjacent loads fusion (lxv).

2024-02-14 Thread Ajit Agarwal
Hello Alex: On 24/01/24 10:13 pm, Alex Coplan wrote: > Hi Ajit, > > On 21/01/2024 19:57, Ajit Agarwal wrote: >> >> Hello All: >> >> New pass to replace adjacent memory addresses lxv with lxvp. >> Added common infrastructure for load store fusion for >>

Re: [PATCH V1] Common infrastructure for load-store fusion for aarch64 and rs6000 target

2024-02-14 Thread Ajit Agarwal
Hello Richard: On 14/02/24 4:03 pm, Richard Sandiford wrote: > Hi, > > Thanks for working on this. > > You posted a version of this patch on Sunday too. If you need to repost > to fix bugs or make other improvements, could you describe the changes > that you've made since the previous version?

Re: [PATCH V1] Common infrastructure for load-store fusion for aarch64 and rs6000 target

2024-02-14 Thread Ajit Agarwal
On 14/02/24 7:22 pm, Ajit Agarwal wrote: > Hello Richard: > > > On 14/02/24 4:03 pm, Richard Sandiford wrote: >> Hi, >> >> Thanks for working on this. >> >> You posted a version of this patch on Sunday too. If you need to repost >> to fix bug

Re: [PATCH V1] Common infrastructure for load-store fusion for aarch64 and rs6000 target

2024-02-14 Thread Ajit Agarwal
Hello Sam: On 14/02/24 10:50 pm, Sam James wrote: > > Ajit Agarwal writes: > >> Hello Richard: >> >> >> On 14/02/24 4:03 pm, Richard Sandiford wrote: >>> Hi, >>> >>> Thanks for working on this. >>> >>> You posted a

Re: [PATCH V1] Common infrastructure for load-store fusion for aarch64 and rs6000 target

2024-02-14 Thread Ajit Agarwal
On 14/02/24 10:56 pm, Richard Sandiford wrote: > Ajit Agarwal writes: >>>> diff --git a/gcc/df-problems.cc b/gcc/df-problems.cc >>>> index 88ee0dd67fc..a8d0ee7c4db 100644 >>>> --- a/gcc/df-problems.cc >>>> +++ b/gcc/df-problems.cc >>>

Re: [PATCH V2] rs6000: New pass for replacement of adjacent loads fusion (lxv).

2024-02-14 Thread Ajit Agarwal
Hello Richard: On 14/02/24 10:45 pm, Richard Sandiford wrote: > Ajit Agarwal writes: >>>> diff --git a/gcc/emit-rtl.cc b/gcc/emit-rtl.cc >>>> index 1856fa4884f..ffc47a6eaa0 100644 >>>> --- a/gcc/emit-rtl.cc >>>> +++ b/gcc/emit-rtl.cc >>

Re: [PATCH V1] Common infrastructure for load-store fusion for aarch64 and rs6000 target

2024-02-14 Thread Ajit Agarwal
Hello Richard: On 15/02/24 1:14 am, Richard Sandiford wrote: > Ajit Agarwal writes: >> On 14/02/24 10:56 pm, Richard Sandiford wrote: >>> Ajit Agarwal writes: >>>>>> diff --git a/gcc/df-problems.cc b/gcc/df-problems.cc >>>>>> index 88ee0dd67

Re: [PATCH V2] rs6000: New pass for replacement of adjacent loads fusion (lxv).

2024-02-14 Thread Ajit Agarwal
Hello Richard: On 15/02/24 2:21 am, Richard Sandiford wrote: > Ajit Agarwal writes: >> Hello Richard: >> >> >> On 14/02/24 10:45 pm, Richard Sandiford wrote: >>> Ajit Agarwal writes: >>>>>> diff --git a/gcc/emit-rtl.cc b/gcc/emit-rtl.cc >

[PATCH 0/1 V2] Target independent code for common infrastructure of load,store fusion for rs6000 and aarch64 target.

2024-02-15 Thread Ajit Agarwal
Hello Richard: As per your suggestion, I have divided the patch into target-independent and target-dependent parts for the aarch64 target. I kept aarch64-ldp-fusion the same and did not change it. The common infrastructure of load store pair fusion is divided into target-independent and target-dependent code for

Re: [PATCH 0/1 V2] Target independent code for common infrastructure of load,store fusion for rs6000 and aarch64 target.

2024-02-15 Thread Ajit Agarwal
Hello Alex: On 15/02/24 10:12 pm, Alex Coplan wrote: > On 15/02/2024 21:24, Ajit Agarwal wrote: >> Hello Richard: >> >> As per your suggestion I have divided the patch into target independent >> and target dependent for aarch64 target. I kept aarch64-ldp-fusion same

Re: [PATCH 0/1 V2] Target independent code for common infrastructure of load,store fusion for rs6000 and aarch64 target.

2024-02-15 Thread Ajit Agarwal
On 15/02/24 10:43 pm, Alex Coplan wrote: > So IIUC Richard was suggesting splitting into target-independent and > target-dependent pieces within aarch64-ldp-fusion.cc as a first step, > i.e. you introduce the abstractions (virtual functions) needed within > that file. That should hopefully be a

[PATCH 0/2 V2] aarch64: Place target independent and dependent code in one file.

2024-02-15 Thread Ajit Agarwal
Hello Alex/Richard: I have placed target-independent and target-dependent code in aarch64-ldp-fusion for load store fusion. The common infrastructure of load store pair fusion is divided into target-independent and target-dependent code. Target-independent code is the generic code with pure virtual f

[PATCH 2/2 V2] rs6000: Load store fusion for rs6000 target using common infrastructure

2024-02-15 Thread Ajit Agarwal
Hello All: This patch implements load store fusion for the rs6000 target using common infrastructure. Common infrastructure using generic code for load store fusion of the rs6000 target. Generic code is implemented and defined that can be used in target-specific code for the aarch64 and rs6000 targets. Gene

[PATCH V3 2/2] rs6000: Load store fusion for rs6000 target using common infrastructure

2024-02-19 Thread Ajit Agarwal
Hello All: Changes in V3 since V2 patch. Following changes are done in this patch. a) Remove commented asserted code in rtl-ssa/changes.cc b) Handle such code in rs6000-vecload-fusion.cc. Same as V2: Common infrastructure using generic code for load store fusion of rs6000 target. Generic code

[PATCH V3 0/2] aarch64: Place target independent and dependent changed code in one file.

2024-02-23 Thread Ajit Agarwal
Hello Richard/Alex/Segher: This patch adds the changed code for target independent and dependent code for load store fusion. Common infrastructure of load store pair fusion is divided into target independent and target dependent changed code. Target independent code is the Generic code with pure

Re: [PATCH 0/2 V2] aarch64: Place target independent and dependent code in one file.

2024-02-23 Thread Ajit Agarwal
Hello Richard: On 23/02/24 1:19 am, Richard Sandiford wrote: > Ajit Agarwal writes: >> Hello Alex/Richard: >> >> I have placed target independent and target dependent code in >> aarch64-ldp-fusion for load store fusion. >> >> Common infrastructure of

Re: [PATCH V3 0/2] aarch64: Place target independent and dependent changed and unchanged code in one file.

2024-02-26 Thread Ajit Agarwal
-fusion Please review. Thanks & Regards Ajit On 23/02/24 4:41 pm, Ajit Agarwal wrote: > Hello Richard/Alex/Segher: > > This patch adds the changed code for target independent and > dependent code for load store fusion. > > Common infrastructure of load store pair fusion is

[PATCH V1] rs6000: New pass for replacement of adjacent (load) lxv with lxvp

2024-01-14 Thread Ajit Agarwal
Hello All: This patch add the vecload pass to replace adjacent memory accesses lxv with lxvp instructions. This pass is added before ira pass. vecload pass removes one of the defined adjacent lxv (load) and replace with lxvp. Due to removal of one of the defined loads the allocno is has only us

Re: [PATCH V1] rs6000: New pass for replacement of adjacent (load) lxv with lxvp

2024-01-15 Thread Ajit Agarwal
Hello All: Following performance gains for spec2017 FP benchmarks. 554.roms_r 16% gains 544.nab_r 9.98% gains 521.wrf_r 6.89% gains. Thanks & Regards Ajit On 14/01/24 8:55 pm, Ajit Agarwal wrote: > Hello All: > > This patch add the vecload pass to replace adjacent memory acce

Re: [PATCH V1] rs6000: New pass for replacement of adjacent (load) lxv with lxvp

2024-01-15 Thread Ajit Agarwal
Hello Richard: On 15/01/24 3:03 pm, Richard Biener wrote: > On Sun, Jan 14, 2024 at 4:29 PM Ajit Agarwal wrote: >> >> Hello All: >> >> This patch add the vecload pass to replace adjacent memory accesses lxv with >> lxvp >> instructions. This pass is ad

Re: [PATCH V1] rs6000: New pass for replacement of adjacent (load) lxv with lxvp

2024-01-15 Thread Ajit Agarwal
On 15/01/24 6:14 pm, Ajit Agarwal wrote: > Hello Richard: > > On 15/01/24 3:03 pm, Richard Biener wrote: >> On Sun, Jan 14, 2024 at 4:29 PM Ajit Agarwal wrote: >>> >>> Hello All: >>> >>> This patch add the vecload pass to replace adjacent

Re: [PATCH V1] rs6000: New pass for replacement of adjacent (load) lxv with lxvp

2024-01-15 Thread Ajit Agarwal
Hello Richard: On 15/01/24 6:25 pm, Ajit Agarwal wrote: > > > On 15/01/24 6:14 pm, Ajit Agarwal wrote: >> Hello Richard: >> >> On 15/01/24 3:03 pm, Richard Biener wrote: >>> On Sun, Jan 14, 2024 at 4:29 PM Ajit Agarwal wrote: >>>> >>>

Re: [PATCH V1] rs6000: New pass for replacement of adjacent (load) lxv with lxvp

2024-01-17 Thread Ajit Agarwal
Hello Kewen: On 17/01/24 12:32 pm, Kewen.Lin wrote: > on 2024/1/16 06:22, Ajit Agarwal wrote: >> Hello Richard: >> >> On 15/01/24 6:25 pm, Ajit Agarwal wrote: >>> >>> >>> On 15/01/24 6:14 pm, Ajit Agarwal wrote: >>>> Hello Richard: >

Re: [PATCH V1] rs6000: New pass for replacement of adjacent (load) lxv with lxvp

2024-01-18 Thread Ajit Agarwal
Hello Michael: On 17/01/24 7:58 pm, Michael Matz wrote: > Hello, > > On Wed, 17 Jan 2024, Ajit Agarwal wrote: > >>> first is even, since OOmode is only ok for even vsx register and its >>> size makes it take two consecutive vsx registers. >>>

Fwd: [PATCH V2] rs6000: New pass for replacement of adjacent loads fusion (lxv).

2024-01-21 Thread Ajit Agarwal
Hello All: New pass to replace adjacent memory addresses lxv with lxvp. Added common infrastructure for load store fusion for different targets. Common routines are refactored in fusion-common.h. AARCH64 load/store fusion pass is not changed with the common infrastructure. For AARCH64 archit

[PATCH V2] rs6000: New pass for replacement of adjacent loads fusion (lxv).

2024-01-21 Thread Ajit Agarwal
Hello All: New pass to replace adjacent memory addresses lxv with lxvp. Added common infrastructure for load store fusion for different targets. Common routines are refactored in fusion-common.h. AARCH64 load/store fusion pass is not changed with the common infrastructure. For AARCH64 archit

Re: [PATCH V2] rs6000: New pass for replacement of adjacent loads fusion (lxv).

2024-01-31 Thread Ajit Agarwal
Hello Alex: Thanks for your valuable review comments. I am incorporating the comments and would send the patch with rs6000 and AARCH64 changes. Thanks & Regards Ajit On 24/01/24 10:13 pm, Alex Coplan wrote: > Hi Ajit, > > On 21/01/2024 19:57, Ajit Agarwal wrote: >> >

[PATCH] rs6000: New pass for replacement of adjacent lxv with lxvp.

2024-01-09 Thread Ajit Agarwal
Hello All: This pass is registered before ira rtl pass. Bootstrapped and regtested for powerpc64-linux-gnu. No regressions for spec 2017 benchmarks and improvements for some of the FP and INT benchmarks. Vladimir: I did modify IRA and LRA register Allocators. Please review. Thanks & Regards Aj

Re: [PATCH v2] rs6000: Add new pass for replacement of contiguous addresses vector load lxv with lxvp

2023-12-01 Thread Ajit Agarwal
Hello Kewen: On 24/11/23 3:01 pm, Kewen.Lin wrote: > Hi Ajit, > > Don't forget to CC David (CC-ed) :), some comments are inlined below. > > on 2023/10/8 03:04, Ajit Agarwal wrote: >> Hello All: >> >> This patch add new pass to replace contiguous

Re: [PATCH v2] rs6000: Add new pass for replacement of contiguous addresses vector load lxv with lxvp

2023-12-01 Thread Ajit Agarwal
On 28/11/23 3:14 pm, Kewen.Lin wrote: > on 2023/11/28 15:05, Michael Meissner wrote: >> I tried using this patch to compare with the vector size attribute patch I >> posted. I could not build it as a cross compiler on my x86_64 because the >> assembler gives the following error: >> >> Error: op
