[gcc r15-1619] ira: Scale save/restore costs of callee save registers with block frequency

2024-06-25 Surya Kumari Jangala via Gcc-cvs
https://gcc.gnu.org/g:3b9b8d6cfdf59337f4b7ce10ce92a98044b2657b

commit r15-1619-g3b9b8d6cfdf59337f4b7ce10ce92a98044b2657b
Author: Surya Kumari Jangala 
Date:   Tue Jun 25 08:37:49 2024 -0500

ira: Scale save/restore costs of callee save registers with block frequency

In assign_hard_reg(), the cost computed for each hard register includes
the cost of saving and restoring a callee-save hard register in the
prolog/epilog. However, this cost is not scaled by the entry block
frequency. Without scaling, the save/restore cost is quite small, so
assign_hard_reg() can choose a callee-save register even though free
caller-save registers are available. Assigning a callee-save register to
a pseudo that is live throughout the function and across a call causes
shrink-wrapping to fail.
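
As a rough illustration of the scaling described above, the following standalone C sketch models the cost expression before and after the change. It is not GCC code: the struct and function names are hypothetical, and the fields only stand in for ira_memory_move_cost, hard_regno_nregs and REG_FREQ_FROM_BB of the entry block.

```c
#include <assert.h>

/* Hypothetical stand-ins for values IRA would obtain from the target
   and from profile data; none of these names exist in GCC.  */
struct save_cost_inputs
{
  int store_cost;      /* models ira_memory_move_cost[mode][rclass][0] */
  int load_cost;       /* models ira_memory_move_cost[mode][rclass][1] */
  int saved_nregs;     /* callee-save hard regs that would need saving */
  int nregs_per_value; /* models hard_regno_nregs (hard_regno, mode) */
  int entry_freq;      /* models REG_FREQ_FROM_BB of the entry block */
  int optimize_size;   /* nonzero models optimizing for size */
};

/* Cost as computed before the patch: no frequency scaling.  */
static int
unscaled_save_cost (const struct save_cost_inputs *in)
{
  assert (in->nregs_per_value > 0);
  return (in->store_cost + in->load_cost)
	 * in->saved_nregs / in->nregs_per_value - 1;
}

/* Cost as computed after the patch: multiplied by the entry block
   frequency unless optimizing for size.  */
static int
scaled_save_cost (const struct save_cost_inputs *in)
{
  int scale = in->optimize_size ? 1 : in->entry_freq;
  return unscaled_save_cost (in) * scale;
}
```

With assumed inputs of store and load cost 4 each, one saved register, one register per value and an entry frequency of 1000, the unscaled cost is 7 and the scaled cost is 7000; with optimize_size set, the scaled cost stays at 7.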

2024-06-25  Surya Kumari Jangala  

gcc/
PR rtl-optimization/111673
* ira-color.cc (assign_hard_reg): Scale save/restore costs of
callee save registers with block frequency.

gcc/testsuite/
PR rtl-optimization/111673
* gcc.target/powerpc/pr111673.c: New test.

Diff:
---
 gcc/ira-color.cc                            |  4 +++-
 gcc/testsuite/gcc.target/powerpc/pr111673.c | 17 +++++++++++++++++
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc
index b9ae32d1b4d..ca32a23a0c9 100644
--- a/gcc/ira-color.cc
+++ b/gcc/ira-color.cc
@@ -2178,7 +2178,9 @@ assign_hard_reg (ira_allocno_t a, bool retry_p)
                       add_cost = ((ira_memory_move_cost[mode][rclass][0]
                                    + ira_memory_move_cost[mode][rclass][1])
                                   * saved_nregs / hard_regno_nregs (hard_regno,
-                                                                    mode) - 1);
+                                                                    mode) - 1)
+                                  * (optimize_size ? 1 :
+                                     REG_FREQ_FROM_BB (ENTRY_BLOCK_PTR_FOR_FN (cfun)));
                       cost += add_cost;
                       full_cost += add_cost;
                     }
diff --git a/gcc/testsuite/gcc.target/powerpc/pr111673.c b/gcc/testsuite/gcc.target/powerpc/pr111673.c
new file mode 100644
index 000..e0c0f85460a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr111673.c
@@ -0,0 +1,17 @@
+/* { dg-do compile { target lp64 } } */
+/* { dg-options "-O2 -fdump-rtl-pro_and_epilogue" } */
+
+/* Verify there is an early return without the prolog and shrink-wrap
+   the function. */
+
+int f (int);
+int
+advance (int dz)
+{
+  if (dz > 0)
+    return (dz + dz) * dz;
+  else
+    return dz * f (dz);
+}
+
+/* { dg-final { scan-rtl-dump-times "Performing shrink-wrapping" 1 "pro_and_epilogue" } } */


[gcc r15-2034] aarch64: Fix the expected output of the test cpy_1.c [PR115892]

2024-07-14 Surya Kumari Jangala via Gcc-cvs
https://gcc.gnu.org/g:8b1492012e5a11e9400e30ee4ae9195c08a2a81e

commit r15-2034-g8b1492012e5a11e9400e30ee4ae9195c08a2a81e
Author: Surya Kumari Jangala 
Date:   Thu Jul 11 11:02:17 2024 -0500

aarch64: Fix the expected output of the test cpy_1.c [PR115892]

The fix at r15-1619-g3b9b8d6cfdf593 results in a rearrangement of
instructions generated for cpy_1.c. This patch fixes the expected output.

2024-07-12  Surya Kumari Jangala  

gcc/testsuite:
PR testsuite/115892
* gcc.target/aarch64/sve/acle/general/cpy_1.c: Update expected
output.

Diff:
---
 gcc/testsuite/gcc.target/aarch64/sve/acle/general/cpy_1.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/cpy_1.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/cpy_1.c
index 57b56a7e256f..1d669913df2e 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/cpy_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/cpy_1.c
@@ -11,9 +11,15 @@ extern "C" {
 /*
 ** dup_x0_m:
 ** ...
+** (
 ** add (x[0-9]+), x0, #?1
 ** mov (p[0-7])\.b, p15\.b
 ** mov z0\.d, \2/m, \1
+** |
+** mov (p[0-7])\.b, p15\.b
+** add (x[0-9]+), x0, #?1
+** mov z0\.d, \3/m, \4
+** )
 ** ...
 ** ret
 */


[gcc r15-2036] arm: Fix the expected output of the test pr111235.c [PR115894]

2024-07-14 Surya Kumari Jangala via Gcc-cvs
https://gcc.gnu.org/g:60ba989220d9dec07d82009b0dafe684e652577f

commit r15-2036-g60ba989220d9dec07d82009b0dafe684e652577f
Author: Surya Kumari Jangala 
Date:   Mon Jul 15 00:03:06 2024 -0500

arm: Fix the expected output of the test pr111235.c [PR115894]

With r15-1619-g3b9b8d6cfdf593, pr111235.c fails because different
registers are used in the ldrexd instruction. The key part of this test
is that the compiler generates LDREXD. The registers used for it are
irrelevant, as they are not matched against any other operations within
the test. This patch changes the test to check only for the mnemonic
and not for any of the operands.
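
To make the effect of the loosened pattern concrete, here is a small standalone C sketch. It is only a simplified model: scan-assembler-times counts regular-expression matches, whereas the hypothetical count_occurrences helper below counts plain substrings, and the assembly snippets one might feed it are made up rather than real compiler output.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count non-overlapping occurrences of NEEDLE in
   TEXT.  A plain substring search stands in for the regexp matching
   that the testsuite actually performs.  */
static size_t
count_occurrences (const char *text, const char *needle)
{
  assert (text != NULL && needle != NULL);

  size_t len = strlen (needle);
  size_t n = 0;

  if (len == 0)
    return 0;

  for (const char *p = strstr (text, needle); p != NULL;
       p = strstr (p + len, needle))
    n++;

  return n;
}
```

Counting "ldrexd\t" yields the same result for two outputs that differ only in which registers were allocated, while a pattern that also names a register, such as "ldrexd\tr2", depends on the allocation.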

2024-07-15  Surya Kumari Jangala  

gcc/testsuite:
PR testsuite/115894
* gcc.target/arm/pr111235.c: Update expected output.

Diff:
---
 gcc/testsuite/gcc.target/arm/pr111235.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/arm/pr111235.c b/gcc/testsuite/gcc.target/arm/pr111235.c
index b06a5bfb8e29..1f732cab983a 100644
--- a/gcc/testsuite/gcc.target/arm/pr111235.c
+++ b/gcc/testsuite/gcc.target/arm/pr111235.c
@@ -31,7 +31,7 @@ void t3 (long long *p, int x)
 atomic_store_explicit (p, x, memory_order_relaxed);
 }
 
-/* { dg-final { scan-assembler-times "ldrexd\tr\[0-9\]+, r\[0-9\]+, \\\[r\[0-9\]+\\\]" 2 } } */
+/* { dg-final { scan-assembler-times "ldrexd\t" 2 } } */
 /* { dg-final { scan-assembler-not "ldrgt" } } */
 /* { dg-final { scan-assembler-not "ldrdgt" } } */
 /* { dg-final { scan-assembler-not "ldrexdgt" } } */


[gcc r15-2810] lra: emit caller-save register spills before call insn [PR116028]

2024-08-08 Surya Kumari Jangala via Gcc-cvs
https://gcc.gnu.org/g:3c67a0fa1dd39a3378deb854a7fef0ff7fe38004

commit r15-2810-g3c67a0fa1dd39a3378deb854a7fef0ff7fe38004
Author: Surya Kumari Jangala 
Date:   Thu Dec 7 22:42:43 2023 -0600

lra: emit caller-save register spills before call insn [PR116028]

LRA emits insns to save caller-save registers in the
inheritance/splitting pass. In this pass, LRA builds EBBs (Extended
Basic Blocks) and traverses the insns in each EBB in reverse order, from
the last insn to the first. When LRA sees a write to a pseudo that has
been assigned a caller-save register, followed by a read of that pseudo
with a call insn between the write and the read, LRA generates a spill
immediately after the write and a restore immediately before the read.
The spill is needed because the call insn will clobber the caller-save
register.

If there is a write insn and a call insn in two separate BBs but
belonging to the same EBB, the spill insn gets generated in the BB
containing the write insn. If the write insn is in the entry BB, then
the spill insn that is generated in the entry BB prevents shrink wrap
from happening. This is because the spill insn references the stack
pointer and hence the prolog gets generated in the entry BB itself.

This patch ensures that the spill insn is generated before the call insn
instead of after the write. This also ensures that the spill occurs
only in the path containing the call.
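
The placement rule can be sketched with a toy model in standalone C. This is not LRA code: the insn_kind enum and spill_position function are hypothetical, an EBB is reduced to a flat array, and the model simply assumes the written pseudo needs a save. It only shows how tracking the nearest call during a reverse walk changes where the spill lands.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified insn categories for one pseudo.  */
enum insn_kind { INSN_OTHER, INSN_WRITE, INSN_CALL, INSN_READ };

/* Walk the insns from last to first, as the inheritance/splitting
   pass does.  Return the index of the insn before which the spill is
   emitted, or -1 if no write is found.  When USE_LATEST_CALL is zero
   the spill goes right after the write (old behaviour); otherwise it
   goes right before the nearest call seen so far, if any (new
   behaviour).  */
static int
spill_position (const enum insn_kind *insns, int n, int use_latest_call)
{
  assert (insns != NULL || n == 0);

  int latest_call = -1; /* models latest_call_insn == NULL */

  for (int i = n - 1; i >= 0; i--)
    {
      if (insns[i] == INSN_CALL)
	latest_call = i;
      else if (insns[i] == INSN_WRITE)
	{
	  if (use_latest_call && latest_call >= 0)
	    return latest_call;
	  return i + 1;
	}
    }

  return -1;
}
```

For the sequence write, other, other, call, read, the old rule puts the spill before index 1 (directly after the write) and the new rule puts it before index 3 (directly before the call). If the sequence contains no call, both rules agree.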

2024-08-01  Surya Kumari Jangala  

gcc:
PR rtl-optimization/116028
* lra-constraints.cc (split_reg): Spill register before call
insn.
(latest_call_insn): New variable.
(inherit_in_ebb): Track the latest call insn.

gcc/testsuite:
PR rtl-optimization/116028
* gcc.dg/ira-shrinkwrap-prep-1.c: Remove xfail for powerpc.
* gcc.dg/pr10474.c: Remove xfail for powerpc.

Diff:
---
 gcc/lra-constraints.cc                       | 28 ++++++++++++++++++++++++----
 gcc/testsuite/gcc.dg/ira-shrinkwrap-prep-1.c |  2 +-
 gcc/testsuite/gcc.dg/pr10474.c               |  2 +-
 3 files changed, 26 insertions(+), 6 deletions(-)

diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index 92b343fa99a0..28c1a877c003 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -152,6 +152,9 @@ static machine_mode curr_operand_mode[MAX_RECOG_OPERANDS];
(e.g. constant) and whose subreg is given operand of the current
insn.  VOIDmode in all other cases.  */
 static machine_mode original_subreg_reg_mode[MAX_RECOG_OPERANDS];
+/* The nearest call insn for an insn on which split transformation
+   will be done. The call insn is in the same EBB as the insn.  */
+static rtx_insn *latest_call_insn;
 
 
 
@@ -6286,10 +6289,25 @@ split_reg (bool before_p, int original_regno, rtx_insn *insn,
                          after_p ? restore : NULL,
                          call_save_p
                          ?  "Add reg<-save" : "Add reg<-split");
-  lra_process_new_insns (insn, before_p ? save : NULL,
-                         before_p ? NULL : save,
-                         call_save_p
-                         ?  "Add save<-reg" : "Add split<-reg");
+  if (call_save_p && latest_call_insn != NULL)
+    /* PR116028: If original_regno is a pseudo that has been assigned a
+       call-save hard register, then emit the spill insn before the call
+       insn 'latest_call_insn' instead of adjacent to 'insn'. If 'insn'
+       and 'latest_call_insn' belong to the same EBB but to two separate
+       BBs, and if 'insn' is present in the entry BB, then generating the
+       spill insn in the entry BB can prevent shrink wrap from happening.
+       This is because the spill insn references the stack pointer and
+       hence the prolog gets generated in the entry BB itself. It is
+       also more efficient to generate the spill before
+       'latest_call_insn' as the spill now occurs only in the path
+       containing the call.  */
+    lra_process_new_insns (PREV_INSN (latest_call_insn), NULL, save,
+                           "Add save<-reg");
+  else
+    lra_process_new_insns (insn, before_p ? save : NULL,
+                           before_p ? NULL : save,
+                           call_save_p
+                           ?  "Add save<-reg" : "Add split<-reg");
   if (nregs > 1 || original_regno < FIRST_PSEUDO_REGISTER)
 /* If we are trying to split multi-register.  We should check
conflicts on the next assignment sub-pass.  IRA can allocate on
@@ -6773,6 +6791,7 @@ inherit_in_ebb (rtx_insn *head, rtx_insn *tail)
   last_processed_bb = NULL;
   CLEAR_HARD_REG_SET (potential_reload_hard_regs);
   live_hard_regs = eliminable_regset | lra_no_alloc_regs;
+  latest_call_insn = NULL;
   /* We don't process new insns generated in the loop. */
   for (curr_insn = tail; curr_insn != PREV_INSN (head); curr_insn = prev