[gcc r15-1619] ira: Scale save/restore costs of callee save registers with block frequency
https://gcc.gnu.org/g:3b9b8d6cfdf59337f4b7ce10ce92a98044b2657b

commit r15-1619-g3b9b8d6cfdf59337f4b7ce10ce92a98044b2657b
Author: Surya Kumari Jangala
Date:   Tue Jun 25 08:37:49 2024 -0500

    ira: Scale save/restore costs of callee save registers with block frequency

    In assign_hard_reg(), when computing the costs of the hard registers, the
    cost of saving/restoring a callee-save hard register in prolog/epilog is
    taken into consideration. However, this cost is not scaled with the entry
    block frequency. Without scaling, the cost of saving/restoring is quite
    small and this can result in a callee-save register being chosen by
    assign_hard_reg() even though there are free caller-save registers
    available. Assigning a callee save register to a pseudo that is live
    in the entire function and across a call will cause shrink wrap to fail.

    2024-06-25  Surya Kumari Jangala

    gcc/
            PR rtl-optimization/111673
            * ira-color.cc (assign_hard_reg): Scale save/restore costs of
            callee save registers with block frequency.

    gcc/testsuite/
            PR rtl-optimization/111673
            * gcc.target/powerpc/pr111673.c: New test.

Diff:
---
 gcc/ira-color.cc                            |  4 +++-
 gcc/testsuite/gcc.target/powerpc/pr111673.c | 17 +
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc
index b9ae32d1b4d..ca32a23a0c9 100644
--- a/gcc/ira-color.cc
+++ b/gcc/ira-color.cc
@@ -2178,7 +2178,9 @@ assign_hard_reg (ira_allocno_t a, bool retry_p)
                 add_cost = ((ira_memory_move_cost[mode][rclass][0]
                              + ira_memory_move_cost[mode][rclass][1])
                             * saved_nregs / hard_regno_nregs (hard_regno,
-                                                              mode) - 1);
+                                                              mode) - 1)
+                            * (optimize_size ? 1 :
+                               REG_FREQ_FROM_BB (ENTRY_BLOCK_PTR_FOR_FN (cfun)));
                 cost += add_cost;
                 full_cost += add_cost;
               }
diff --git a/gcc/testsuite/gcc.target/powerpc/pr111673.c b/gcc/testsuite/gcc.target/powerpc/pr111673.c
new file mode 100644
index 000..e0c0f85460a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr111673.c
@@ -0,0 +1,17 @@
+/* { dg-do compile { target lp64 } } */
+/* { dg-options "-O2 -fdump-rtl-pro_and_epilogue" } */
+
+/* Verify there is an early return without the prolog and shrink-wrap
+   the function. */
+
+int f (int);
+int
+advance (int dz)
+{
+  if (dz > 0)
+    return (dz + dz) * dz;
+  else
+    return dz * f (dz);
+}
+
+/* { dg-final { scan-rtl-dump-times "Performing shrink-wrapping" 1 "pro_and_epilogue" } } */
[gcc r15-2034] aarch64: Fix the expected output of the test cpy_1.c [PR115892]
https://gcc.gnu.org/g:8b1492012e5a11e9400e30ee4ae9195c08a2a81e

commit r15-2034-g8b1492012e5a11e9400e30ee4ae9195c08a2a81e
Author: Surya Kumari Jangala
Date:   Thu Jul 11 11:02:17 2024 -0500

    aarch64: Fix the expected output of the test cpy_1.c [PR115892]

    The fix at r15-1619-g3b9b8d6cfdf593 results in a rearrangement of
    instructions generated for cpy_1.c. This patch fixes the expected
    output.

    2024-07-12  Surya Kumari Jangala

    gcc/testsuite:
            PR testsuite/115892
            * gcc.target/aarch64/sve/acle/general/cpy_1.c: Update expected
            output.

Diff:
---
 gcc/testsuite/gcc.target/aarch64/sve/acle/general/cpy_1.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/cpy_1.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/cpy_1.c
index 57b56a7e256f..1d669913df2e 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/cpy_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/cpy_1.c
@@ -11,9 +11,15 @@ extern "C" {
 /*
 ** dup_x0_m:
 ** ...
+** (
 ** add (x[0-9]+), x0, #?1
 ** mov (p[0-7])\.b, p15\.b
 ** mov z0\.d, \2/m, \1
+** |
+** mov (p[0-7])\.b, p15\.b
+** add (x[0-9]+), x0, #?1
+** mov z0\.d, \3/m, \4
+** )
 ** ...
 ** ret
 */
[gcc r15-2036] arm: Fix the expected output of the test pr111235.c [PR115894]
https://gcc.gnu.org/g:60ba989220d9dec07d82009b0dafe684e652577f

commit r15-2036-g60ba989220d9dec07d82009b0dafe684e652577f
Author: Surya Kumari Jangala
Date:   Mon Jul 15 00:03:06 2024 -0500

    arm: Fix the expected output of the test pr111235.c [PR115894]

    With r15-1619-g3b9b8d6cfdf593, pr111235.c fails due to different
    registers used in ldrexd instruction. The key part of this test is
    that the compiler generates LDREXD. The registers used for that are
    pretty much irrelevant as they are not matched with any other
    operations within the test. This patch changes the test to test only
    for the mnemonic and not for any of the operands.

    2024-07-15  Surya Kumari Jangala

    gcc/testsuite:
            PR testsuite/115894
            * gcc.target/arm/pr111235.c: Update expected output.

Diff:
---
 gcc/testsuite/gcc.target/arm/pr111235.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/arm/pr111235.c b/gcc/testsuite/gcc.target/arm/pr111235.c
index b06a5bfb8e29..1f732cab983a 100644
--- a/gcc/testsuite/gcc.target/arm/pr111235.c
+++ b/gcc/testsuite/gcc.target/arm/pr111235.c
@@ -31,7 +31,7 @@ void t3 (long long *p, int x)
   atomic_store_explicit (p, x, memory_order_relaxed);
 }
 
-/* { dg-final { scan-assembler-times "ldrexd\tr\[0-9\]+, r\[0-9\]+, \\\[r\[0-9\]+\\\]" 2 } } */
+/* { dg-final { scan-assembler-times "ldrexd\t" 2 } } */
 /* { dg-final { scan-assembler-not "ldrgt" } } */
 /* { dg-final { scan-assembler-not "ldrdgt" } } */
 /* { dg-final { scan-assembler-not "ldrexdgt" } } */
[gcc r15-2810] lra: emit caller-save register spills before call insn [PR116028]
https://gcc.gnu.org/g:3c67a0fa1dd39a3378deb854a7fef0ff7fe38004

commit r15-2810-g3c67a0fa1dd39a3378deb854a7fef0ff7fe38004
Author: Surya Kumari Jangala
Date:   Thu Dec 7 22:42:43 2023 -0600

    lra: emit caller-save register spills before call insn [PR116028]

    LRA emits insns to save caller-save registers in the
    inheritance/splitting pass. In this pass, LRA builds EBBs (extended
    basic blocks) and traverses the insns in the EBBs in reverse order,
    from the last insn to the first insn. When LRA sees a write to a
    pseudo (that has been assigned a caller-save register), and there is
    a read following the write, with an intervening call insn between the
    write and read, then LRA generates a spill immediately after the write
    and a restore immediately before the read. The spill is needed because
    the call insn will clobber the caller-save register.

    If there is a write insn and a call insn in two separate BBs but
    belonging to the same EBB, the spill insn gets generated in the BB
    containing the write insn. If the write insn is in the entry BB, then
    the spill insn that is generated in the entry BB prevents shrink wrap
    from happening. This is because the spill insn references the stack
    pointer and hence the prolog gets generated in the entry BB itself.

    This patch ensures that the spill insn is generated before the call
    insn instead of after the write. This also ensures that the spill
    occurs only in the path containing the call.

    2024-08-01  Surya Kumari Jangala

    gcc:
            PR rtl-optimization/116028
            * lra-constraints.cc (split_reg): Spill register before call
            insn.
            (latest_call_insn): New variable.
            (inherit_in_ebb): Track the latest call insn.

    gcc/testsuite:
            PR rtl-optimization/116028
            * gcc.dg/ira-shrinkwrap-prep-1.c: Remove xfail for powerpc.
            * gcc.dg/pr10474.c: Remove xfail for powerpc.
Diff:
---
 gcc/lra-constraints.cc                       | 28
 gcc/testsuite/gcc.dg/ira-shrinkwrap-prep-1.c |  2 +-
 gcc/testsuite/gcc.dg/pr10474.c               |  2 +-
 3 files changed, 26 insertions(+), 6 deletions(-)

diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index 92b343fa99a0..28c1a877c003 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -152,6 +152,9 @@ static machine_mode curr_operand_mode[MAX_RECOG_OPERANDS];
    (e.g. constant) and whose subreg is given operand of the current insn. VOIDmode in all other cases. */
 static machine_mode original_subreg_reg_mode[MAX_RECOG_OPERANDS];
+/* The nearest call insn for an insn on which split transformation
+   will be done. The call insn is in the same EBB as the insn. */
+static rtx_insn *latest_call_insn;
@@ -6286,10 +6289,25 @@ split_reg (bool before_p, int original_regno, rtx_insn *insn,
                          after_p ? restore : NULL,
                          call_save_p
                          ? "Add reg<-save" : "Add reg<-split");
-  lra_process_new_insns (insn, before_p ? save : NULL,
-                         before_p ? NULL : save,
-                         call_save_p
-                         ? "Add save<-reg" : "Add split<-reg");
+  if (call_save_p && latest_call_insn != NULL)
+    /* PR116028: If original_regno is a pseudo that has been assigned a
+       call-save hard register, then emit the spill insn before the call
+       insn 'latest_call_insn' instead of adjacent to 'insn'. If 'insn'
+       and 'latest_call_insn' belong to the same EBB but to two separate
+       BBs, and if 'insn' is present in the entry BB, then generating the
+       spill insn in the entry BB can prevent shrink wrap from happening.
+       This is because the spill insn references the stack pointer and
+       hence the prolog gets generated in the entry BB itself. It is
+       also more efficient to generate the spill before
+       'latest_call_insn' as the spill now occurs only in the path
+       containing the call. */
+    lra_process_new_insns (PREV_INSN (latest_call_insn), NULL, save,
+                           "Add save<-reg");
+  else
+    lra_process_new_insns (insn, before_p ? save : NULL,
+                           before_p ? NULL : save,
+                           call_save_p
+                           ? "Add save<-reg" : "Add split<-reg");
   if (nregs > 1 || original_regno < FIRST_PSEUDO_REGISTER)
     /* If we are trying to split multi-register. We should check
        conflicts on the next assignment sub-pass. IRA can allocate on
@@ -6773,6 +6791,7 @@ inherit_in_ebb (rtx_insn *head, rtx_insn *tail)
   last_processed_bb = NULL;
   CLEAR_HARD_REG_SET (potential_reload_hard_regs);
   live_hard_regs = eliminable_regset | lra_no_alloc_regs;
+  latest_call_insn = NULL;
   /* We don't process new insns generated in the loop. */
   for (curr_insn = tail; curr_insn != PREV_INSN (head); curr_insn = prev