On Tue, Mar 4, 2025 at 6:31 PM Richard Biener <richard.guent...@gmail.com> wrote: > > On Tue, Mar 4, 2025 at 11:18 AM Richard Sandiford > <richard.sandif...@arm.com> wrote: > > > > Richard Sandiford <richard.sandif...@arm.com> writes: > > > Jan Hubicka <hubi...@ucw.cz> writes: > > >>> > > >>> Thanks for running these. I saw poor results for perlbench with my > > >>> initial aarch64 hooks because the hooks reduced the cost to zero for > > >>> the entry case: > > >>> > > >>> auto entry_cost = targetm.callee_save_cost > > >>> (spill_cost_type::SAVE, hard_regno, mode, saved_nregs, > > >>> ira_memory_move_cost[mode][rclass][0] * saved_nregs / nregs, > > >>> allocated_callee_save_regs, existing_spills_p); > > >>> /* In the event of a tie between caller-save and callee-save, > > >>> prefer callee-save. We apply this to the entry cost rather > > >>> than the exit cost since the entry frequency must be at > > >>> least as high as the exit frequency. */ > > >>> if (entry_cost > 0) > > >>> entry_cost -= 1; > > >>> > > >>> I "fixed" that by bumping the cost to a minimum of 2, but I was > > >>> wondering whether the "entry_cost > 0" should instead be "entry_cost > > > >>> 1", > > >>> so that the cost is always greater than not using a callee save for > > >>> registers that don't cross a call. WDYT? > > >> > > >> For x86 performance costs, the push cost should be memory_move_cost which > > >> is 6, -2 for adjustment in the target hook and -1 for this. So cost > > >> should not be 0 I think. > > >> > > >> For size cost, I currently return 1, so we indeed get 0 after > > >> adjustment. > > >> > > >> I think cost of 0 will make us pick callee save even if caller save > > >> is available and there are no function calls, so I guess we do not want > > >> that.... > > > > > > OK, here's an updated patch that makes that change. The x86 parts > > > should be replaced by your patch. > > > > > > Tested on aarch64-linux-gnu. 
I also tried to test on > > > powerpc64le-linux-gnu > > > (on gcc112), but I keep getting broken pipes during the test runs, > > > so I'm struggling to get good before/after comparisons. It does at > > > least bootstrap though... > > > > Here's the patch with Honza's x86 changes. Bootstrapped & regression-tested > > on aarch64-linux-gnu and powerpc64le-linux-gnu (gcc120). The powerpc64le > > results regressed: > > > > FAIL: gcc.dg/guality/vla-1.c -Os -DPREVENT_OPTIMIZATION line 24 i == 5 > > FAIL: gcc.dg/guality/vla-1.c -Os -DPREVENT_OPTIMIZATION line 24 sizeof > > (a) == 17 * sizeof (short) > > > > but the same test already failed for -O2 and -O3. > > > > OK to install now? Or, given the lateness in the release cycle, > > would it be better to wait for GCC 16? > > I think it's OK to install now. Not installing anything isn't an option, the > alternative would be to at least revert HJ's change. I'm hoping to install this patch in GCC15. > > Thanks, > Richard. > > > > > Thanks, > > Richard > > > > > > Following on from the discussion in: > > > > https://gcc.gnu.org/pipermail/gcc-patches/2025-February/675256.html > > > > this patch removes TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE and > > replaces it with two hooks: one that controls the cost of using an > > extra callee-saved register and one that controls the cost of allocating > > a frame for the first spill. > > > > (The patch does not attempt to address the shrink-wrapping part of > > the thread above.) > > > > On AArch64, this is enough to fix PR117477, as verified by the new tests. > > The patch does not change the SPEC2017 scores significantly. (I saw a > > slight improvement in fotonik3d and roms, but I'm not convinced that > > the improvements are real.) > > > > The patch makes IRA use caller saves for gcc.target/aarch64/pr103350-1.c, > > which is a scan-dump correctness test that relies on not using > > caller saves. 
The decision to use caller saves looks appropriate, > > and saves an instruction, so I've just added -fno-caller-saves > > to the test options. > > > > The x86 parts were written by Honza. > > > > gcc/ > > PR rtl-optimization/117477 > > * config/aarch64/aarch64.cc (aarch64_count_saves): New function. > > (aarch64_count_above_hard_fp_saves, aarch64_callee_save_cost) > > (aarch64_frame_allocation_cost): Likewise. > > (TARGET_CALLEE_SAVE_COST): Define. > > (TARGET_FRAME_ALLOCATION_COST): Likewise. > > * config/i386/i386.cc (ix86_ira_callee_saved_register_cost_scale): > > Replace with... > > (ix86_callee_save_cost): ...this new hook. > > (TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE): Delete. > > (TARGET_CALLEE_SAVE_COST): Define. > > * target.h (spill_cost_type, frame_cost_type): New enums. > > * target.def (callee_save_cost, frame_allocation_cost): New hooks. > > (ira_callee_saved_register_cost_scale): Delete. > > * doc/tm.texi.in (TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE): > > Delete. > > (TARGET_CALLEE_SAVE_COST, TARGET_FRAME_ALLOCATION_COST): New hooks. > > * doc/tm.texi: Regenerate. > > * hard-reg-set.h (hard_reg_set_popcount): New function. > > * ira-color.cc (allocated_memory_p): New variable. > > (allocated_callee_save_regs): Likewise. > > (record_allocation): New function. > > (assign_hard_reg): Use targetm.frame_allocation_cost to model > > the cost of the first spill or first caller save. Use > > targetm.callee_save_cost to model the cost of using new callee-saved > > registers. Apply the exit rather than entry frequency to the cost > > of restoring a register or deallocating the frame. Update the > > new variables above. > > (improve_allocation): Use record_allocation. > > (color): Initialize allocated_callee_save_regs. > > (ira_color): Initialize allocated_memory_p. > > * targhooks.h (default_callee_save_cost): Declare. > > (default_frame_allocation_cost): Likewise. > > * targhooks.cc (default_callee_save_cost): New function. 
> > (default_frame_allocation_cost): Likewise. > > > > gcc/testsuite/ > > PR rtl-optimization/117477 > > * gcc.target/aarch64/callee_save_1.c: New test. > > * gcc.target/aarch64/callee_save_2.c: Likewise. > > * gcc.target/aarch64/callee_save_3.c: Likewise. > > * gcc.target/aarch64/pr103350-1.c: Add -fno-caller-saves. > > > > Co-authored-by: Jan Hubicka <hubi...@ucw.cz> > > --- > > gcc/config/aarch64/aarch64.cc | 118 ++++++++++++++++++ > > gcc/config/i386/i386.cc | 28 +++-- > > gcc/doc/tm.texi | 77 ++++++++++-- > > gcc/doc/tm.texi.in | 6 +- > > gcc/hard-reg-set.h | 15 +++ > > gcc/ira-color.cc | 83 ++++++++++-- > > gcc/target.def | 87 +++++++++++-- > > gcc/target.h | 12 ++ > > gcc/targhooks.cc | 27 ++++ > > gcc/targhooks.h | 5 + > > .../gcc.target/aarch64/callee_save_1.c | 12 ++ > > .../gcc.target/aarch64/callee_save_2.c | 14 +++ > > .../gcc.target/aarch64/callee_save_3.c | 12 ++ > > gcc/testsuite/gcc.target/aarch64/pr103350-1.c | 2 +- > > 14 files changed, 459 insertions(+), 39 deletions(-) > > create mode 100644 gcc/testsuite/gcc.target/aarch64/callee_save_1.c > > create mode 100644 gcc/testsuite/gcc.target/aarch64/callee_save_2.c > > create mode 100644 gcc/testsuite/gcc.target/aarch64/callee_save_3.c > > > > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc > > index fe76730b0a7..27ea82cd7da 100644 > > --- a/gcc/config/aarch64/aarch64.cc > > +++ b/gcc/config/aarch64/aarch64.cc > > @@ -15873,6 +15873,118 @@ aarch64_memory_move_cost (machine_mode mode, > > reg_class_t rclass_i, bool in) > > : base + aarch64_tune_params.memmov_cost.store_int); > > } > > > > +/* CALLEE_SAVED_REGS is the set of callee-saved registers that the > > + RA has already decided to use. Return the total number of registers > > + in class RCLASS that need to be saved and restored, including the > > + frame link registers. 
*/ > > +static int > > +aarch64_count_saves (const HARD_REG_SET &callee_saved_regs, reg_class > > rclass) > > +{ > > + auto saved_gprs = callee_saved_regs & reg_class_contents[rclass]; > > + auto nregs = hard_reg_set_popcount (saved_gprs); > > + > > + if (TEST_HARD_REG_BIT (reg_class_contents[rclass], LR_REGNUM)) > > + { > > + if (aarch64_needs_frame_chain ()) > > + nregs += 2; > > + else if (!crtl->is_leaf || df_regs_ever_live_p (LR_REGNUM)) > > + nregs += 1; > > + } > > + return nregs; > > +} > > + > > +/* CALLEE_SAVED_REGS is the set of callee-saved registers that the > > + RA has already decided to use. Return the total number of registers > > + that need to be saved above the hard frame pointer, including the > > + frame link registers. */ > > +static int > > +aarch64_count_above_hard_fp_saves (const HARD_REG_SET &callee_saved_regs) > > +{ > > + /* FP and Advanced SIMD registers are saved above the frame pointer > > + but SVE registers are saved below it. */ > > + if (known_le (GET_MODE_SIZE (aarch64_reg_save_mode (V8_REGNUM)), 16U)) > > + return aarch64_count_saves (callee_saved_regs, POINTER_AND_FP_REGS); > > + return aarch64_count_saves (callee_saved_regs, POINTER_REGS); > > +} > > + > > +/* Implement TARGET_CALLEE_SAVE_COST. */ > > +static int > > +aarch64_callee_save_cost (spill_cost_type spill_type, unsigned int regno, > > + machine_mode mode, unsigned int nregs, int > > mem_cost, > > + const HARD_REG_SET &callee_saved_regs, > > + bool existing_spill_p) > > +{ > > + /* If we've already committed to saving an odd number of GPRs, assume > > that > > + saving one more will involve turning an STR into an STP and an LDR > > + into an LDP. This should still be more expensive than not spilling > > + (meaning that the minimum cost is 1), but it should usually be cheaper > > + than a separate store or load. 
*/ > > + if (GP_REGNUM_P (regno) > > + && nregs == 1 > > + && (aarch64_count_saves (callee_saved_regs, GENERAL_REGS) & 1)) > > + return 1; > > + > > + /* Similarly for saving FP registers, if we only need to save the low > > + 64 bits. (We can also use STP/LDP instead of STR/LDR for Q registers, > > + but that is less likely to be a saving.) */ > > + if (FP_REGNUM_P (regno) > > + && nregs == 1 > > + && known_eq (GET_MODE_SIZE (aarch64_reg_save_mode (regno)), 8U) > > + && (aarch64_count_saves (callee_saved_regs, FP_REGS) & 1)) > > + return 1; > > + > > + /* If this would be the first register that we save, add the cost of > > + allocating or deallocating the frame. For GPR, FPR, and Advanced SIMD > > + saves, the allocation and deallocation can be folded into the save and > > + restore. */ > > + if (!existing_spill_p > > + && !GP_REGNUM_P (regno) > > + && !(FP_REGNUM_P (regno) > > + && known_le (GET_MODE_SIZE (aarch64_reg_save_mode (regno)), > > 16U))) > > + return default_callee_save_cost (spill_type, regno, mode, nregs, > > mem_cost, > > + callee_saved_regs, existing_spill_p); > > + > > + return mem_cost; > > +} > > + > > +/* Implement TARGET_FRAME_ALLOCATION_COST. */ > > +static int > > +aarch64_frame_allocation_cost (frame_cost_type, > > + const HARD_REG_SET &callee_saved_regs) > > +{ > > + /* The intention is to model the relative costs of different approaches > > + to storing data on the stack, rather than to model the cost of saving > > + data vs not saving it. This means that we should return 0 if: > > + > > + - any frame is going to be allocated with: > > + > > + stp x29, x30, [sp, #-...]! > > + > > + to create a frame link. > > + > > + - any frame is going to be allocated with: > > + > > + str x30, [sp, #-...]! > > + > > + to save the link register. > > + > > + In both cases, the allocation and deallocation instructions are the > > + same however we store data to the stack. 
(In the second case, the STR > > + could be converted to an STP by saving an extra call-preserved > > register, > > + but that is modeled by aarch64_callee_save_cost.) > > + > > + In other cases, assume that a frame would need to be allocated with a > > + separate subtraction and deallocated with a separate addition. Saves > > + of call-clobbered registers can then reclaim this cost using a > > + predecrement store and a postincrement load. > > + > > + For simplicity, give this addition or subtraction the same cost as > > + a GPR move. We could parameterize this if necessary. */ > > + if (aarch64_count_above_hard_fp_saves (callee_saved_regs) == 0) > > + return aarch64_tune_params.regmove_cost->GP2GP; > > + return 0; > > +} > > + > > /* Implement TARGET_INSN_COST. We have the opportunity to do something > > much more productive here, such as using insn attributes to cost things. > > But we don't, not yet. > > @@ -31557,6 +31669,12 @@ aarch64_libgcc_floating_mode_supported_p > > #undef TARGET_MEMORY_MOVE_COST > > #define TARGET_MEMORY_MOVE_COST aarch64_memory_move_cost > > > > +#undef TARGET_CALLEE_SAVE_COST > > +#define TARGET_CALLEE_SAVE_COST aarch64_callee_save_cost > > + > > +#undef TARGET_FRAME_ALLOCATION_COST > > +#define TARGET_FRAME_ALLOCATION_COST aarch64_frame_allocation_cost > > + > > #undef TARGET_MIN_DIVISIONS_FOR_RECIP_MUL > > #define TARGET_MIN_DIVISIONS_FOR_RECIP_MUL > > aarch64_min_divisions_for_recip_mul > > > > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc > > index fb93a6fdd0a..661e71b032c 100644 > > --- a/gcc/config/i386/i386.cc > > +++ b/gcc/config/i386/i386.cc > > @@ -20600,12 +20600,27 @@ ix86_class_likely_spilled_p (reg_class_t rclass) > > return false; > > } > > > > -/* Implement TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE. */ > > +/* Implement TARGET_CALLEE_SAVE_COST. 
*/ > > > > static int > > -ix86_ira_callee_saved_register_cost_scale (int) > > -{ > > - return 1; > > +ix86_callee_save_cost (spill_cost_type, unsigned int hard_regno, > > machine_mode, > > + unsigned int, int mem_cost, const HARD_REG_SET &, > > bool) > > +{ > > + /* Account for the fact that push and pop are shorter and do their > > + own allocation and deallocation. */ > > + if (GENERAL_REGNO_P (hard_regno)) > > + { > > + /* push is 1 byte while typical spill is 4-5 bytes. > > + ??? We probably should adjust size costs accordingly. > > + Costs are relative to reg-reg move that has 2 bytes for 32bit > > + and 3 bytes otherwise. */ > > + if (optimize_function_for_size_p (cfun)) > > + return 1; > > + /* Be sure that no cost table sets cost to 2, so we end up with 0. > > */ > > + gcc_checking_assert (mem_cost > 2); > > + return mem_cost - 2; > > + } > > + return mem_cost; > > } > > > > /* Return true if a set of DST by the expression SRC should be allowed. > > @@ -27092,9 +27107,8 @@ ix86_libgcc_floating_mode_supported_p > > #define TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P hook_bool_mode_true > > #undef TARGET_CLASS_LIKELY_SPILLED_P > > #define TARGET_CLASS_LIKELY_SPILLED_P ix86_class_likely_spilled_p > > -#undef TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE > > -#define TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE \ > > - ix86_ira_callee_saved_register_cost_scale > > +#undef TARGET_CALLEE_SAVE_COST > > +#define TARGET_CALLEE_SAVE_COST ix86_callee_save_cost > > > > #undef TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST > > #define TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST \ > > diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi > > index 9f42913a4ef..a96700c0d38 100644 > > --- a/gcc/doc/tm.texi > > +++ b/gcc/doc/tm.texi > > @@ -3047,14 +3047,6 @@ A target hook which can change allocno class for > > given pseudo from > > The default version of this target hook always returns given class. 
> > @end deftypefn > > > > -@deftypefn {Target Hook} int TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE > > (int @var{hard_regno}) > > -A target hook which returns the callee-saved register @var{hard_regno} > > -cost scale in epilogue and prologue used by IRA. > > - > > -The default version of this target hook returns 1 if optimizing for > > -size, otherwise returns the entry block frequency. > > -@end deftypefn > > - > > @deftypefn {Target Hook} bool TARGET_LRA_P (void) > > A target hook which returns true if we use LRA instead of reload pass. > > > > @@ -7011,6 +7003,75 @@ value to the result of that function. The arguments > > to that function > > are the same as to this target hook. > > @end deftypefn > > > > +@deftypefn {Target Hook} int TARGET_CALLEE_SAVE_COST (spill_cost_type > > @var{cost_type}, unsigned int @var{hard_regno}, machine_mode @var{mode}, > > unsigned int @var{nregs}, int @var{mem_cost}, const HARD_REG_SET > > @var{&allocated_callee_regs}, bool @var{existing_spills_p}) > > +Return the one-off cost of saving or restoring callee-saved registers > > +(also known as call-preserved registers or non-volatile registers). > > +The parameters are as follows: > > + > > +@itemize > > +@item > > +@var{cost_type} is @samp{spill_cost_type::SAVE} for saving a register > > +and @samp{spill_cost_type::RESTORE} for restoring a register. > > + > > +@item > > +@var{hard_regno} and @var{mode} represent the whole register that > > +the register allocator is considering using; of these, > > +@var{nregs} registers are fully or partially callee-saved. > > + > > +@item > > +@var{mem_cost} is the normal cost for storing (for saves) > > +or loading (for restores) the @var{nregs} registers. > > + > > +@item > > +@var{allocated_callee_regs} is the set of callee-saved registers > > +that are already in use. > > + > > +@item > > +@var{existing_spills_p} is true if the register allocator has > > +already decided to spill registers to memory. 
> > +@end itemize > > + > > +If @var{existing_spills_p} is false, the cost of a save should account > > +for frame allocations in a way that is consistent with > > +@code{TARGET_FRAME_ALLOCATION_COST}'s handling of allocations for spills. > > +Similarly, the cost of a restore should then account for frame > > deallocations > > +in a way that is consistent with @code{TARGET_FRAME_ALLOCATION_COST}'s > > +handling of deallocations. > > + > > +Note that this hook should not attempt to apply a frequency scale > > +to the cost: it is the caller's responsibility to do that where > > +appropriate. > > + > > +The default implementation returns @var{mem_cost}, plus the allocation > > +or deallocation cost returned by @code{TARGET_FRAME_ALLOCATION_COST}, > > +where appropriate. > > +@end deftypefn > > + > > +@deftypefn {Target Hook} int TARGET_FRAME_ALLOCATION_COST (frame_cost_type > > @var{cost_type}, const HARD_REG_SET @var{&allocated_callee_regs}) > > +Return the cost of allocating or deallocating a frame for the sake of > > +a spill; @var{cost_type} chooses between allocation and deallocation. > > +The term ``spill'' here includes both forcing a pseudo register to memory > > +and using caller-saved registers for pseudo registers that are live across > > +a call. > > + > > +This hook is only called if the register allocator has not so far > > +decided to spill. The allocator may have decided to use callee-saved > > +registers; if so, @var{allocated_callee_regs} is the set of callee-saved > > +registers that the allocator has used. There might also be other reasons > > +why a stack frame is already needed; for example, @samp{get_frame_size ()} > > +might be nonzero, or the target might already require a frame for > > +target-specific reasons. > > + > > +When the register allocator uses this hook to cost spills, it also uses > > +@code{TARGET_CALLEE_SAVE_COST} to cost new callee-saved registers, passing > > +@samp{false} as the @var{existing_spills_p} argument. 
The intention is to > > +allow the target to apply an apples-for-apples comparison between the > > +cost of using callee-saved registers and using spills in cases where the > > +allocator has not yet committed to using both strategies. > > + > > +The default implementation returns 0. > > +@end deftypefn > > + > > @defmac BRANCH_COST (@var{speed_p}, @var{predictable_p}) > > A C expression for the cost of a branch instruction. A value of 1 is > > the default; other values are interpreted relative to that. Parameter > > diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in > > index 6dbe22581ca..eccc4d88493 100644 > > --- a/gcc/doc/tm.texi.in > > +++ b/gcc/doc/tm.texi.in > > @@ -2388,8 +2388,6 @@ in the reload pass. > > > > @hook TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS > > > > -@hook TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE > > - > > @hook TARGET_LRA_P > > > > @hook TARGET_REGISTER_PRIORITY > > @@ -4584,6 +4582,10 @@ These macros are obsolete, new ports should use the > > target hook > > > > @hook TARGET_MEMORY_MOVE_COST > > > > +@hook TARGET_CALLEE_SAVE_COST > > + > > +@hook TARGET_FRAME_ALLOCATION_COST > > + > > @defmac BRANCH_COST (@var{speed_p}, @var{predictable_p}) > > A C expression for the cost of a branch instruction. A value of 1 is > > the default; other values are interpreted relative to that. 
Parameter > > diff --git a/gcc/hard-reg-set.h b/gcc/hard-reg-set.h > > index 48025d202b6..0d03aed5128 100644 > > --- a/gcc/hard-reg-set.h > > +++ b/gcc/hard-reg-set.h > > @@ -191,6 +191,12 @@ hard_reg_set_empty_p (const_hard_reg_set x) > > return x == HARD_CONST (0); > > } > > > > +inline int > > +hard_reg_set_popcount (const_hard_reg_set x) > > +{ > > + return popcount_hwi (x); > > +} > > + > > #else > > > > inline void > > @@ -254,6 +260,15 @@ hard_reg_set_empty_p (const_hard_reg_set x) > > bad |= x.elts[i]; > > return bad == 0; > > } > > + > > +inline int > > +hard_reg_set_popcount (const_hard_reg_set x) > > +{ > > + int count = 0; > > + for (unsigned int i = 0; i < ARRAY_SIZE (x.elts); ++i) > > + count += popcount_hwi (x.elts[i]); > > + return count; > > +} > > #endif > > > > /* Iterator for hard register sets. */ > > diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc > > index 233060e1587..4b9296029cc 100644 > > --- a/gcc/ira-color.cc > > +++ b/gcc/ira-color.cc > > @@ -1195,10 +1195,16 @@ finish_update_cost_records (void) > > update_cost_record_pool.release (); > > } > > > > +/* True if we have allocated memory, or intend to do so. */ > > +static bool allocated_memory_p; > > + > > /* Array whose element value is TRUE if the corresponding hard > > register was already allocated for an allocno. */ > > static bool allocated_hardreg_p[FIRST_PSEUDO_REGISTER]; > > > > +/* Which callee-saved hard registers we've decided to save. */ > > +static HARD_REG_SET allocated_callee_save_regs; > > + > > /* Describes one element in a queue of allocnos whose costs need to be > > updated. Each allocno in the queue is known to have an allocno > > class. */ > > @@ -1740,6 +1746,20 @@ check_hard_reg_p (ira_allocno_t a, int hard_regno, > > return j == nregs; > > } > > > > +/* Record that we have allocated NREGS registers starting at HARD_REGNO. 
> > */ > > + > > +static void > > +record_allocation (int hard_regno, int nregs) > > +{ > > + for (int i = 0; i < nregs; ++i) > > + if (!allocated_hardreg_p[hard_regno + i]) > > + { > > + allocated_hardreg_p[hard_regno + i] = true; > > + if (!crtl->abi->clobbers_full_reg_p (hard_regno + i)) > > + SET_HARD_REG_BIT (allocated_callee_save_regs, hard_regno + i); > > + } > > +} > > + > > /* Return number of registers needed to be saved and restored at > > function prologue/epilogue if we allocate HARD_REGNO to hold value > > of MODE. */ > > @@ -1961,6 +1981,12 @@ assign_hard_reg (ira_allocno_t a, bool retry_p) > > #endif > > auto_bitmap allocnos_to_spill; > > HARD_REG_SET soft_conflict_regs = {}; > > + int entry_freq = REG_FREQ_FROM_BB (ENTRY_BLOCK_PTR_FOR_FN (cfun)); > > + int exit_freq = REG_FREQ_FROM_BB (EXIT_BLOCK_PTR_FOR_FN (cfun)); > > + int spill_cost = 0; > > + /* Whether we have spilled pseudos or used caller-saved registers for > > values > > + that are live across a call. */ > > + bool existing_spills_p = allocated_memory_p || caller_save_needed; > > > > ira_assert (! 
ALLOCNO_ASSIGNED_P (a)); > > get_conflict_and_start_profitable_regs (a, retry_p, > > @@ -1979,6 +2005,18 @@ assign_hard_reg (ira_allocno_t a, bool retry_p) > > start_update_cost (); > > mem_cost += ALLOCNO_UPDATED_MEMORY_COST (a); > > > > + if (!existing_spills_p) > > + { > > + auto entry_cost = targetm.frame_allocation_cost > > + (frame_cost_type::ALLOCATION, allocated_callee_save_regs); > > + spill_cost += entry_cost * entry_freq; > > + > > + auto exit_cost = targetm.frame_allocation_cost > > + (frame_cost_type::DEALLOCATION, allocated_callee_save_regs); > > + spill_cost += exit_cost * exit_freq; > > + } > > + mem_cost += spill_cost; > > + > > ira_allocate_and_copy_costs (&ALLOCNO_UPDATED_HARD_REG_COSTS (a), > > aclass, ALLOCNO_HARD_REG_COSTS (a)); > > a_costs = ALLOCNO_UPDATED_HARD_REG_COSTS (a); > > @@ -2175,16 +2213,37 @@ assign_hard_reg (ira_allocno_t a, bool retry_p) > > /* We need to save/restore the hard register in > > epilogue/prologue. Therefore we increase the cost. */ > > { > > + int nregs = hard_regno_nregs (hard_regno, mode); > > + add_cost = 0; > > rclass = REGNO_REG_CLASS (hard_regno); > > - add_cost = ((ira_memory_move_cost[mode][rclass][0] > > - + ira_memory_move_cost[mode][rclass][1]) > > - * saved_nregs / hard_regno_nregs (hard_regno, > > - mode) - 1) > > - * targetm.ira_callee_saved_register_cost_scale > > (hard_regno); > > + > > + auto entry_cost = targetm.callee_save_cost > > + (spill_cost_type::SAVE, hard_regno, mode, saved_nregs, > > + ira_memory_move_cost[mode][rclass][0] * saved_nregs / nregs, > > + allocated_callee_save_regs, existing_spills_p); > > + /* In the event of a tie between caller-save and callee-save, > > + prefer callee-save. We apply this to the entry cost rather > > + than the exit cost since the entry frequency must be at > > + least as high as the exit frequency. 
*/ > > + if (entry_cost > 1) > > + entry_cost -= 1; > > + add_cost += entry_cost * entry_freq; > > + > > + auto exit_cost = targetm.callee_save_cost > > + (spill_cost_type::RESTORE, hard_regno, mode, saved_nregs, > > + ira_memory_move_cost[mode][rclass][1] * saved_nregs / nregs, > > + allocated_callee_save_regs, existing_spills_p); > > + add_cost += exit_cost * exit_freq; > > + > > cost += add_cost; > > full_cost += add_cost; > > } > > } > > + if (ira_need_caller_save_p (a, hard_regno)) > > + { > > + cost += spill_cost; > > + full_cost += spill_cost; > > + } > > if (min_cost > cost) > > min_cost = cost; > > if (min_full_cost > full_cost) > > @@ -2211,11 +2270,13 @@ assign_hard_reg (ira_allocno_t a, bool retry_p) > > fail: > > if (best_hard_regno >= 0) > > { > > - for (i = hard_regno_nregs (best_hard_regno, mode) - 1; i >= 0; i--) > > - allocated_hardreg_p[best_hard_regno + i] = true; > > + record_allocation (best_hard_regno, > > + hard_regno_nregs (best_hard_regno, mode)); > > spill_soft_conflicts (a, allocnos_to_spill, soft_conflict_regs, > > best_hard_regno); > > } > > + else > > + allocated_memory_p = true; > > if (! retry_p) > > restore_costs_from_copies (a); > > ALLOCNO_HARD_REGNO (a) = best_hard_regno; > > @@ -3368,8 +3429,7 @@ improve_allocation (void) > > /* Assign the best chosen hard register to A. 
*/ > > ALLOCNO_HARD_REGNO (a) = best; > > > > - for (j = nregs - 1; j >= 0; j--) > > - allocated_hardreg_p[best + j] = true; > > + record_allocation (best, nregs); > > > > if (internal_flag_ira_verbose > 2 && ira_dump_file != NULL) > > fprintf (ira_dump_file, "Assigning %d to a%dr%d\n", > > @@ -5199,6 +5259,7 @@ color (void) > > { > > allocno_stack_vec.create (ira_allocnos_num); > > memset (allocated_hardreg_p, 0, sizeof (allocated_hardreg_p)); > > + CLEAR_HARD_REG_SET (allocated_callee_save_regs); > > ira_initiate_assign (); > > do_coloring (); > > ira_finish_assign (); > > @@ -5327,10 +5388,14 @@ ira_color (void) > > ira_allocno_iterator ai; > > > > /* Setup updated costs. */ > > + allocated_memory_p = false; > > FOR_EACH_ALLOCNO (a, ai) > > { > > ALLOCNO_UPDATED_MEMORY_COST (a) = ALLOCNO_MEMORY_COST (a); > > ALLOCNO_UPDATED_CLASS_COST (a) = ALLOCNO_CLASS_COST (a); > > + if (ALLOCNO_CLASS (a) == NO_REGS > > + && !ira_equiv_no_lvalue_p (ALLOCNO_REGNO (a))) > > + allocated_memory_p = true; > > } > > if (ira_conflicts_p) > > color (); > > diff --git a/gcc/target.def b/gcc/target.def > > index c348b15815a..6c7cdc8126b 100644 > > --- a/gcc/target.def > > +++ b/gcc/target.def > > @@ -3775,6 +3775,81 @@ are the same as to this target hook.", > > int, (machine_mode mode, reg_class_t rclass, bool in), > > default_memory_move_cost) > > > > +DEFHOOK > > +(callee_save_cost, > > + "Return the one-off cost of saving or restoring callee-saved registers\n\ > > +(also known as call-preserved registers or non-volatile registers).\n\ > > +The parameters are as follows:\n\ > > +\n\ > > +@itemize\n\ > > +@item\n\ > > +@var{cost_type} is @samp{spill_cost_type::SAVE} for saving a register\n\ > > +and @samp{spill_cost_type::RESTORE} for restoring a register.\n\ > > +\n\ > > +@item\n\ > > +@var{hard_regno} and @var{mode} represent the whole register that\n\ > > +the register allocator is considering using; of these,\n\ > > +@var{nregs} registers are fully or partially callee-saved.\n\ > 
> +\n\ > > +@item\n\ > > +@var{mem_cost} is the normal cost for storing (for saves)\n\ > > +or loading (for restores) the @var{nregs} registers.\n\ > > +\n\ > > +@item\n\ > > +@var{allocated_callee_regs} is the set of callee-saved registers\n\ > > +that are already in use.\n\ > > +\n\ > > +@item\n\ > > +@var{existing_spills_p} is true if the register allocator has\n\ > > +already decided to spill registers to memory.\n\ > > +@end itemize\n\ > > +\n\ > > +If @var{existing_spills_p} is false, the cost of a save should account\n\ > > +for frame allocations in a way that is consistent with\n\ > > +@code{TARGET_FRAME_ALLOCATION_COST}'s handling of allocations for > > spills.\n\ > > +Similarly, the cost of a restore should then account for frame > > deallocations\n\ > > +in a way that is consistent with @code{TARGET_FRAME_ALLOCATION_COST}'s\n\ > > +handling of deallocations.\n\ > > +\n\ > > +Note that this hook should not attempt to apply a frequency scale\n\ > > +to the cost: it is the caller's responsibility to do that where\n\ > > +appropriate.\n\ > > +\n\ > > +The default implementation returns @var{mem_cost}, plus the allocation\n\ > > +or deallocation cost returned by @code{TARGET_FRAME_ALLOCATION_COST},\n\ > > +where appropriate.", > > + int, (spill_cost_type cost_type, unsigned int hard_regno, > > + machine_mode mode, unsigned int nregs, int mem_cost, > > + const HARD_REG_SET &allocated_callee_regs, bool existing_spills_p), > > + default_callee_save_cost) > > + > > +DEFHOOK > > +(frame_allocation_cost, > > + "Return the cost of allocating or deallocating a frame for the sake of\n\ > > +a spill; @var{cost_type} chooses between allocation and deallocation.\n\ > > +The term ``spill'' here includes both forcing a pseudo register to > > memory\n\ > > +and using caller-saved registers for pseudo registers that are live > > across\n\ > > +a call.\n\ > > +\n\ > > +This hook is only called if the register allocator has not so far\n\ > > +decided to spill. 
The allocator may have decided to use callee-saved\n\ > > +registers; if so, @var{allocated_callee_regs} is the set of callee-saved\n\ > > +registers that the allocator has used. There might also be other > > reasons\n\ > > +why a stack frame is already needed; for example, @samp{get_frame_size > > ()}\n\ > > +might be nonzero, or the target might already require a frame for\n\ > > +target-specific reasons.\n\ > > +\n\ > > +When the register allocator uses this hook to cost spills, it also uses\n\ > > +@code{TARGET_CALLEE_SAVE_COST} to cost new callee-saved registers, > > passing\n\ > > +@samp{false} as the @var{existing_spills_p} argument. The intention is > > to\n\ > > +allow the target to apply an apples-for-apples comparison between the\n\ > > +cost of using callee-saved registers and using spills in cases where the\n\ > > +allocator has not yet committed to using both strategies.\n\ > > +\n\ > > +The default implementation returns 0.", > > + int, (frame_cost_type cost_type, const HARD_REG_SET > > &allocated_callee_regs), > > + default_frame_allocation_cost) > > + > > DEFHOOK > > (use_by_pieces_infrastructure_p, > > "GCC will attempt several strategies when asked to copy between\n\ > > @@ -5714,18 +5789,6 @@ DEFHOOK > > reg_class_t, (int, reg_class_t, reg_class_t), > > default_ira_change_pseudo_allocno_class) > > > > -/* Scale of callee-saved register cost in epilogue and prologue used by > > - IRA. */ > > -DEFHOOK > > -(ira_callee_saved_register_cost_scale, > > - "A target hook which returns the callee-saved register @var{hard_regno}\n\ > > -cost scale in epilogue and prologue used by IRA.\n\ > > -\n\ > > -The default version of this target hook returns 1 if optimizing for\n\ > > -size, otherwise returns the entry block frequency.", > > - int, (int hard_regno), > > - default_ira_callee_saved_register_cost_scale) > > - > > /* Return true if we use LRA instead of reload. 
*/ > > DEFHOOK > > (lra_p, > > diff --git a/gcc/target.h b/gcc/target.h > > index 3e1ee68a341..2bf35e2d0ee 100644 > > --- a/gcc/target.h > > +++ b/gcc/target.h > > @@ -284,6 +284,18 @@ enum poly_value_estimate_kind > > POLY_VALUE_LIKELY > > }; > > > > +enum class spill_cost_type > > +{ > > + SAVE, > > + RESTORE > > +}; > > + > > +enum class frame_cost_type > > +{ > > + ALLOCATION, > > + DEALLOCATION > > +}; > > + > > typedef void (*emit_support_tinfos_callback) (tree); > > > > extern bool verify_type_context (location_t, type_context_kind, const_tree, > > diff --git a/gcc/targhooks.cc b/gcc/targhooks.cc > > index 344075efa41..c79458e374e 100644 > > --- a/gcc/targhooks.cc > > +++ b/gcc/targhooks.cc > > @@ -2083,6 +2083,33 @@ default_register_move_cost (machine_mode mode > > ATTRIBUTE_UNUSED, > > #endif > > } > > > > +/* The default implementation of TARGET_CALLEE_SAVE_COST. */ > > + > > +int > > +default_callee_save_cost (spill_cost_type spill_type, unsigned int, > > + machine_mode, unsigned int, int mem_cost, > > + const HARD_REG_SET &callee_saved_regs, > > + bool existing_spills_p) > > +{ > > + if (!existing_spills_p) > > + { > > + auto frame_type = (spill_type == spill_cost_type::SAVE > > + ? frame_cost_type::ALLOCATION > > + : frame_cost_type::DEALLOCATION); > > + mem_cost += targetm.frame_allocation_cost (frame_type, > > + callee_saved_regs); > > + } > > + return mem_cost; > > +} > > + > > +/* The default implementation of TARGET_FRAME_ALLOCATION_COST. */ > > + > > +int > > +default_frame_allocation_cost (frame_cost_type, const HARD_REG_SET &) > > +{ > > + return 0; > > +} > > + > > /* The default implementation of TARGET_SLOW_UNALIGNED_ACCESS. 
*/ > > > > bool > > diff --git a/gcc/targhooks.h b/gcc/targhooks.h > > index 8871e01430c..f16b58798c2 100644 > > --- a/gcc/targhooks.h > > +++ b/gcc/targhooks.h > > @@ -235,6 +235,11 @@ extern tree default_builtin_tm_load_store (tree); > > extern int default_memory_move_cost (machine_mode, reg_class_t, bool); > > extern int default_register_move_cost (machine_mode, reg_class_t, > > reg_class_t); > > +extern int default_callee_save_cost (spill_cost_type, unsigned int, > > + machine_mode, unsigned int, int, > > + const HARD_REG_SET &, bool); > > +extern int default_frame_allocation_cost (frame_cost_type, > > + const HARD_REG_SET &); > > extern bool default_slow_unaligned_access (machine_mode, unsigned int); > > extern HOST_WIDE_INT default_estimated_poly_value (poly_int64, > > > > poly_value_estimate_kind); > > diff --git a/gcc/testsuite/gcc.target/aarch64/callee_save_1.c > > b/gcc/testsuite/gcc.target/aarch64/callee_save_1.c > > new file mode 100644 > > index 00000000000..f28486112f4 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.target/aarch64/callee_save_1.c > > @@ -0,0 +1,12 @@ > > +/* { dg-options "-O2" } */ > > + > > +int test (int x), test2 (int x); > > + > > +int foo (int x, int y) { > > + test (x); > > + int lhs = test2 (y); > > + return x + lhs; > > +} > > + > > +/* { dg-final { scan-assembler {\tstp\tx19, x20, \[sp,} } } */ > > +/* { dg-final { scan-assembler {\tldp\tx19, x20, \[sp,} } } */ > > diff --git a/gcc/testsuite/gcc.target/aarch64/callee_save_2.c > > b/gcc/testsuite/gcc.target/aarch64/callee_save_2.c > > new file mode 100644 > > index 00000000000..744b464be2f > > --- /dev/null > > +++ b/gcc/testsuite/gcc.target/aarch64/callee_save_2.c > > @@ -0,0 +1,14 @@ > > +/* { dg-options "-O2 -fomit-frame-pointer" } */ > > + > > +int test (int x), test2 (int x); > > + > > +int foo (int x, int y) { > > + test (x); > > + int lhs = test2 (y); > > + return x + lhs; > > +} > > + > > +/* { dg-final { scan-assembler {\tstp\tx30, x19, \[sp,} } } */ > > +/* { dg-final 
{ scan-assembler {\tldp\tx30, x19, \[sp\],} } } */ > > +/* { dg-final { scan-assembler {\tstr\tw1, \[sp,} } } */ > > +/* { dg-final { scan-assembler {\tldr\tw0, \[sp,} } } */ > > diff --git a/gcc/testsuite/gcc.target/aarch64/callee_save_3.c > > b/gcc/testsuite/gcc.target/aarch64/callee_save_3.c > > new file mode 100644 > > index 00000000000..50b6853e4ee > > --- /dev/null > > +++ b/gcc/testsuite/gcc.target/aarch64/callee_save_3.c > > @@ -0,0 +1,12 @@ > > +/* { dg-options "-O2" } */ > > + > > +float test (); > > +float g; > > + > > +float foo (float x, float y) { > > + g = x + test (); > > + return (x + test ()) * y; > > +} > > + > > +/* { dg-final { scan-assembler {\tstp\td14, d15, \[sp,} } } */ > > +/* { dg-final { scan-assembler {\tldp\td14, d15, \[sp,} } } */ > > diff --git a/gcc/testsuite/gcc.target/aarch64/pr103350-1.c > > b/gcc/testsuite/gcc.target/aarch64/pr103350-1.c > > index a0e764e8653..129c6ac90e0 100644 > > --- a/gcc/testsuite/gcc.target/aarch64/pr103350-1.c > > +++ b/gcc/testsuite/gcc.target/aarch64/pr103350-1.c > > @@ -1,5 +1,5 @@ > > /* { dg-do run { target le } } */ > > -/* { dg-additional-options "-Os -fno-tree-ter -save-temps > > -fdump-rtl-ree-all -free -std=c99 -w" } */ > > +/* { dg-additional-options "-Os -fno-tree-ter -save-temps > > -fdump-rtl-ree-all -free -std=c99 -w -fno-caller-saves" } */ > > > > typedef unsigned char u8; > > typedef unsigned char __attribute__((__vector_size__ (8))) v64u8; > > -- > > 2.25.1 > >
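For anyone reading the hook documentation above, here is a minimal standalone sketch of how the two hooks compose in the default case. It mirrors the structure of default_callee_save_cost in the quoted patch, but the types are stand-ins rather than GCC's own. The names hard_reg_set_sketch, sketch_frame_allocation_cost and sketch_callee_save_cost are hypothetical. The "frame costs 2 unless a callee-saved register is already allocated" policy is purely an assumption for illustration, not what any real target does.

```cpp
#include <bitset>

// Stand-in declarations so the sketch compiles on its own.
// These are NOT GCC's definitions of the corresponding types.
enum class spill_cost_type { SAVE, RESTORE };
enum class frame_cost_type { ALLOCATION, DEALLOCATION };
using hard_reg_set_sketch = std::bitset<64>;  // hypothetical HARD_REG_SET substitute

// Hypothetical target policy (an assumption, not taken from the patch):
// if a callee-saved register is already in use, assume a frame already
// exists and charge nothing extra; otherwise charge a made-up cost of 2.
int
sketch_frame_allocation_cost (frame_cost_type,
			      const hard_reg_set_sketch &allocated_callee_regs)
{
  return allocated_callee_regs.any () ? 0 : 2;
}

// Same shape as the default implementation in the patch: start from
// mem_cost and add the frame (de)allocation cost only when the allocator
// has not already decided to spill.  No frequency scaling is applied
// here; the documentation leaves that to the caller.
int
sketch_callee_save_cost (spill_cost_type cost_type, int mem_cost,
			 const hard_reg_set_sketch &allocated_callee_regs,
			 bool existing_spills_p)
{
  if (!existing_spills_p)
    {
      auto frame_type = (cost_type == spill_cost_type::SAVE
			 ? frame_cost_type::ALLOCATION
			 : frame_cost_type::DEALLOCATION);
      mem_cost += sketch_frame_allocation_cost (frame_type,
						 allocated_callee_regs);
    }
  return mem_cost;
}
```

With the real default of TARGET_FRAME_ALLOCATION_COST (which returns 0), the result is simply mem_cost; the frame term only matters once a target overrides that hook.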
-- BR, Hongtao