On Sun, Feb 2, 2025 at 4:20 PM Richard Biener
<richard.guent...@gmail.com> wrote:
>
> > On 02.02.2025 at 08:59, H.J. Lu <hjl.to...@gmail.com> wrote:
> >
> > On Sun, Feb 2, 2025 at 3:33 PM Richard Biener
> > <richard.guent...@gmail.com> wrote:
> >>
> >>>> On 02.02.2025 at 08:00, H.J. Lu <hjl.to...@gmail.com> wrote:
> >>>
> >>> Don't increase callee-saved register cost by 1000x, which leads to that
> >>> callee-saved registers aren't used to preserve local variable values
> >>> across calls, by capping the scale to 300.
> >>>
> >>> PR rtl-optimization/111673
> >>> PR rtl-optimization/115932
> >>> PR rtl-optimization/116028
> >>> PR rtl-optimization/117081
> >>> PR rtl-optimization/118497
> >>> * ira-color.cc (assign_hard_reg): Cap callee-saved register cost
> >>> scale to 300.
> >>>
> >>> Signed-off-by: H.J. Lu <hjl.to...@gmail.com>
> >>> ---
> >>>  gcc/ira-color.cc | 16 ++++++++++++++--
> >>>  1 file changed, 14 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc
> >>> index 0699b349a1a..707ff188250 100644
> >>> --- a/gcc/ira-color.cc
> >>> +++ b/gcc/ira-color.cc
> >>> @@ -2175,13 +2175,25 @@ assign_hard_reg (ira_allocno_t a, bool retry_p)
> >>>        /* We need to save/restore the hard register in
> >>>           epilogue/prologue.  Therefore we increase the cost.  */
> >>>        {
> >>> +        int scale;
> >>> +        if (optimize_size)
> >>> +          scale = 1;
> >>> +        else
> >>> +          {
> >>> +            scale = REG_FREQ_FROM_BB (ENTRY_BLOCK_PTR_FOR_FN (cfun));
> >>> +            /* Don't increase callee-saved register cost by 1000x,
> >>> +               which leads to that callee-saved registers aren't
> >>> +               used to preserve local variable values across calls,
> >>> +               by capping the scale to 300.  */
> >>> +            if (REG_FREQ_MAX == 1000 && scale == REG_FREQ_MAX)
> >>> +              scale = 300;
> >>
> >> That leads to 300 for 1000 but 999 for 999 which is odd.  I’d have
> >> expected to scale this down to [0, 300] or is MAX a magic value?
> >
> > There are
> >
> > /* The weights for each insn varies from 0 to REG_FREQ_BASE.
> >    This constant does not need to be high, as in infrequently executed
> >    regions we want to count instructions equivalently to optimize for
> >    size instead of speed.  */
> > #define REG_FREQ_MAX 1000
> >
> > /* Compute register frequency from the BB frequency.  When optimizing for
> >    size, or profile driven feedback is available and the function is never
> >    executed, frequency is always equivalent.  Otherwise rescale the basic
> >    block frequency.  */
> > #define REG_FREQ_FROM_BB(bb) ((optimize_function_for_size_p (cfun)      \
> >                                || !cfun->cfg->count_max.initialized_p ()) \
> >                               ? REG_FREQ_MAX                            \
> >                               : ((bb)->count.to_frequency (cfun)        \
> >                                  * REG_FREQ_MAX / BB_FREQ_MAX)          \
> >                               ? ((bb)->count.to_frequency (cfun)        \
> >                                  * REG_FREQ_MAX / BB_FREQ_MAX)          \
> >                               : 1)
> >
> > 1000 is the default.  If it isn't 1000, it isn't the default.  I only want
> > to get a more reasonable default scale, instead of 1000.  Lower
> > scale will fail the PR rtl-optimization/111673 test on powerpc64.
>
> I see.  Why not adjust the above macro then?  That would be a bit more
> obvious.  Like use MAX/2 or so?

commit 3b9b8d6cfdf59337f4b7ce10ce92a98044b2657b
Author: Surya Kumari Jangala <jskum...@linux.ibm.com>
Date:   Tue Jun 25 08:37:49 2024 -0500

    ira: Scale save/restore costs of callee save registers with block frequency

uses REG_FREQ_FROM_BB as the cost scale.  I don't know if it is a misuse.
I don't want to change REG_FREQ_FROM_BB since it is used in other places,
not as a cost scale.  Maybe the above commit should be reverted and we
add a target hook for callee-saved register cost scale.  Each target can
choose a proper cost scale, instead of increasing the cost by 1000x for
everyone.

> >
> >>> +          }
> >>>          rclass = REGNO_REG_CLASS (hard_regno);
> >>>          add_cost = ((ira_memory_move_cost[mode][rclass][0]
> >>>                       + ira_memory_move_cost[mode][rclass][1])
> >>>                      * saved_nregs / hard_regno_nregs (hard_regno,
> >>>                                                        mode) - 1)
> >>> -                   * (optimize_size ? 1 :
> >>> -                      REG_FREQ_FROM_BB (ENTRY_BLOCK_PTR_FOR_FN (cfun)));
> >>> +                   * scale;
> >>>          cost += add_cost;
> >>>          full_cost += add_cost;
> >>>        }
> >>> --
> >>> 2.48.1
> >>>
> >
> > --
> > H.J.

--
H.J.
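
The discontinuity raised in the review (1000 maps to 300 while 999 stays at 999) can be seen in a small standalone sketch. This is not GCC code: SKETCH_REG_FREQ_MAX mirrors the quoted REG_FREQ_MAX, while SKETCH_SCALE_CAP, scale_patch and scale_proportional are hypothetical names used only for illustration. The proportional variant is just one reading of the "[0, 300]" suggestion; since a lower scale is reported above to fail the PR rtl-optimization/111673 test on powerpc64, it is shown for comparison, not as a proposal.

```c
/* Standalone illustration; none of these names exist in GCC.  */
#define SKETCH_REG_FREQ_MAX 1000 /* mirrors the quoted REG_FREQ_MAX */
#define SKETCH_SCALE_CAP 300     /* the cap used in the posted patch */

/* Behaviour of the posted patch: only the exact maximum is lowered,
   every other frequency is passed through unchanged.  */
static int
scale_patch (int freq)
{
  return freq == SKETCH_REG_FREQ_MAX ? SKETCH_SCALE_CAP : freq;
}

/* Hypothetical proportional alternative: map [1, MAX] onto [1, CAP],
   keeping a floor of 1 in the same way REG_FREQ_FROM_BB does.  */
static int
scale_proportional (int freq)
{
  int scaled = freq * SKETCH_SCALE_CAP / SKETCH_REG_FREQ_MAX;
  return scaled ? scaled : 1;
}
```

With these definitions scale_patch is not monotonic around the maximum, whereas scale_proportional is, at the price of lowering the scale for every frequency rather than only the default one.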