On Sun, Feb 2, 2025 at 9:29 AM H.J. Lu <hjl.to...@gmail.com> wrote: > > On Sun, Feb 2, 2025 at 4:20 PM Richard Biener > <richard.guent...@gmail.com> wrote: > > > > > > > > > Am 02.02.2025 um 08:59 schrieb H.J. Lu <hjl.to...@gmail.com>: > > > > > > On Sun, Feb 2, 2025 at 3:33 PM Richard Biener > > > <richard.guent...@gmail.com> wrote: > > >> > > >> > > >> > > >>>> Am 02.02.2025 um 08:00 schrieb H.J. Lu <hjl.to...@gmail.com>: > > >>> > > >>> Don't increase callee-saved register cost by 1000x, which leads to that > > >>> callee-saved registers aren't used to preserve local variable values > > >>> across calls, by capping the scale to 300. > > >> > > >>> PR rtl-optimization/111673 > > >>> PR rtl-optimization/115932 > > >>> PR rtl-optimization/116028 > > >>> PR rtl-optimization/117081 > > >>> PR rtl-optimization/118497 > > >>> * ira-color.cc (assign_hard_reg): Cap callee-saved register cost > > >>> scale to 300. > > >>> > > >>> Signed-off-by: H.J. Lu <hjl.to...@gmail.com> > > >>> --- > > >>> gcc/ira-color.cc | 16 ++++++++++++++-- > > >>> 1 file changed, 14 insertions(+), 2 deletions(-) > > >>> > > >>> diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc > > >>> index 0699b349a1a..707ff188250 100644 > > >>> --- a/gcc/ira-color.cc > > >>> +++ b/gcc/ira-color.cc > > >>> @@ -2175,13 +2175,25 @@ assign_hard_reg (ira_allocno_t a, bool retry_p) > > >>> /* We need to save/restore the hard register in > > >>> epilogue/prologue. Therefore we increase the cost. */ > > >>> { > > >>> + int scale; > > >>> + if (optimize_size) > > >>> + scale = 1; > > >>> + else > > >>> + { > > >>> + scale = REG_FREQ_FROM_BB (ENTRY_BLOCK_PTR_FOR_FN (cfun)); > > >>> + /* Don't increase callee-saved register cost by 1000x, > > >>> + which leads to that callee-saved registers aren't > > >>> + used to preserve local variable values across calls, > > >>> + by capping the scale to 300. */ > > >>> + if (REG_FREQ_MAX == 1000 && scale == REG_FREQ_MAX) > > >>> + scale = 300; > > >> > > >> That leads to 300 for 1000 but 999 for 999 which is odd. I’d have > > >> expected to scale this down to [0, 300] or is MAX a magic value? > > > > > > There are > > > > > > * The weights for each insn varies from 0 to REG_FREQ_BASE. > > > This constant does not need to be high, as in infrequently executed > > > regions we want to count instructions equivalently to optimize for > > > size instead of speed. */ > > > #define REG_FREQ_MAX 1000 > > > > > > /* Compute register frequency from the BB frequency. When optimizing for > > > size, > > > or profile driven feedback is available and the function is never > > > executed, > > > frequency is always equivalent. Otherwise rescale the basic block > > > frequency. */ > > > #define REG_FREQ_FROM_BB(bb) ((optimize_function_for_size_p (cfun) > > > \ > > > || !cfun->cfg->count_max.initialized_p ()) > > > \ > > > ? REG_FREQ_MAX > > > \ > > > : ((bb)->count.to_frequency (cfun) > > > \ > > > * REG_FREQ_MAX / BB_FREQ_MAX) > > > \ > > > ? ((bb)->count.to_frequency (cfun) > > > \ > > > * REG_FREQ_MAX / BB_FREQ_MAX) > > > \ > > > : 1) > > > > > > 1000 is the default. If it isn't 1000, it isn't the default. I only want > > > to get a more reasonable default scale, instead of 1000. Lower > > > scale will fail the PR rtl-optimization/111673 test on powerpc64. > > > > I see. Why not adjust the above macro then? That would be a bit more > > obvious. Like use MAX/2 or so? > > commit 3b9b8d6cfdf59337f4b7ce10ce92a98044b2657b > Author: Surya Kumari Jangala <jskum...@linux.ibm.com> > Date: Tue Jun 25 08:37:49 2024 -0500 > > ira: Scale save/restore costs of callee save registers with block > frequency > > uses REG_FREQ_FROM_BB as the cost scale. I don't know if it is a misuse. > I don't want to change REG_FREQ_FROM_BB since it is used in other places, > not as a cost scale. Maybe the above commit should be reverted and we add > a target hook for callee-saved register cost scale. Each target can choose > a proper cost scale, install of increasing the cost by 1000x for everyone.
I believe testing cfun->cfg->count_max.initialized_p () is a bit odd at least, as it doesn't seem to be used. The comment talks about profile feedback, but for example with -fprofile-correction or -fpartial-profile this test looks odd. In fact optimize_function_for_size_p should already handle this correctly. Also REG_FREQ_FROM_BB simply documents that in this case the frequency will be equivalent for all BBs and not any particular value. The new use might indeed not have the same constraints as others, instead of a target hook making the "same value" another macro argument might be a good first step. That said - does removing the || !cfun->cfg->count_max.initialized_p () test "fix" things? Richard. > > > > > > > > >>> + } > > >>> rclass = REGNO_REG_CLASS (hard_regno); > > >>> add_cost = ((ira_memory_move_cost[mode][rclass][0] > > >>> + ira_memory_move_cost[mode][rclass][1]) > > >>> * saved_nregs / hard_regno_nregs (hard_regno, > > >>> mode) - 1) > > >>> - * (optimize_size ? 1 : > > >>> - REG_FREQ_FROM_BB (ENTRY_BLOCK_PTR_FOR_FN (cfun))); > > >>> + * scale; > > >>> cost += add_cost; > > >>> full_cost += add_cost; > > >>> } > > >>> -- > > >>> 2.48.1 > > >>> > > > > > > > > > > > > -- > > > H.J. > > > > -- > H.J.