On Mon, Feb 3, 2025 at 5:21 PM Richard Biener <richard.guent...@gmail.com> wrote: > > On Sun, Feb 2, 2025 at 9:29 AM H.J. Lu <hjl.to...@gmail.com> wrote: > > > > On Sun, Feb 2, 2025 at 4:20 PM Richard Biener > > <richard.guent...@gmail.com> wrote: > > > > > > > > > > > > > Am 02.02.2025 um 08:59 schrieb H.J. Lu <hjl.to...@gmail.com>: > > > > > > > > On Sun, Feb 2, 2025 at 3:33 PM Richard Biener > > > > <richard.guent...@gmail.com> wrote: > > > >> > > > >> > > > >> > > > >>>> Am 02.02.2025 um 08:00 schrieb H.J. Lu <hjl.to...@gmail.com>: > > > >>> > > > >>> Don't increase callee-saved register cost by 1000x, which leads to > > > >>> that > > > >>> callee-saved registers aren't used to preserve local variable values > > > >>> across calls, by capping the scale to 300. > > > >> > > > >>> PR rtl-optimization/111673 > > > >>> PR rtl-optimization/115932 > > > >>> PR rtl-optimization/116028 > > > >>> PR rtl-optimization/117081 > > > >>> PR rtl-optimization/118497 > > > >>> * ira-color.cc (assign_hard_reg): Cap callee-saved register cost > > > >>> scale to 300. > > > >>> > > > >>> Signed-off-by: H.J. Lu <hjl.to...@gmail.com> > > > >>> --- > > > >>> gcc/ira-color.cc | 16 ++++++++++++++-- > > > >>> 1 file changed, 14 insertions(+), 2 deletions(-) > > > >>> > > > >>> diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc > > > >>> index 0699b349a1a..707ff188250 100644 > > > >>> --- a/gcc/ira-color.cc > > > >>> +++ b/gcc/ira-color.cc > > > >>> @@ -2175,13 +2175,25 @@ assign_hard_reg (ira_allocno_t a, bool > > > >>> retry_p) > > > >>> /* We need to save/restore the hard register in > > > >>> epilogue/prologue. Therefore we increase the cost. */ > > > >>> { > > > >>> + int scale; > > > >>> + if (optimize_size) > > > >>> + scale = 1; > > > >>> + else > > > >>> + { > > > >>> + scale = REG_FREQ_FROM_BB (ENTRY_BLOCK_PTR_FOR_FN (cfun)); > > > >>> + /* Don't increase callee-saved register cost by 1000x, > > > >>> + which leads to that callee-saved registers aren't > > > >>> + used to preserve local variable values across calls, > > > >>> + by capping the scale to 300. */ > > > >>> + if (REG_FREQ_MAX == 1000 && scale == REG_FREQ_MAX) > > > >>> + scale = 300; > > > >> > > > >> That leads to 300 for 1000 but 999 for 999 which is odd. I’d have > > > >> expected to scale this down to [0, 300] or is MAX a magic value? > > > > > > > > There are > > > > > > > > * The weights for each insn varies from 0 to REG_FREQ_BASE. > > > > This constant does not need to be high, as in infrequently executed > > > > regions we want to count instructions equivalently to optimize for > > > > size instead of speed. */ > > > > #define REG_FREQ_MAX 1000 > > > > > > > > /* Compute register frequency from the BB frequency. When optimizing > > > > for size, > > > > or profile driven feedback is available and the function is never > > > > executed, > > > > frequency is always equivalent. Otherwise rescale the basic block > > > > frequency. */ > > > > #define REG_FREQ_FROM_BB(bb) ((optimize_function_for_size_p (cfun) > > > > \ > > > > || !cfun->cfg->count_max.initialized_p > > > > ()) \ > > > > ? REG_FREQ_MAX > > > > \ > > > > : ((bb)->count.to_frequency (cfun) > > > > \ > > > > * REG_FREQ_MAX / BB_FREQ_MAX) > > > > \ > > > > ? ((bb)->count.to_frequency (cfun) > > > > \ > > > > * REG_FREQ_MAX / BB_FREQ_MAX) > > > > \ > > > > : 1) > > > > > > > > 1000 is the default. If it isn't 1000, it isn't the default. I only > > > > want > > > > to get a more reasonable default scale, instead of 1000. Lower > > > > scale will fail the PR rtl-optimization/111673 test on powerpc64. > > > > > > I see. Why not adjust the above macro then? That would be a bit more > > > obvious. Like use MAX/2 or so? > > > > commit 3b9b8d6cfdf59337f4b7ce10ce92a98044b2657b > > Author: Surya Kumari Jangala <jskum...@linux.ibm.com> > > Date: Tue Jun 25 08:37:49 2024 -0500 > > > > ira: Scale save/restore costs of callee save registers with block > > frequency > > > > uses REG_FREQ_FROM_BB as the cost scale. I don't know if it is a misuse. > > I don't want to change REG_FREQ_FROM_BB since it is used in other places, > > not as a cost scale. Maybe the above commit should be reverted and we add > > a target hook for callee-saved register cost scale. Each target can choose > > a proper cost scale, install of increasing the cost by 1000x for everyone. > > I believe testing cfun->cfg->count_max.initialized_p () is a bit odd at least, > as it doesn't seem to be used. The comment talks about profile feedback, > but for example with -fprofile-correction or -fpartial-profile this > test looks odd. > In fact optimize_function_for_size_p should already handle this correctly. > > Also REG_FREQ_FROM_BB simply documents that in this case the > frequency will be equivalent for all BBs and not any particular value. The > new > use might indeed not have the same constraints as others, instead of > a target hook making the "same value" another macro argument might be > a good first step. > > That said - does removing the || !cfun->cfg->count_max.initialized_p () > test "fix" things?
I don't think so. Frequency value can be anything, 1000 or 100000. It is relative. It is very odd to use it as the cost scale. Here is the v2 patch https://gcc.gnu.org/pipermail/gcc-patches/2025-February/674940.html to add a target hook for callee-saved register cost scale and restore the old behavior for x86. -- H.J.