> On Mon, Feb 3, 2025 at 5:21 PM Richard Biener > <richard.guent...@gmail.com> wrote: > > > > On Sun, Feb 2, 2025 at 9:29 AM H.J. Lu <hjl.to...@gmail.com> wrote: > > > > > > On Sun, Feb 2, 2025 at 4:20 PM Richard Biener > > > <richard.guent...@gmail.com> wrote: > > > > > > > > > > > > > > > > > Am 02.02.2025 um 08:59 schrieb H.J. Lu <hjl.to...@gmail.com>: > > > > > > > > > > On Sun, Feb 2, 2025 at 3:33 PM Richard Biener > > > > > <richard.guent...@gmail.com> wrote: > > > > >> > > > > >> > > > > >> > > > > >>>> Am 02.02.2025 um 08:00 schrieb H.J. Lu <hjl.to...@gmail.com>: > > > > >>> > > > > >>> Don't increase callee-saved register cost by 1000x, which leads to > > > > >>> that > > > > >>> callee-saved registers aren't used to preserve local variable values > > > > >>> across calls, by capping the scale to 300. > > > > >> > > > > >>> PR rtl-optimization/111673 > > > > >>> PR rtl-optimization/115932 > > > > >>> PR rtl-optimization/116028 > > > > >>> PR rtl-optimization/117081 > > > > >>> PR rtl-optimization/118497 > > > > >>> * ira-color.cc (assign_hard_reg): Cap callee-saved register cost > > > > >>> scale to 300. > > > > >>> > > > > >>> Signed-off-by: H.J. Lu <hjl.to...@gmail.com> > > > > >>> --- > > > > >>> gcc/ira-color.cc | 16 ++++++++++++++-- > > > > >>> 1 file changed, 14 insertions(+), 2 deletions(-) > > > > >>> > > > > >>> diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc > > > > >>> index 0699b349a1a..707ff188250 100644 > > > > >>> --- a/gcc/ira-color.cc > > > > >>> +++ b/gcc/ira-color.cc > > > > >>> @@ -2175,13 +2175,25 @@ assign_hard_reg (ira_allocno_t a, bool > > > > >>> retry_p) > > > > >>> /* We need to save/restore the hard register in > > > > >>> epilogue/prologue. Therefore we increase the cost. */ > > > > >>> { > > > > >>> + int scale; > > > > >>> + if (optimize_size) > > > > >>> + scale = 1; > > > > >>> + else > > > > >>> + { > > > > >>> + scale = REG_FREQ_FROM_BB (ENTRY_BLOCK_PTR_FOR_FN (cfun)); > > > > >>> + /* Don't increase callee-saved register cost by 1000x, > > > > >>> + which leads to that callee-saved registers aren't > > > > >>> + used to preserve local variable values across calls, > > > > >>> + by capping the scale to 300. */ > > > > >>> + if (REG_FREQ_MAX == 1000 && scale == REG_FREQ_MAX) > > > > >>> + scale = 300; > > > > >> > > > > >> That leads to 300 for 1000 but 999 for 999 which is odd. I’d have > > > > >> expected to scale this down to [0, 300] or is MAX a magic value? > > > > > > > > > > There are > > > > > > > > > > * The weights for each insn varies from 0 to REG_FREQ_BASE. > > > > > This constant does not need to be high, as in infrequently executed > > > > > regions we want to count instructions equivalently to optimize for > > > > > size instead of speed. */ > > > > > #define REG_FREQ_MAX 1000 > > > > > > > > > > /* Compute register frequency from the BB frequency. When optimizing > > > > > for size, > > > > > or profile driven feedback is available and the function is never > > > > > executed, > > > > > frequency is always equivalent. Otherwise rescale the basic block > > > > > frequency. */ > > > > > #define REG_FREQ_FROM_BB(bb) ((optimize_function_for_size_p (cfun) > > > > > \ > > > > > || !cfun->cfg->count_max.initialized_p > > > > > ()) \ > > > > > ? REG_FREQ_MAX > > > > > \ > > > > > : ((bb)->count.to_frequency (cfun) > > > > > \ > > > > > * REG_FREQ_MAX / BB_FREQ_MAX) > > > > > \ > > > > > ? ((bb)->count.to_frequency (cfun) > > > > > \ > > > > > * REG_FREQ_MAX / BB_FREQ_MAX) > > > > > \ > > > > > : 1) > > > > > > > > > > 1000 is the default. If it isn't 1000, it isn't the default. I only > > > > > want > > > > > to get a more reasonable default scale, instead of 1000. Lower > > > > > scale will fail the PR rtl-optimization/111673 test on powerpc64. > > > > > > > > I see. Why not adjust the above macro then? That would be a bit more > > > > obvious. Like use MAX/2 or so? > > > > > > commit 3b9b8d6cfdf59337f4b7ce10ce92a98044b2657b > > > Author: Surya Kumari Jangala <jskum...@linux.ibm.com> > > > Date: Tue Jun 25 08:37:49 2024 -0500 > > > > > > ira: Scale save/restore costs of callee save registers with block > > > frequency > > > > > > uses REG_FREQ_FROM_BB as the cost scale. I don't know if it is a misuse. > > > I don't want to change REG_FREQ_FROM_BB since it is used in other places, > > > not as a cost scale. Maybe the above commit should be reverted and we add > > > a target hook for callee-saved register cost scale. Each target can > > > choose > > > a proper cost scale, install of increasing the cost by 1000x for everyone. > > > > I believe testing cfun->cfg->count_max.initialized_p () is a bit odd at > > least, > > as it doesn't seem to be used. The comment talks about profile feedback, > > but for example with -fprofile-correction or -fpartial-profile this > > test looks odd. > > In fact optimize_function_for_size_p should already handle this correctly.
count_max is computed in update_max_bb_count and if some basic blocks do not have initialized count, they are ignored. So if you have any initialized count, count_max should be greater or equal to it after update call. This may be later changed if multiple code paths are merged to one and then BB count can be somewhat larger than max. With -fprofile-correction or -fpartial-profile all BBs sould have counts initialized. If profile feedback is missing we keep guessed profile if feedback is incoherent we only attemt to correct and it and drop quality info. If some passes later create new BBs with unitialized profile, they should be fixed. What can happen is inlining function without profile (-O0) to function with profile. In that case you get CFG with both initialized and uninitialized BBs. But we probably do not care that much about performance of htis. > > > > Also REG_FREQ_FROM_BB simply documents that in this case the > > frequency will be equivalent for all BBs and not any particular value. The > > new > > use might indeed not have the same constraints as others, instead of > > a target hook making the "same value" another macro argument might be > > a good first step. > > > > That said - does removing the || !cfun->cfg->count_max.initialized_p () > > test "fix" things? > > I don't think so. Frequency value can be anything, 1000 or 100000. > It is relative. It is very odd to use it as the cost scale. Here is the v2 > patch > > https://gcc.gnu.org/pipermail/gcc-patches/2025-February/674940.html > > to add a target hook for callee-saved register cost scale and restore the > old behavior for x86. Original prupose of REG_FREQ_FROM_BB macro was to translate new profile counts to old BB frequencies. Those were saled to be in the range 0....REG_FREQ_MAX where REG_FREQ_MAX was set to 1000 so one can do third power in 32bit value chaply. In early 2000s when this code was written 64bit arithmetics was expensive. It also tries to imitate old code in register allocator which used constnat value of 1000 for everything if profile was missing or we optimize for size (and thus static count of reloads). These days it is probably smoother to write code that cares about relative frequencies using sreals unless it is very performance critical. Honza