> On 02.02.2025 at 08:59, H.J. Lu <hjl.to...@gmail.com> wrote:
>
> On Sun, Feb 2, 2025 at 3:33 PM Richard Biener
> <richard.guent...@gmail.com> wrote:
>>
>>
>>
>>>> On 02.02.2025 at 08:00, H.J. Lu <hjl.to...@gmail.com> wrote:
>>>
>>> Don't scale the callee-saved register cost by 1000x, which keeps
>>> callee-saved registers from being used to preserve local variable values
>>> across calls. Cap the scale at 300 instead.
>>
>>> PR rtl-optimization/111673
>>> PR rtl-optimization/115932
>>> PR rtl-optimization/116028
>>> PR rtl-optimization/117081
>>> PR rtl-optimization/118497
>>> * ira-color.cc (assign_hard_reg): Cap callee-saved register cost
>>> scale to 300.
>>>
>>> Signed-off-by: H.J. Lu <hjl.to...@gmail.com>
>>> ---
>>> gcc/ira-color.cc | 16 ++++++++++++++--
>>> 1 file changed, 14 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc
>>> index 0699b349a1a..707ff188250 100644
>>> --- a/gcc/ira-color.cc
>>> +++ b/gcc/ira-color.cc
>>> @@ -2175,13 +2175,25 @@ assign_hard_reg (ira_allocno_t a, bool retry_p)
>>> /* We need to save/restore the hard register in
>>> epilogue/prologue. Therefore we increase the cost. */
>>> {
>>> + int scale;
>>> + if (optimize_size)
>>> + scale = 1;
>>> + else
>>> + {
>>> + scale = REG_FREQ_FROM_BB (ENTRY_BLOCK_PTR_FOR_FN (cfun));
>>> + /* Don't scale the callee-saved register cost by 1000x,
>>> + which keeps callee-saved registers from being used to
>>> + preserve local variable values across calls. Cap the
>>> + scale at 300 instead. */
>>> + if (REG_FREQ_MAX == 1000 && scale == REG_FREQ_MAX)
>>> + scale = 300;
>>
>> That leads to 300 for 1000 but 999 for 999, which is odd. I'd have expected
>> this to be scaled down to [0, 300], or is MAX a magic value?
>
> There are these definitions:
>
> /* The weights for each insn varies from 0 to REG_FREQ_BASE.
> This constant does not need to be high, as in infrequently executed
> regions we want to count instructions equivalently to optimize for
> size instead of speed. */
> #define REG_FREQ_MAX 1000
>
> /* Compute register frequency from the BB frequency. When optimizing for size,
> or profile driven feedback is available and the function is never executed,
> frequency is always equivalent. Otherwise rescale the basic block
> frequency. */
> #define REG_FREQ_FROM_BB(bb) ((optimize_function_for_size_p (cfun) \
> || !cfun->cfg->count_max.initialized_p ()) \
> ? REG_FREQ_MAX \
> : ((bb)->count.to_frequency (cfun) \
> * REG_FREQ_MAX / BB_FREQ_MAX) \
> ? ((bb)->count.to_frequency (cfun) \
> * REG_FREQ_MAX / BB_FREQ_MAX) \
> : 1)
>
> 1000 is the default. If it isn't 1000, it isn't the default. I only want
> to get a more reasonable default scale than 1000. A lower
> scale makes the PR rtl-optimization/111673 test fail on powerpc64.
I see. Why not adjust the above macro then? That would be a bit more obvious.
For example, use MAX/2 or so?
>
>
>>> + }
>>> rclass = REGNO_REG_CLASS (hard_regno);
>>> add_cost = ((ira_memory_move_cost[mode][rclass][0]
>>> + ira_memory_move_cost[mode][rclass][1])
>>> * saved_nregs / hard_regno_nregs (hard_regno,
>>> mode) - 1)
>>> - * (optimize_size ? 1 :
>>> - REG_FREQ_FROM_BB (ENTRY_BLOCK_PTR_FOR_FN (cfun)));
>>> + * scale;
>>> cost += add_cost;
>>> full_cost += add_cost;
>>> }
>>> --
>>> 2.48.1
>>>
>
>
>
> --
> H.J.