> On 02.02.2025 at 08:59, H.J. Lu <hjl.to...@gmail.com> wrote:
> 
> On Sun, Feb 2, 2025 at 3:33 PM Richard Biener
> <richard.guent...@gmail.com> wrote:
>> 
>> 
>> 
>>>> On 02.02.2025 at 08:00, H.J. Lu <hjl.to...@gmail.com> wrote:
>>> 
>>> Don't increase the callee-saved register cost by 1000x, which prevents
>>> callee-saved registers from being used to preserve local variable values
>>> across calls.  Cap the scale at 300 instead.
>> 
>>>   PR rtl-optimization/111673
>>>   PR rtl-optimization/115932
>>>   PR rtl-optimization/116028
>>>   PR rtl-optimization/117081
>>>   PR rtl-optimization/118497
>>>   * ira-color.cc (assign_hard_reg): Cap callee-saved register cost
>>>   scale to 300.
>>> 
>>> Signed-off-by: H.J. Lu <hjl.to...@gmail.com>
>>> ---
>>> gcc/ira-color.cc | 16 ++++++++++++++--
>>> 1 file changed, 14 insertions(+), 2 deletions(-)
>>> 
>>> diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc
>>> index 0699b349a1a..707ff188250 100644
>>> --- a/gcc/ira-color.cc
>>> +++ b/gcc/ira-color.cc
>>> @@ -2175,13 +2175,25 @@ assign_hard_reg (ira_allocno_t a, bool retry_p)
>>>     /* We need to save/restore the hard register in
>>>        epilogue/prologue.  Therefore we increase the cost.  */
>>>     {
>>> +        int scale;
>>> +        if (optimize_size)
>>> +          scale = 1;
>>> +        else
>>> +          {
>>> +        scale = REG_FREQ_FROM_BB (ENTRY_BLOCK_PTR_FOR_FN (cfun));
>>> +        /* Don't increase callee-saved register cost by 1000x,
>>> +           which leads to that callee-saved registers aren't
>>> +           used to preserve local variable values across calls,
>>> +           by capping the scale to 300.  */
>>> +        if (REG_FREQ_MAX == 1000 && scale == REG_FREQ_MAX)
>>> +          scale = 300;
>> 
>> That leads to 300 for 1000 but 999 for 999, which is odd.  I’d have expected
>> to scale this down to [0, 300] or is MAX a magic value?
> 
> There are
> 
> /* The weights for each insn varies from 0 to REG_FREQ_BASE.
>   This constant does not need to be high, as in infrequently executed
>   regions we want to count instructions equivalently to optimize for
>   size instead of speed.  */
> #define REG_FREQ_MAX 1000
> 
> /* Compute register frequency from the BB frequency.  When optimizing for size,
>   or profile driven feedback is available and the function is never executed,
>   frequency is always equivalent.  Otherwise rescale the basic block
>   frequency.  */
> #define REG_FREQ_FROM_BB(bb) ((optimize_function_for_size_p (cfun)            \
>                               || !cfun->cfg->count_max.initialized_p ())     \
>                              ? REG_FREQ_MAX                                  \
>                              : ((bb)->count.to_frequency (cfun)              \
>                                * REG_FREQ_MAX / BB_FREQ_MAX)                 \
>                              ? ((bb)->count.to_frequency (cfun)              \
>                                 * REG_FREQ_MAX / BB_FREQ_MAX)                \
>                              : 1)
> 
> 1000 is the default.  If it isn't 1000, it isn't the default.  I only want
> a more reasonable default scale than 1000.  A lower scale fails the
> PR rtl-optimization/111673 test on powerpc64.

I see.  Why not adjust the above macro then?  That would be a bit more obvious.
Like using MAX/2 or so?

> 
> 
>>> +          }
>>>       rclass = REGNO_REG_CLASS (hard_regno);
>>>       add_cost = ((ira_memory_move_cost[mode][rclass][0]
>>>                + ira_memory_move_cost[mode][rclass][1])
>>>               * saved_nregs / hard_regno_nregs (hard_regno,
>>>                             mode) - 1)
>>> -               * (optimize_size ? 1 :
>>> -              REG_FREQ_FROM_BB (ENTRY_BLOCK_PTR_FOR_FN (cfun)));
>>> +               * scale;
>>>       cost += add_cost;
>>>       full_cost += add_cost;
>>>     }
>>> --
>>> 2.48.1
>>> 
> 
> 
> 
> --
> H.J.
