On Mon, Feb 3, 2025 at 5:21 PM Richard Biener
<richard.guent...@gmail.com> wrote:
>
> On Sun, Feb 2, 2025 at 9:29 AM H.J. Lu <hjl.to...@gmail.com> wrote:
> >
> > On Sun, Feb 2, 2025 at 4:20 PM Richard Biener
> > <richard.guent...@gmail.com> wrote:
> > >
> > >
> > >
> > > > Am 02.02.2025 um 08:59 schrieb H.J. Lu <hjl.to...@gmail.com>:
> > > >
> > > > On Sun, Feb 2, 2025 at 3:33 PM Richard Biener
> > > > <richard.guent...@gmail.com> wrote:
> > > >>
> > > >>
> > > >>
> > > >>>> Am 02.02.2025 um 08:00 schrieb H.J. Lu <hjl.to...@gmail.com>:
> > > >>>
> > > >>> Don't increase callee-saved register cost by 1000x, which leads to 
> > > >>> that
> > > >>> callee-saved registers aren't used to preserve local variable values
> > > >>> across calls, by capping the scale to 300.
> > > >>
> > > >>>   PR rtl-optimization/111673
> > > >>>   PR rtl-optimization/115932
> > > >>>   PR rtl-optimization/116028
> > > >>>   PR rtl-optimization/117081
> > > >>>   PR rtl-optimization/118497
> > > >>>   * ira-color.cc (assign_hard_reg): Cap callee-saved register cost
> > > >>>   scale to 300.
> > > >>>
> > > >>> Signed-off-by: H.J. Lu <hjl.to...@gmail.com>
> > > >>> ---
> > > >>> gcc/ira-color.cc | 16 ++++++++++++++--
> > > >>> 1 file changed, 14 insertions(+), 2 deletions(-)
> > > >>>
> > > >>> diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc
> > > >>> index 0699b349a1a..707ff188250 100644
> > > >>> --- a/gcc/ira-color.cc
> > > >>> +++ b/gcc/ira-color.cc
> > > >>> @@ -2175,13 +2175,25 @@ assign_hard_reg (ira_allocno_t a, bool 
> > > >>> retry_p)
> > > >>>     /* We need to save/restore the hard register in
> > > >>>        epilogue/prologue.  Therefore we increase the cost.  */
> > > >>>     {
> > > >>> +        int scale;
> > > >>> +        if (optimize_size)
> > > >>> +          scale = 1;
> > > >>> +        else
> > > >>> +          {
> > > >>> +        scale = REG_FREQ_FROM_BB (ENTRY_BLOCK_PTR_FOR_FN (cfun));
> > > >>> +        /* Don't increase callee-saved register cost by 1000x,
> > > >>> +           which leads to that callee-saved registers aren't
> > > >>> +           used to preserve local variable values across calls,
> > > >>> +           by capping the scale to 300.  */
> > > >>> +        if (REG_FREQ_MAX == 1000 && scale == REG_FREQ_MAX)
> > > >>> +          scale = 300;
> > > >>
> > > >> That leads to 300 for 1000 but 999 for 999 which is odd.  I’d have 
> > > >> expected to scale this down to [0, 300] or is MAX a magic value?
> > > >
> > > > There are
> > > >
> > > > * The weights for each insn varies from 0 to REG_FREQ_BASE.
> > > >   This constant does not need to be high, as in infrequently executed
> > > >   regions we want to count instructions equivalently to optimize for
> > > >   size instead of speed.  */
> > > > #define REG_FREQ_MAX 1000
> > > >
> > > > /* Compute register frequency from the BB frequency.  When optimizing 
> > > > for size,
> > > >   or profile driven feedback is available and the function is never 
> > > > executed,
> > > >   frequency is always equivalent.  Otherwise rescale the basic block
> > > >   frequency.  */
> > > > #define REG_FREQ_FROM_BB(bb) ((optimize_function_for_size_p (cfun)      
> > > >       \
> > > >                               || !cfun->cfg->count_max.initialized_p 
> > > > ())     \
> > > >                              ? REG_FREQ_MAX                             
> > > >      \
> > > >                              : ((bb)->count.to_frequency (cfun)         
> > > >      \
> > > >                                * REG_FREQ_MAX / BB_FREQ_MAX)            
> > > >      \
> > > >                              ? ((bb)->count.to_frequency (cfun)         
> > > >      \
> > > >                                 * REG_FREQ_MAX / BB_FREQ_MAX)           
> > > >      \
> > > >                              : 1)
> > > >
> > > > 1000 is the default.  If it isn't 1000, it isn't the default.  I only 
> > > > want
> > > > to get a more reasonable default scale, instead of 1000.   Lower
> > > > scale will fail the PR rtl-optimization/111673 test on powerpc64.
> > >
> > > I see.  Why not adjust the above macro then?  That would be a bit more 
> > > obvious.  Like use MAX/2 or so?
> >
> > commit 3b9b8d6cfdf59337f4b7ce10ce92a98044b2657b
> > Author: Surya Kumari Jangala <jskum...@linux.ibm.com>
> > Date:   Tue Jun 25 08:37:49 2024 -0500
> >
> >     ira: Scale save/restore costs of callee save registers with block 
> > frequency
> >
> > uses REG_FREQ_FROM_BB as the cost scale.  I don't know if it is a misuse.
> > I don't want to change REG_FREQ_FROM_BB since it is used in other places,
> > not as a cost scale.  Maybe the above commit should be reverted and we add
> > a target hook for callee-saved register cost scale.  Each target can choose
> > a proper cost scale, install of increasing the cost by 1000x for everyone.
>
> I believe testing cfun->cfg->count_max.initialized_p () is a bit odd at least,
> as it doesn't seem to be used.  The comment talks about profile feedback,
> but for example with -fprofile-correction or -fpartial-profile this
> test looks odd.
> In fact optimize_function_for_size_p should already handle this correctly.
>
> Also REG_FREQ_FROM_BB simply documents that in this case the
> frequency will be equivalent for all BBs and not any particular value.  The 
> new
> use might indeed not have the same constraints as others, instead of
> a target hook making the "same value" another macro argument might be
> a good first step.
>
> That said - does removing the || !cfun->cfg->count_max.initialized_p ()
> test "fix" things?

I don't think so.  Frequency value can be anything, 1000 or 100000.
It is relative.  It is very odd to use it as the cost scale.  Here is the v2
patch

https://gcc.gnu.org/pipermail/gcc-patches/2025-February/674940.html

to add a target hook for callee-saved register cost scale and restore the
old behavior for x86.

-- 
H.J.

Reply via email to