Re: [PATCH v2] ira: Add a target hook for callee-saved register cost scale

Andrew Pinski Fri, 07 Feb 2025 09:52:39 -0800

On Fri, Feb 7, 2025 at 9:20 AM Richard Sandiford
<richard.sandif...@arm.com> wrote:
>
> Richard Sandiford <richard.sandif...@arm.com> writes:
> > Really nice analysis!  Thanks for writing this up.
> >
> > Sorry for the big quote below, but:
> >
> > Jan Hubicka <hubi...@ucw.cz> writes:
> >> [...]
> >> PR117081 is about regression in povray. The reducted testcase:
> >>
> >> void foo (void);
> >> void bar (void);
> >>
> >> int
> >> test (int a)
> >> {
> >>   int r;
> >>
> >>   if (r = -a)
> >>     foo ();
> >>   else
> >>     bar ();
> >>
> >>   return r;
> >> }
> >>
> >> shows that we now use caller saved register (EAX) to hold the return value 
> >> which yields longer code.  The costs are
> >> Popping a0(r98,l0)  -- (0=13000,13000) (3=15000,15000) (6=15000,15000) 
> >> (40=15000,15000) (41=15000,15000) (42=15000,15000) (43=15000,15000)
> >>
> >> here 15000 is 11000+4000 where I think 4000 is cost of 2 reg-reg moves
> >> multiplied by REG_FREQ_MAX.   This seems correct. GCC 13 uses callee
> >> saved register and produces:
> >>
> >> 0000000000000000 <test>:
> >>    0:        53                      push   %rbx             <--- callee 
> >> save
> >>    1:        89 fb                   mov    %edi,%ebx        <--- move 1
> >>    3:        f7 db                   neg    %ebx
> >>    5:        74 09                   je     10 <test+0x10>
> >>    7:        e8 00 00 00 00          call   c <test+0xc>
> >>    c:        89 d8                   mov    %ebx,%eax        <--- callee 
> >> restore
> >>    e:        5b                      pop    %rbx
> >>    f:        c3                      ret
> >>   10:        e8 00 00 00 00          call   15 <test+0x15>
> >>   15:        89 d8                   mov    %ebx,%eax        <--- move 2
> >>   17:        5b                      pop    %rbx             <--- callee 
> >> restore
> >>   18:        c3                      ret
> >>
> >> Mainline used EAX since it has costs 13000.  It is not 100% clear to me
> >> why.
> >>  - 12000 is the spilling (which is emitted twice but executed just once)
> >>  - I would have expected 2000 for the move from edi to eax.
> >> However even if cost is 14000 we will choose EAX.  The code is:
> >>
> >>    0:        89 f8                   mov    %edi,%eax        <--- move1
> >>    2:        48 83 ec 18             sub    $0x18,%rsp       <--- stack 
> >> frame creation
> >>    6:        f7 d8                   neg    %eax
> >>    8:        89 44 24 0c             mov    %eax,0xc(%rsp)   <--- spill out
> >>    c:        85 ff                   test   %edi,%edi
> >>    e:        74 10                   je     20 <test+0x20>
> >>   10:        e8 00 00 00 00          call   15 <test+0x15>
> >>   15:        8b 44 24 0c             mov    0xc(%rsp),%eax   <--- spill in
> >>   19:        48 83 c4 18             add    $0x18,%rsp       <--- stack 
> >> frame
> >>   1d:        c3                      ret
> >>   1e:        66 90                   xchg   %ax,%ax
> >>   20:        e8 00 00 00 00          call   25 <test+0x25>
> >>   25:        8b 44 24 0c             mov    0xc(%rsp),%eax   <--- spill in
> >>   29:        48 83 c4 18             add    $0x18,%rsp       <--- stack 
> >> frame
> >>   2d:        c3                      ret
> >>
> >> This sequence really saves one move at the expense of of stack frame
> >> allocation (which is not modelled by the cost model) and longer spill
> >> code (also no modelled).
> >> [...]
> >> PR117082 is about noreturn function:
> >> __attribute__ ((noreturn))
> >> void
> >> f3 (void)
> >> {
> >>   int y0 = x0;
> >>   int y1 = x1;
> >>   f1 ();
> >>   f2 (y0, y1);
> >>   while (1);
> >> }
> >>
> >> Here the cost model is really wrong by assuming that entry and exit
> >> block have same frequencies.  This can be fixed quite easilly (though it
> >> is a rare case)
> >
> > Yeah (and nice example).
> >
> >> PR118497  seems to be ixed.
> >>
> >> So overall I think
> >>  1) we can fix scaling of epilogue by exit block frequency
> >>     to get noreturns right.
> >>  2) we should drop the check for optimize_size.  Since with -Os
> >>     REG_FREQ_FROM_BB always returns 1000 everything should be scaled
> >>     same way
> >>  3) we currently have wire in "-1" to biass the cost metric for callee
> >>     saved registers.
> >>     It may make sense to allow targets to control this, since i.e. x86
> >>     has push/pop that is shorter. -3 would solve the testcase with neg
> >>     and would express that push/pop is still cheaper with extra reg-reg
> >>     move.
> >>  4) cost model misses shring wrapping, the fact that if register is
> >>     callee saved it may be used by multiple allocnos and also that
> >>     push/pop sequence may avoid need for manual RSP adjustments.
> >>
> >>     Those seems bit harder things to fit in though.
> >>
> >> So if we want to go with the target hook, I think it should adjust the
> >> cost before scalling (since targets may have special tricks for
> >> prologues) rather than the scale factor (which is target independent
> >> part of cost model).
> >
> > Like you say, one of the missing pieces appears to be the allocation/
> > dealloaction overhead for callee saves.  Could we try to add a hook to
> > model that cost, based on certain parameters?
> >
> > In particular, one thing that the examples above have in common is that
> > they don't need to allocate a frame for local variables.  That seems
> > like it ought to be part of the mix.  If we need to allocate a frame
> > using addition anyway, then presumably one of the advantages of push/pop
> > over callee saves goes away.
> >
> > But going back to my question above, modelling the allocation and
> > deallocation would need to be done in a way that makes them more
> > expensive than moves.  I think in some cases we've assumed that
> > addition and moves have equal cost, even when optimising for speed.
> >
> > In other words, rather than add a hook to tweak the -1 bias (which I
> > still agree is a more reasonable thing to do than bypassing the IRA
> > code altogether), could we add a separate hook for the allocation/
> > deallocation and leave IRA to work out when to apply it?
> >
> > I suppose for -fno-omit-frame-pointer we should also take the creation
> > of the frame link into account, if we don't already.  This matters on
> > aarch64 since -fomit-frame-pointer is not the default at -O2.
> >
> > One quirk on aarch64 is that, if we're using an odd number of
> > callee-saved GPRs, adding one more is essentially free, since we would
> > allocate space for it anyway, and could save and restore it alongside
> > the odd one out.  That would probably be difficult to apply in practice
> > though.  And it's obviously a separate issue from the current one.
> > Just mentioning it as another thing we could model in future.
>
> FWIW, here's a very rough initial version of the kind of thing
> I was thinking about.  Hopefully the hook documentation describes
> the approach.  It's deliberately (overly?) flexible.
>
> I've included an aarch64 version that (a) models the fact that the
> first caller-save can also allocate the frame more-or-less for free,
> and (b) once we've saved an odd number of GPRs, saving one more is
> essentialy free.  I also hacked up an x86 version locally to model
> the allocation benefits of using caller-saved registers.  It seemed
> to fix the povray example above.


Note the pair allocation for aarch64 was filed PR 117477 so it would
be useful to check if it fixes that and include the bug # in the
commit if it does.

Thanks,
Andrew


>
> This still needs a lot of clean-up and testing, but I thought I might
> as well send what I have before leaving for the weekend.  Does it look
> reasonable in principle?
>
> Thanks,
> Richard
>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index c1e40200806..3f9453a74fb 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -15870,6 +15870,60 @@ aarch64_memory_move_cost (machine_mode mode, 
> reg_class_t rclass_i, bool in)
>           : base + aarch64_tune_params.memmov_cost.store_int);
>  }
>
> +/* CALLEE_SAVED_REGS is the set of callee-saved registers that the
> +   RA has already decided to use.  Return the total number of GPRs
> +   that need to be saved and restored, including the frame link registers.  
> */
> +static int
> +aarch64_count_gpr_saves (const HARD_REG_SET &callee_saved_regs)
> +{
> +  auto saved_gprs = callee_saved_regs & reg_class_contents[GENERAL_REGS];
> +  /* FIXME: popcount hack.  */
> +  auto nregs = popcount_hwi (saved_gprs.elts[0]);
> +
> +  if (aarch64_needs_frame_chain ())
> +    nregs += 2;
> +  else if (!crtl->is_leaf || df_regs_ever_live_p (LR_REGNUM))
> +    nregs += 1;
> +  return nregs;
> +}
> +
> +/* Implement TARGET_CALLEE_SAVE_COST.  */
> +static int
> +aarch64_callee_save_cost (spill_cost_type, unsigned int regno,
> +                         machine_mode, unsigned int nregs, int mem_cost,
> +                         const HARD_REG_SET &callee_saved_regs,
> +                         bool existing_spill_p)
> +{
> +  auto ngprs = aarch64_count_gpr_saves (callee_saved_regs);
> +
> +  /* FIXME: handle FP and Advanced SIMD registers.  */
> +
> +  /* If we've already committed to saving an odd number of GPRs, assume that
> +     saving one more will involve turning an STR into an STP.  */
> +  if (GP_REGNUM_P (regno) && nregs == 1 && (ngprs & 1))
> +    return 0;
> +
> +  /* If this would be the first register that we save, add the cost of
> +     allocating or deallocating the frame.  For GPR saves, the allocation
> +     and deallocation can be folded into the save and restore.  */
> +  if (!existing_spill_p
> +      && ngprs == 0
> +      && !GP_REGNUM_P (regno))
> +    mem_cost += 2;
> +
> +  return mem_cost;
> +}
> +
> +/* Implement TARGET_FRAME_ALLOCATION_COST.  */
> +static int
> +aarch64_frame_allocation_cost (frame_cost_type,
> +                              const HARD_REG_SET &callee_saved_regs)
> +{
> +  if (aarch64_count_gpr_saves (callee_saved_regs) == 0)
> +    return 2;
> +  return 0;
> +}
> +
>  /* Implement TARGET_INSN_COST.  We have the opportunity to do something
>     much more productive here, such as using insn attributes to cost things.
>     But we don't, not yet.
> @@ -31572,6 +31626,12 @@ aarch64_libgcc_floating_mode_supported_p
>  #undef TARGET_MEMORY_MOVE_COST
>  #define TARGET_MEMORY_MOVE_COST aarch64_memory_move_cost
>
> +#undef TARGET_CALLEE_SAVE_COST
> +#define TARGET_CALLEE_SAVE_COST aarch64_callee_save_cost
> +
> +#undef TARGET_FRAME_ALLOCATION_COST
> +#define TARGET_FRAME_ALLOCATION_COST aarch64_frame_allocation_cost
> +
>  #undef TARGET_MIN_DIVISIONS_FOR_RECIP_MUL
>  #define TARGET_MIN_DIVISIONS_FOR_RECIP_MUL 
> aarch64_min_divisions_for_recip_mul
>
> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
> index 0de24eda6f0..171bb8c0b77 100644
> --- a/gcc/doc/tm.texi
> +++ b/gcc/doc/tm.texi
> @@ -7003,6 +7003,74 @@ value to the result of that function.  The arguments 
> to that function
>  are the same as to this target hook.
>  @end deftypefn
>
> +@deftypefn {Target Hook} int TARGET_CALLEE_SAVE_COST (spill_cost_type 
> @var{cost_type}, unsigned int @var{hard_regno}, machine_mode @var{mode}, 
> unsigned int @var{nregs}, int @var{mem_cost}, const HARD_REG_SET 
> @var{&allocated_callee_regs}, bool @var{existing_spills_p})
> +Return the one-off cost of saving or restoring callee-saved registers
> +(also known as call-preserved register or non-volatile registers).
> +The parameters are as follows:
> +
> +@itemize
> +@item
> +@var{cost_type} is @samp{spill_cost_type::SAVE} for saving a register
> +and @samp{spill_cost_type::RESTORE} for restoring a register.
> +
> +@item
> +@var{hard_regno} and @var{mode} represent the whole register that
> +the register allocator is considering using; of these,
> +@var{nregs} registers are fully or partially callee-saved.
> +
> +@item
> +@var{mem_cost} is the normal cost for storing (for saves)
> +or loading (for restores) the @var{nregs} registers.
> +
> +@item
> +@var{allocated_callee_regs} is the set of callee-saved registers
> +that are already in use.
> +
> +@item
> +@var{existing_spills_p} is true if the register allocator has
> +already decided to spill registers to memory.
> +@end itemize
> +
> +If @var{existing_spills_p} is false, the cost of a save should account
> +for frame allocations in a way that is consistent with
> +@code{TARGET_FRAME_ALLOCATION_COST}'s handling of allocations for
> +caller-saved registers.  Similarly, the cost of a restore should then
> +account for frame deallocations in a way that is consistent with
> +@code{TARGET_FRAME_ALLOCATION_COST}'s handling of deallocations.
> +
> +Note that this hook should not attempt to apply a frequency scale
> +to the cost: it is the caller's responsibility to do that where
> +appropriate.
> +
> +The default implementation returns @var{mem_cost}, plus the allocation
> +or deallocation cost returned by @code{TARGET_FRAME_ALLOCATION_COST},
> +where appropriate.
> +@end deftypefn
> +
> +@deftypefn {Target Hook} int TARGET_FRAME_ALLOCATION_COST (frame_cost_type 
> @var{cost_type}, const HARD_REG_SET @var{&allocated_callee_regs})
> +Return the cost of allocating or deallocating a frame for the sake of
> +using caller-saved registers; @var{cost_type} chooses between allocation
> +and deallocation.
> +
> +This hook is only called if the register allocator has not so far
> +decided to spill any pseudo registers to memory or to use caller-saved
> +registers for values that are live across a call.  It may have
> +decided to use callee-saved registers; if so, @var{allocated_callee_regs}
> +is the set of callee-saved registers it has used.  There might also be
> +other reasons why a stack frame is already needed; for example,
> +@samp{get_frame_size ()} may be nonzero, or the target might already
> +require a frame for target-specific reasons.
> +
> +When the register allocator uses this hook to cost caller-saved registers,
> +it also uses @code{TARGET_CALLEE_SAVE_COST} to cost new callee-saved
> +registers, passing @samp{false} as the @var{existing_spills_p} argument.
> +The intention is to allow the target to apply an apples-for-apples
> +comparison between callee-saved and caller-saved registers when the
> +allocator has not yet committed to using registers of both types.
> +
> +The default implementation returns 0.
> +@end deftypefn
> +
>  @defmac BRANCH_COST (@var{speed_p}, @var{predictable_p})
>  A C expression for the cost of a branch instruction.  A value of 1 is
>  the default; other values are interpreted relative to that. Parameter
> diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
> index 631d04131e3..eccc4d88493 100644
> --- a/gcc/doc/tm.texi.in
> +++ b/gcc/doc/tm.texi.in
> @@ -4582,6 +4582,10 @@ These macros are obsolete, new ports should use the 
> target hook
>
>  @hook TARGET_MEMORY_MOVE_COST
>
> +@hook TARGET_CALLEE_SAVE_COST
> +
> +@hook TARGET_FRAME_ALLOCATION_COST
> +
>  @defmac BRANCH_COST (@var{speed_p}, @var{predictable_p})
>  A C expression for the cost of a branch instruction.  A value of 1 is
>  the default; other values are interpreted relative to that. Parameter
> diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc
> index 0699b349a1a..9aaac794f35 100644
> --- a/gcc/ira-color.cc
> +++ b/gcc/ira-color.cc
> @@ -1199,6 +1199,9 @@ finish_update_cost_records (void)
>     register was already allocated for an allocno.  */
>  static bool allocated_hardreg_p[FIRST_PSEUDO_REGISTER];
>
> +/* Which callee-saved hard registers we've decided to save.  */
> +static HARD_REG_SET allocated_callee_save_regs;
> +
>  /* Describes one element in a queue of allocnos whose costs need to be
>     updated.  Each allocno in the queue is known to have an allocno
>     class.  */
> @@ -1740,6 +1743,20 @@ check_hard_reg_p (ira_allocno_t a, int hard_regno,
>    return j == nregs;
>  }
>
> +/* Record that we have allocated NREGS registers starting at HARD_REGNO.  */
> +
> +static void
> +record_allocation (int hard_regno, int nregs)
> +{
> +  for (int i = 0; i < nregs; ++i)
> +    if (!allocated_hardreg_p[hard_regno + i])
> +      {
> +       allocated_hardreg_p[hard_regno + i] = true;
> +       if (!crtl->abi->clobbers_full_reg_p (hard_regno + i))
> +         SET_HARD_REG_BIT (allocated_callee_save_regs, hard_regno + i);
> +      }
> +}
> +
>  /* Return number of registers needed to be saved and restored at
>     function prologue/epilogue if we allocate HARD_REGNO to hold value
>     of MODE.  */
> @@ -1961,6 +1978,14 @@ assign_hard_reg (ira_allocno_t a, bool retry_p)
>  #endif
>    auto_bitmap allocnos_to_spill;
>    HARD_REG_SET soft_conflict_regs = {};
> +  int entry_freq = REG_FREQ_FROM_BB (ENTRY_BLOCK_PTR_FOR_FN (cfun));
> +  int exit_freq = REG_FREQ_FROM_BB (EXIT_BLOCK_PTR_FOR_FN (cfun));
> +  /* Whether we have spilled pseudos or used caller-saved registers
> +     for values that are live across a call.
> +
> +     FIXME: also check for pseudos that prefer NO_REGS and not equivalences.
> +     Also account for regular spills.  */
> +  bool existing_spills_p = caller_save_needed;
>
>    ira_assert (! ALLOCNO_ASSIGNED_P (a));
>    get_conflict_and_start_profitable_regs (a, retry_p,
> @@ -2175,17 +2200,45 @@ assign_hard_reg (ira_allocno_t a, bool retry_p)
>           /* We need to save/restore the hard register in
>              epilogue/prologue.  Therefore we increase the cost.  */
>           {
> +           int nregs = hard_regno_nregs (hard_regno, mode);
> +           add_cost = 0;
>             rclass = REGNO_REG_CLASS (hard_regno);
> -           add_cost = ((ira_memory_move_cost[mode][rclass][0]
> -                        + ira_memory_move_cost[mode][rclass][1])
> -                       * saved_nregs / hard_regno_nregs (hard_regno,
> -                                                         mode) - 1)
> -                      * (optimize_size ? 1 :
> -                         REG_FREQ_FROM_BB (ENTRY_BLOCK_PTR_FOR_FN (cfun)));
> +
> +           auto entry_cost = targetm.callee_save_cost
> +             (spill_cost_type::SAVE, hard_regno, mode, saved_nregs,
> +              ira_memory_move_cost[mode][rclass][0] * saved_nregs / nregs,
> +              allocated_callee_save_regs, existing_spills_p);
> +           add_cost += entry_cost * entry_freq;
> +
> +           auto exit_cost = targetm.callee_save_cost
> +             (spill_cost_type::RESTORE, hard_regno, mode, saved_nregs,
> +              ira_memory_move_cost[mode][rclass][1] * saved_nregs / nregs,
> +              allocated_callee_save_regs, existing_spills_p);
> +           add_cost += exit_cost * exit_freq;
> +
> +           /* In the event of a tie between caller-save and callee-save,
> +              prefer callee-save.  */
> +           add_cost -= 1;
> +
>             cost += add_cost;
>             full_cost += add_cost;
>           }
>         }
> +      if (!existing_spills_p && ira_need_caller_save_p (a, hard_regno))
> +       {
> +         add_cost = 0;
> +
> +         auto entry_cost = targetm.frame_allocation_cost
> +           (frame_cost_type::ALLOCATION, allocated_callee_save_regs);
> +         add_cost += entry_cost * entry_freq;
> +
> +         auto exit_cost = targetm.frame_allocation_cost
> +           (frame_cost_type::DEALLOCATION, allocated_callee_save_regs);
> +         add_cost += exit_cost * exit_freq;
> +
> +         cost += add_cost;
> +         full_cost += add_cost;
> +       }
>        if (min_cost > cost)
>         min_cost = cost;
>        if (min_full_cost > full_cost)
> @@ -2212,8 +2265,8 @@ assign_hard_reg (ira_allocno_t a, bool retry_p)
>   fail:
>    if (best_hard_regno >= 0)
>      {
> -      for (i = hard_regno_nregs (best_hard_regno, mode) - 1; i >= 0; i--)
> -       allocated_hardreg_p[best_hard_regno + i] = true;
> +      record_allocation (best_hard_regno,
> +                        hard_regno_nregs (best_hard_regno, mode));
>        spill_soft_conflicts (a, allocnos_to_spill, soft_conflict_regs,
>                             best_hard_regno);
>      }
> @@ -3369,8 +3422,7 @@ improve_allocation (void)
>        /* Assign the best chosen hard register to A.  */
>        ALLOCNO_HARD_REGNO (a) = best;
>
> -      for (j = nregs - 1; j >= 0; j--)
> -       allocated_hardreg_p[best + j] = true;
> +      record_allocation (best, nregs);
>
>        if (internal_flag_ira_verbose > 2 && ira_dump_file != NULL)
>         fprintf (ira_dump_file, "Assigning %d to a%dr%d\n",
> diff --git a/gcc/target.def b/gcc/target.def
> index 4050b2ebdd4..fdd80c06c77 100644
> --- a/gcc/target.def
> +++ b/gcc/target.def
> @@ -3775,6 +3775,80 @@ are the same as to this target hook.",
>   int, (machine_mode mode, reg_class_t rclass, bool in),
>   default_memory_move_cost)
>
> +DEFHOOK
> +(callee_save_cost,
> + "Return the one-off cost of saving or restoring callee-saved registers\n\
> +(also known as call-preserved register or non-volatile registers).\n\
> +The parameters are as follows:\n\
> +\n\
> +@itemize\n\
> +@item\n\
> +@var{cost_type} is @samp{spill_cost_type::SAVE} for saving a register\n\
> +and @samp{spill_cost_type::RESTORE} for restoring a register.\n\
> +\n\
> +@item\n\
> +@var{hard_regno} and @var{mode} represent the whole register that\n\
> +the register allocator is considering using; of these,\n\
> +@var{nregs} registers are fully or partially callee-saved.\n\
> +\n\
> +@item\n\
> +@var{mem_cost} is the normal cost for storing (for saves)\n\
> +or loading (for restores) the @var{nregs} registers.\n\
> +\n\
> +@item\n\
> +@var{allocated_callee_regs} is the set of callee-saved registers\n\
> +that are already in use.\n\
> +\n\
> +@item\n\
> +@var{existing_spills_p} is true if the register allocator has\n\
> +already decided to spill registers to memory.\n\
> +@end itemize\n\
> +\n\
> +If @var{existing_spills_p} is false, the cost of a save should account\n\
> +for frame allocations in a way that is consistent with\n\
> +@code{TARGET_FRAME_ALLOCATION_COST}'s handling of allocations for\n\
> +caller-saved registers.  Similarly, the cost of a restore should then\n\
> +account for frame deallocations in a way that is consistent with\n\
> +@code{TARGET_FRAME_ALLOCATION_COST}'s handling of deallocations.\n\
> +\n\
> +Note that this hook should not attempt to apply a frequency scale\n\
> +to the cost: it is the caller's responsibility to do that where\n\
> +appropriate.\n\
> +\n\
> +The default implementation returns @var{mem_cost}, plus the allocation\n\
> +or deallocation cost returned by @code{TARGET_FRAME_ALLOCATION_COST},\n\
> +where appropriate.",
> + int, (spill_cost_type cost_type, unsigned int hard_regno,
> +       machine_mode mode, unsigned int nregs, int mem_cost,
> +       const HARD_REG_SET &allocated_callee_regs, bool existing_spills_p),
> + default_callee_save_cost)
> +
> +DEFHOOK
> +(frame_allocation_cost,
> + "Return the cost of allocating or deallocating a frame for the sake of\n\
> +using caller-saved registers; @var{cost_type} chooses between allocation\n\
> +and deallocation.\n\
> +\n\
> +This hook is only called if the register allocator has not so far\n\
> +decided to spill any pseudo registers to memory or to use caller-saved\n\
> +registers for values that are live across a call.  It may have\n\
> +decided to use callee-saved registers; if so, @var{allocated_callee_regs}\n\
> +is the set of callee-saved registers it has used.  There might also be\n\
> +other reasons why a stack frame is already needed; for example,\n\
> +@samp{get_frame_size ()} may be nonzero, or the target might already\n\
> +require a frame for target-specific reasons.\n\
> +\n\
> +When the register allocator uses this hook to cost caller-saved registers,\n\
> +it also uses @code{TARGET_CALLEE_SAVE_COST} to cost new callee-saved\n\
> +registers, passing @samp{false} as the @var{existing_spills_p} argument.\n\
> +The intention is to allow the target to apply an apples-for-apples\n\
> +comparison between callee-saved and caller-saved registers when the\n\
> +allocator has not yet committed to using registers of both types.\n\
> +\n\
> +The default implementation returns 0.",
> + int, (frame_cost_type cost_type, const HARD_REG_SET &allocated_callee_regs),
> + default_frame_allocation_cost)
> +
>  DEFHOOK
>  (use_by_pieces_infrastructure_p,
>   "GCC will attempt several strategies when asked to copy between\n\
> diff --git a/gcc/target.h b/gcc/target.h
> index 3e1ee68a341..2bf35e2d0ee 100644
> --- a/gcc/target.h
> +++ b/gcc/target.h
> @@ -284,6 +284,18 @@ enum poly_value_estimate_kind
>    POLY_VALUE_LIKELY
>  };
>
> +enum class spill_cost_type
> +{
> +  SAVE,
> +  RESTORE
> +};
> +
> +enum class frame_cost_type
> +{
> +  ALLOCATION,
> +  DEALLOCATION
> +};
> +
>  typedef void (*emit_support_tinfos_callback) (tree);
>
>  extern bool verify_type_context (location_t, type_context_kind, const_tree,
> diff --git a/gcc/targhooks.cc b/gcc/targhooks.cc
> index f80dc8b2e7e..a8331ba5576 100644
> --- a/gcc/targhooks.cc
> +++ b/gcc/targhooks.cc
> @@ -2075,6 +2075,29 @@ default_register_move_cost (machine_mode mode 
> ATTRIBUTE_UNUSED,
>  #endif
>  }
>
> +int
> +default_callee_save_cost (spill_cost_type spill_type, unsigned int,
> +                         machine_mode, unsigned int, int mem_cost,
> +                         const HARD_REG_SET &callee_saved_regs,
> +                         bool existing_spills_p)
> +{
> +  if (!existing_spills_p)
> +    {
> +      auto frame_type = (spill_type == spill_cost_type::SAVE
> +                        ? frame_cost_type::ALLOCATION
> +                        : frame_cost_type::DEALLOCATION);
> +      mem_cost += targetm.frame_allocation_cost (frame_type,
> +                                                callee_saved_regs);
> +    }
> +  return mem_cost;
> +}
> +
> +int
> +default_frame_allocation_cost (frame_cost_type, const HARD_REG_SET &)
> +{
> +  return 0;
> +}
> +
>  /* The default implementation of TARGET_SLOW_UNALIGNED_ACCESS.  */
>
>  bool
> diff --git a/gcc/targhooks.h b/gcc/targhooks.h
> index 7d15f632c23..0c4c27e257e 100644
> --- a/gcc/targhooks.h
> +++ b/gcc/targhooks.h
> @@ -234,6 +234,11 @@ extern tree default_builtin_tm_load_store (tree);
>  extern int default_memory_move_cost (machine_mode, reg_class_t, bool);
>  extern int default_register_move_cost (machine_mode, reg_class_t,
>                                        reg_class_t);
> +extern int default_callee_save_cost (spill_cost_type, unsigned int,
> +                                    machine_mode, unsigned int, int,
> +                                    const HARD_REG_SET &, bool);
> +extern int default_frame_allocation_cost (frame_cost_type,
> +                                         const HARD_REG_SET &);
>  extern bool default_slow_unaligned_access (machine_mode, unsigned int);
>  extern HOST_WIDE_INT default_estimated_poly_value (poly_int64,
>                                                    poly_value_estimate_kind);
> --
> 2.25.1
>

Re: [PATCH v2] ira: Add a target hook for callee-saved register cost scale

Reply via email to