On Tue, Mar 4, 2025 at 6:31 PM Richard Biener
<richard.guent...@gmail.com> wrote:
>
> On Tue, Mar 4, 2025 at 11:18 AM Richard Sandiford
> <richard.sandif...@arm.com> wrote:
> >
> > Richard Sandiford <richard.sandif...@arm.com> writes:
> > > Jan Hubicka <hubi...@ucw.cz> writes:
> > >>>
> > >>> Thanks for running these.  I saw poor results for perlbench with my
> > >>> initial aarch64 hooks because the hooks reduced the cost to zero for
> > >>> the entry case:
> > >>>
> > >>>         auto entry_cost = targetm.callee_save_cost
> > >>>           (spill_cost_type::SAVE, hard_regno, mode, saved_nregs,
> > >>>            ira_memory_move_cost[mode][rclass][0] * saved_nregs / nregs,
> > >>>            allocated_callee_save_regs, existing_spills_p);
> > >>>         /* In the event of a tie between caller-save and callee-save,
> > >>>            prefer callee-save.  We apply this to the entry cost rather
> > >>>            than the exit cost since the entry frequency must be at
> > >>>            least as high as the exit frequency.  */
> > >>>         if (entry_cost > 0)
> > >>>           entry_cost -= 1;
> > >>>
> > >>> I "fixed" that by bumping the cost to a minimum of 2, but I was
> > >>> wondering whether the "entry_cost > 0" should instead be "entry_cost > 1",
> > >>> so that the cost is always greater than not using a callee save for
> > >>> registers that don't cross a call.  WDYT?
> > >>
> > >> For x86 performance costs, the push cost should be memory_move_cost which
> > >> is 6, -2 for adjustment in the target hook and -1 for this. So cost
> > >> should not be 0 I think.
> > >>
> > >> For size cost, I currently return 1, so we indeed get 0 after
> > >> adjustment.
> > >>
> > >> I think a cost of 0 will make us pick callee save even if caller save
> > >> is available and there are no function calls, so I guess we do not want
> > >> that....
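To make the arithmetic in this exchange concrete, here is a minimal standalone sketch (not GCC code). The function `adjust_entry_cost` and its `threshold` parameter are hypothetical names introduced only for illustration: a threshold of 0 models the original "entry_cost > 0" check and a threshold of 1 models the proposed "entry_cost > 1" check.

```cpp
#include <cassert>

// Hypothetical stand-in for the tie-break adjustment in assign_hard_reg.
// Subtract 1 from the entry cost so that callee-save wins a tie against
// caller-save, but only when the cost exceeds THRESHOLD.
//   threshold == 0: original check  ("entry_cost > 0")
//   threshold == 1: proposed check  ("entry_cost > 1")
int adjust_entry_cost (int entry_cost, int threshold)
{
  assert (entry_cost >= 0);
  if (entry_cost > threshold)
    entry_cost -= 1;
  return entry_cost;
}
```

With the x86 numbers quoted above, a speed cost of 6 - 2 = 4 becomes 3 under either check, while a size cost of 1 drops to 0 under the original check and stays at 1 under the proposed one.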
> > >
> > > OK, here's an updated patch that makes that change.  The x86 parts
> > > should be replaced by your patch.
> > >
> > > Tested on aarch64-linux-gnu.  I also tried to test on powerpc64le-linux-gnu
> > > (on gcc112), but I keep getting broken pipes during the test runs,
> > > so I'm struggling to get good before/after comparisons.  It does at
> > > least bootstrap though...
> >
> > Here's the patch with Honza's x86 changes.  Bootstrapped & regression-tested
> > on aarch64-linux-gnu and powerpc64le-linux-gnu (gcc120).  The powerpc64le
> > results regressed:
> >
> > FAIL: gcc.dg/guality/vla-1.c   -Os  -DPREVENT_OPTIMIZATION  line 24 i == 5
> > FAIL: gcc.dg/guality/vla-1.c   -Os  -DPREVENT_OPTIMIZATION  line 24 sizeof (a) == 17 * sizeof (short)
> >
> > but the same test already failed for -O2 and -O3.
> >
> > OK to install now?  Or, given the lateness in the release cycle,
> > would it be better to wait for GCC 16?
>
> I think it's OK to install now.  Not installing anything isn't an option; the
> alternative would be to at least revert HJ's change.
I'm hoping to install this patch in GCC 15.
>
> Thanks,
> Richard.
>
> >
> > Thanks,
> > Richard
> >
> >
> > Following on from the discussion in:
> >
> >   https://gcc.gnu.org/pipermail/gcc-patches/2025-February/675256.html
> >
> > this patch removes TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE and
> > replaces it with two hooks: one that controls the cost of using an
> > extra callee-saved register and one that controls the cost of allocating
> > a frame for the first spill.
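As a rough illustration of how the one-off costs from these hooks end up being weighted, based on the assign_hard_reg changes in the diff: the entry-side cost (save or frame allocation) is scaled by the entry block frequency and the exit-side cost (restore or frame deallocation) by the exit block frequency. The names `one_off_costs` and `weighted_cost` are hypothetical, and the frequencies are plain integers here rather than values from REG_FREQ_FROM_BB.

```cpp
#include <cassert>

// Hypothetical pair of unscaled costs, as a target hook might return them:
// ENTRY for the save/allocation, EXIT for the restore/deallocation.
struct one_off_costs
{
  int entry;
  int exit;
};

// Apply the frequency scaling that the caller (not the hook) is
// responsible for: entry cost times entry frequency, plus exit cost
// times exit frequency.
int weighted_cost (one_off_costs costs, int entry_freq, int exit_freq)
{
  assert (entry_freq >= 0 && exit_freq >= 0);
  return costs.entry * entry_freq + costs.exit * exit_freq;
}
```

This sketch leaves out the tie-break adjustment and everything target-specific; it only shows why the hooks themselves should return unscaled costs.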
> >
> > (The patch does not attempt to address the shrink-wrapping part of
> > the thread above.)
> >
> > On AArch64, this is enough to fix PR117477, as verified by the new tests.
> > The patch does not change the SPEC2017 scores significantly.  (I saw a
> > slight improvement in fotonik3d and roms, but I'm not convinced that
> > the improvements are real.)
> >
> > The patch makes IRA use caller saves for gcc.target/aarch64/pr103350-1.c,
> > which is a scan-dump correctness test that relies on not using
> > caller saves.  The decision to use caller saves looks appropriate,
> > and saves an instruction, so I've just added -fno-caller-saves
> > to the test options.
> >
> > The x86 parts were written by Honza.
> >
> > gcc/
> >         PR rtl-optimization/117477
> >         * config/aarch64/aarch64.cc (aarch64_count_saves): New function.
> >         (aarch64_count_above_hard_fp_saves, aarch64_callee_save_cost)
> >         (aarch64_frame_allocation_cost): Likewise.
> >         (TARGET_CALLEE_SAVE_COST): Define.
> >         (TARGET_FRAME_ALLOCATION_COST): Likewise.
> >         * config/i386/i386.cc (ix86_ira_callee_saved_register_cost_scale):
> >         Replace with...
> >         (ix86_callee_save_cost): ...this new hook.
> >         (TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE): Delete.
> >         (TARGET_CALLEE_SAVE_COST): Define.
> >         * target.h (spill_cost_type, frame_cost_type): New enums.
> >         * target.def (callee_save_cost, frame_allocation_cost): New hooks.
> >         (ira_callee_saved_register_cost_scale): Delete.
> >         * doc/tm.texi.in (TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE): Delete.
> >         (TARGET_CALLEE_SAVE_COST, TARGET_FRAME_ALLOCATION_COST): New hooks.
> >         * doc/tm.texi: Regenerate.
> >         * hard-reg-set.h (hard_reg_set_popcount): New function.
> >         * ira-color.cc (allocated_memory_p): New variable.
> >         (allocated_callee_save_regs): Likewise.
> >         (record_allocation): New function.
> >         (assign_hard_reg): Use targetm.frame_allocation_cost to model
> >         the cost of the first spill or first caller save.  Use
> >         targetm.callee_save_cost to model the cost of using new callee-saved
> >         registers.  Apply the exit rather than entry frequency to the cost
> >         of restoring a register or deallocating the frame.  Update the
> >         new variables above.
> >         (improve_allocation): Use record_allocation.
> >         (color): Initialize allocated_callee_save_regs.
> >         (ira_color): Initialize allocated_memory_p.
> >         * targhooks.h (default_callee_save_cost): Declare.
> >         (default_frame_allocation_cost): Likewise.
> >         * targhooks.cc (default_callee_save_cost): New function.
> >         (default_frame_allocation_cost): Likewise.
> >
> > gcc/testsuite/
> >         PR rtl-optimization/117477
> >         * gcc.target/aarch64/callee_save_1.c: New test.
> >         * gcc.target/aarch64/callee_save_2.c: Likewise.
> >         * gcc.target/aarch64/callee_save_3.c: Likewise.
> >         * gcc.target/aarch64/pr103350-1.c: Add -fno-caller-saves.
> >
> > Co-authored-by: Jan Hubicka <hubi...@ucw.cz>
> > ---
> >  gcc/config/aarch64/aarch64.cc                 | 118 ++++++++++++++++++
> >  gcc/config/i386/i386.cc                       |  28 +++--
> >  gcc/doc/tm.texi                               |  77 ++++++++++--
> >  gcc/doc/tm.texi.in                            |   6 +-
> >  gcc/hard-reg-set.h                            |  15 +++
> >  gcc/ira-color.cc                              |  83 ++++++++++--
> >  gcc/target.def                                |  87 +++++++++++--
> >  gcc/target.h                                  |  12 ++
> >  gcc/targhooks.cc                              |  27 ++++
> >  gcc/targhooks.h                               |   5 +
> >  .../gcc.target/aarch64/callee_save_1.c        |  12 ++
> >  .../gcc.target/aarch64/callee_save_2.c        |  14 +++
> >  .../gcc.target/aarch64/callee_save_3.c        |  12 ++
> >  gcc/testsuite/gcc.target/aarch64/pr103350-1.c |   2 +-
> >  14 files changed, 459 insertions(+), 39 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/callee_save_1.c
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/callee_save_2.c
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/callee_save_3.c
> >
> > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> > index fe76730b0a7..27ea82cd7da 100644
> > --- a/gcc/config/aarch64/aarch64.cc
> > +++ b/gcc/config/aarch64/aarch64.cc
> > @@ -15873,6 +15873,118 @@ aarch64_memory_move_cost (machine_mode mode, reg_class_t rclass_i, bool in)
> >           : base + aarch64_tune_params.memmov_cost.store_int);
> >  }
> >
> > +/* CALLEE_SAVED_REGS is the set of callee-saved registers that the
> > +   RA has already decided to use.  Return the total number of registers
> > +   in class RCLASS that need to be saved and restored, including the
> > +   frame link registers.  */
> > +static int
> > +aarch64_count_saves (const HARD_REG_SET &callee_saved_regs, reg_class rclass)
> > +{
> > +  auto saved_gprs = callee_saved_regs & reg_class_contents[rclass];
> > +  auto nregs = hard_reg_set_popcount (saved_gprs);
> > +
> > +  if (TEST_HARD_REG_BIT (reg_class_contents[rclass], LR_REGNUM))
> > +    {
> > +      if (aarch64_needs_frame_chain ())
> > +       nregs += 2;
> > +      else if (!crtl->is_leaf || df_regs_ever_live_p (LR_REGNUM))
> > +       nregs += 1;
> > +    }
> > +  return nregs;
> > +}
> > +
> > +/* CALLEE_SAVED_REGS is the set of callee-saved registers that the
> > +   RA has already decided to use.  Return the total number of registers
> > +   that need to be saved above the hard frame pointer, including the
> > +   frame link registers.  */
> > +static int
> > +aarch64_count_above_hard_fp_saves (const HARD_REG_SET &callee_saved_regs)
> > +{
> > +  /* FP and Advanced SIMD registers are saved above the frame pointer
> > +     but SVE registers are saved below it.  */
> > +  if (known_le (GET_MODE_SIZE (aarch64_reg_save_mode (V8_REGNUM)), 16U))
> > +    return aarch64_count_saves (callee_saved_regs, POINTER_AND_FP_REGS);
> > +  return aarch64_count_saves (callee_saved_regs, POINTER_REGS);
> > +}
> > +
> > +/* Implement TARGET_CALLEE_SAVE_COST.  */
> > +static int
> > +aarch64_callee_save_cost (spill_cost_type spill_type, unsigned int regno,
> > +                         machine_mode mode, unsigned int nregs, int mem_cost,
> > +                         const HARD_REG_SET &callee_saved_regs,
> > +                         bool existing_spill_p)
> > +{
> > +  /* If we've already committed to saving an odd number of GPRs, assume that
> > +     saving one more will involve turning an STR into an STP and an LDR
> > +     into an LDP.  This should still be more expensive than not spilling
> > +     (meaning that the minimum cost is 1), but it should usually be cheaper
> > +     than a separate store or load.  */
> > +  if (GP_REGNUM_P (regno)
> > +      && nregs == 1
> > +      && (aarch64_count_saves (callee_saved_regs, GENERAL_REGS) & 1))
> > +    return 1;
> > +
> > +  /* Similarly for saving FP registers, if we only need to save the low
> > +     64 bits.  (We can also use STP/LDP instead of STR/LDR for Q registers,
> > +     but that is less likely to be a saving.)  */
> > +  if (FP_REGNUM_P (regno)
> > +      && nregs == 1
> > +      && known_eq (GET_MODE_SIZE (aarch64_reg_save_mode (regno)), 8U)
> > +      && (aarch64_count_saves (callee_saved_regs, FP_REGS) & 1))
> > +    return 1;
> > +
> > +  /* If this would be the first register that we save, add the cost of
> > +     allocating or deallocating the frame.  For GPR, FPR, and Advanced SIMD
> > +     saves, the allocation and deallocation can be folded into the save and
> > +     restore.  */
> > +  if (!existing_spill_p
> > +      && !GP_REGNUM_P (regno)
> > +      && !(FP_REGNUM_P (regno)
> > +          && known_le (GET_MODE_SIZE (aarch64_reg_save_mode (regno)), 16U)))
> > +    return default_callee_save_cost (spill_type, regno, mode, nregs, mem_cost,
> > +                                    callee_saved_regs, existing_spill_p);
> > +
> > +  return mem_cost;
> > +}
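The GPR pairing rule at the top of this function can be modeled in isolation as follows. This is a simplified sketch, not the hook itself: `paired_save_cost` is a hypothetical name, the count of already-saved GPRs is passed in directly instead of being derived from a HARD_REG_SET, and the FP-register and frame-allocation cases are left out.

```cpp
#include <cassert>

// Simplified model of the GPR pairing rule.  If an odd number of GPRs is
// already being saved, one more single-register save is assumed to turn
// an STR into an STP (and an LDR into an LDP), so it gets the minimum
// nonzero cost of 1.  Otherwise the normal memory cost applies.
int paired_save_cost (int already_saved_gprs, int nregs, int mem_cost)
{
  assert (already_saved_gprs >= 0 && nregs >= 1);
  if (nregs == 1 && (already_saved_gprs & 1))
    return 1;
  return mem_cost;
}
```

For example, with one GPR already saved, adding a second costs 1; with two already saved, adding a third costs the full `mem_cost`.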
> > +
> > +/* Implement TARGET_FRAME_ALLOCATION_COST.  */
> > +static int
> > +aarch64_frame_allocation_cost (frame_cost_type,
> > +                              const HARD_REG_SET &callee_saved_regs)
> > +{
> > +  /* The intention is to model the relative costs of different approaches
> > +     to storing data on the stack, rather than to model the cost of saving
> > +     data vs not saving it.  This means that we should return 0 if:
> > +
> > +     - any frame is going to be allocated with:
> > +
> > +          stp x29, x30, [sp, #-...]!
> > +
> > +       to create a frame link.
> > +
> > +     - any frame is going to be allocated with:
> > +
> > +          str x30, [sp, #-...]!
> > +
> > +       to save the link register.
> > +
> > +     In both cases, the allocation and deallocation instructions are the
> > +     same however we store data to the stack.  (In the second case, the STR
> > +     could be converted to an STP by saving an extra call-preserved register,
> > +     but that is modeled by aarch64_callee_save_cost.)
> > +
> > +     In other cases, assume that a frame would need to be allocated with a
> > +     separate subtraction and deallocated with a separate addition.  Saves
> > +     of call-clobbered registers can then reclaim this cost using a
> > +     predecrement store and a postincrement load.
> > +
> > +     For simplicity, give this addition or subtraction the same cost as
> > +     a GPR move.  We could parameterize this if necessary.  */
> > +  if (aarch64_count_above_hard_fp_saves (callee_saved_regs) == 0)
> > +    return aarch64_tune_params.regmove_cost->GP2GP;
> > +  return 0;
> > +}
> > +
> >  /* Implement TARGET_INSN_COST.  We have the opportunity to do something
> >     much more productive here, such as using insn attributes to cost things.
> >     But we don't, not yet.
> > @@ -31557,6 +31669,12 @@ aarch64_libgcc_floating_mode_supported_p
> >  #undef TARGET_MEMORY_MOVE_COST
> >  #define TARGET_MEMORY_MOVE_COST aarch64_memory_move_cost
> >
> > +#undef TARGET_CALLEE_SAVE_COST
> > +#define TARGET_CALLEE_SAVE_COST aarch64_callee_save_cost
> > +
> > +#undef TARGET_FRAME_ALLOCATION_COST
> > +#define TARGET_FRAME_ALLOCATION_COST aarch64_frame_allocation_cost
> > +
> >  #undef TARGET_MIN_DIVISIONS_FOR_RECIP_MUL
> >  #define TARGET_MIN_DIVISIONS_FOR_RECIP_MUL aarch64_min_divisions_for_recip_mul
> >
> > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> > index fb93a6fdd0a..661e71b032c 100644
> > --- a/gcc/config/i386/i386.cc
> > +++ b/gcc/config/i386/i386.cc
> > @@ -20600,12 +20600,27 @@ ix86_class_likely_spilled_p (reg_class_t rclass)
> >    return false;
> >  }
> >
> > -/* Implement TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE.  */
> > +/* Implement TARGET_CALLEE_SAVE_COST.  */
> >
> >  static int
> > -ix86_ira_callee_saved_register_cost_scale (int)
> > -{
> > -  return 1;
> > +ix86_callee_save_cost (spill_cost_type, unsigned int hard_regno, machine_mode,
> > +                      unsigned int, int mem_cost, const HARD_REG_SET &, bool)
> > +{
> > +  /* Account for the fact that push and pop are shorter and do their
> > +     own allocation and deallocation.  */
> > +  if (GENERAL_REGNO_P (hard_regno))
> > +    {
> > +      /* push is 1 byte while typical spill is 4-5 bytes.
> > +        ??? We probably should adjust size costs accordingly.
> > +        Costs are relative to reg-reg move that has 2 bytes for 32bit
> > +        and 3 bytes otherwise.  */
> > +      if (optimize_function_for_size_p (cfun))
> > +       return 1;
> > +      /* Be sure that no cost table sets cost to 2, so we end up with 0.  */
> > +      gcc_checking_assert (mem_cost > 2);
> > +      return mem_cost - 2;
> > +    }
> > +  return mem_cost;
> >  }
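A standalone model of the general-register branch of this hook, for readers who want to check the numbers. `gpr_push_cost` and its boolean parameter are hypothetical; the real hook queries optimize_function_for_size_p (cfun) and uses gcc_checking_assert instead of a plain assert.

```cpp
#include <cassert>

// Simplified model of the x86 cost for saving a general register with
// push/pop.  When optimizing for size the cost is a flat 1; otherwise it
// is the memory move cost minus 2, which must remain positive.
int gpr_push_cost (int mem_cost, bool optimize_for_size)
{
  if (optimize_for_size)
    return 1;
  // Mirrors the checking assert in the hook: a cost table value of 2
  // would make the result 0.
  assert (mem_cost > 2);
  return mem_cost - 2;
}
```

With the memory move cost of 6 mentioned earlier in the thread, the speed cost comes out as 4 before the allocator's own tie-break adjustment.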
> >
> >  /* Return true if a set of DST by the expression SRC should be allowed.
> > @@ -27092,9 +27107,8 @@ ix86_libgcc_floating_mode_supported_p
> >  #define TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P hook_bool_mode_true
> >  #undef TARGET_CLASS_LIKELY_SPILLED_P
> >  #define TARGET_CLASS_LIKELY_SPILLED_P ix86_class_likely_spilled_p
> > -#undef TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE
> > -#define TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE \
> > -  ix86_ira_callee_saved_register_cost_scale
> > +#undef TARGET_CALLEE_SAVE_COST
> > +#define TARGET_CALLEE_SAVE_COST ix86_callee_save_cost
> >
> >  #undef TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST
> >  #define TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST \
> > diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
> > index 9f42913a4ef..a96700c0d38 100644
> > --- a/gcc/doc/tm.texi
> > +++ b/gcc/doc/tm.texi
> > @@ -3047,14 +3047,6 @@ A target hook which can change allocno class for given pseudo from
> >    The default version of this target hook always returns given class.
> >  @end deftypefn
> >
> > -@deftypefn {Target Hook} int TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE (int @var{hard_regno})
> > -A target hook which returns the callee-saved register @var{hard_regno}
> > -cost scale in epilogue and prologue used by IRA.
> > -
> > -The default version of this target hook returns 1 if optimizing for
> > -size, otherwise returns the entry block frequency.
> > -@end deftypefn
> > -
> >  @deftypefn {Target Hook} bool TARGET_LRA_P (void)
> >  A target hook which returns true if we use LRA instead of reload pass.
> >
> > @@ -7011,6 +7003,75 @@ value to the result of that function.  The arguments to that function
> >  are the same as to this target hook.
> >  @end deftypefn
> >
> > +@deftypefn {Target Hook} int TARGET_CALLEE_SAVE_COST (spill_cost_type @var{cost_type}, unsigned int @var{hard_regno}, machine_mode @var{mode}, unsigned int @var{nregs}, int @var{mem_cost}, const HARD_REG_SET @var{&allocated_callee_regs}, bool @var{existing_spills_p})
> > +Return the one-off cost of saving or restoring callee-saved registers
> > +(also known as call-preserved registers or non-volatile registers).
> > +The parameters are as follows:
> > +
> > +@itemize
> > +@item
> > +@var{cost_type} is @samp{spill_cost_type::SAVE} for saving a register
> > +and @samp{spill_cost_type::RESTORE} for restoring a register.
> > +
> > +@item
> > +@var{hard_regno} and @var{mode} represent the whole register that
> > +the register allocator is considering using; of these,
> > +@var{nregs} registers are fully or partially callee-saved.
> > +
> > +@item
> > +@var{mem_cost} is the normal cost for storing (for saves)
> > +or loading (for restores) the @var{nregs} registers.
> > +
> > +@item
> > +@var{allocated_callee_regs} is the set of callee-saved registers
> > +that are already in use.
> > +
> > +@item
> > +@var{existing_spills_p} is true if the register allocator has
> > +already decided to spill registers to memory.
> > +@end itemize
> > +
> > +If @var{existing_spills_p} is false, the cost of a save should account
> > +for frame allocations in a way that is consistent with
> > +@code{TARGET_FRAME_ALLOCATION_COST}'s handling of allocations for spills.
> > +Similarly, the cost of a restore should then account for frame deallocations
> > +in a way that is consistent with @code{TARGET_FRAME_ALLOCATION_COST}'s
> > +handling of deallocations.
> > +
> > +Note that this hook should not attempt to apply a frequency scale
> > +to the cost: it is the caller's responsibility to do that where
> > +appropriate.
> > +
> > +The default implementation returns @var{mem_cost}, plus the allocation
> > +or deallocation cost returned by @code{TARGET_FRAME_ALLOCATION_COST},
> > +where appropriate.
> > +@end deftypefn
> > +
> > +@deftypefn {Target Hook} int TARGET_FRAME_ALLOCATION_COST (frame_cost_type @var{cost_type}, const HARD_REG_SET @var{&allocated_callee_regs})
> > +Return the cost of allocating or deallocating a frame for the sake of
> > +a spill; @var{cost_type} chooses between allocation and deallocation.
> > +The term ``spill'' here includes both forcing a pseudo register to memory
> > +and using caller-saved registers for pseudo registers that are live across
> > +a call.
> > +
> > +This hook is only called if the register allocator has not so far
> > +decided to spill.  The allocator may have decided to use callee-saved
> > +registers; if so, @var{allocated_callee_regs} is the set of callee-saved
> > +registers that the allocator has used.  There might also be other reasons
> > +why a stack frame is already needed; for example, @samp{get_frame_size ()}
> > +might be nonzero, or the target might already require a frame for
> > +target-specific reasons.
> > +
> > +When the register allocator uses this hook to cost spills, it also uses
> > +@code{TARGET_CALLEE_SAVE_COST} to cost new callee-saved registers, passing
> > +@samp{false} as the @var{existing_spills_p} argument.  The intention is to
> > +allow the target to apply an apples-for-apples comparison between the
> > +cost of using callee-saved registers and using spills in cases where the
> > +allocator has not yet committed to using both strategies.
> > +
> > +The default implementation returns 0.
> > +@end deftypefn
> > +
> >  @defmac BRANCH_COST (@var{speed_p}, @var{predictable_p})
> >  A C expression for the cost of a branch instruction.  A value of 1 is
> >  the default; other values are interpreted relative to that. Parameter
> > diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
> > index 6dbe22581ca..eccc4d88493 100644
> > --- a/gcc/doc/tm.texi.in
> > +++ b/gcc/doc/tm.texi.in
> > @@ -2388,8 +2388,6 @@ in the reload pass.
> >
> >  @hook TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS
> >
> > -@hook TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE
> > -
> >  @hook TARGET_LRA_P
> >
> >  @hook TARGET_REGISTER_PRIORITY
> > @@ -4584,6 +4582,10 @@ These macros are obsolete, new ports should use the target hook
> >
> >  @hook TARGET_MEMORY_MOVE_COST
> >
> > +@hook TARGET_CALLEE_SAVE_COST
> > +
> > +@hook TARGET_FRAME_ALLOCATION_COST
> > +
> >  @defmac BRANCH_COST (@var{speed_p}, @var{predictable_p})
> >  A C expression for the cost of a branch instruction.  A value of 1 is
> >  the default; other values are interpreted relative to that. Parameter
> > diff --git a/gcc/hard-reg-set.h b/gcc/hard-reg-set.h
> > index 48025d202b6..0d03aed5128 100644
> > --- a/gcc/hard-reg-set.h
> > +++ b/gcc/hard-reg-set.h
> > @@ -191,6 +191,12 @@ hard_reg_set_empty_p (const_hard_reg_set x)
> >    return x == HARD_CONST (0);
> >  }
> >
> > +inline int
> > +hard_reg_set_popcount (const_hard_reg_set x)
> > +{
> > +  return popcount_hwi (x);
> > +}
> > +
> >  #else
> >
> >  inline void
> > @@ -254,6 +260,15 @@ hard_reg_set_empty_p (const_hard_reg_set x)
> >      bad |= x.elts[i];
> >    return bad == 0;
> >  }
> > +
> > +inline int
> > +hard_reg_set_popcount (const_hard_reg_set x)
> > +{
> > +  int count = 0;
> > +  for (unsigned int i = 0; i < ARRAY_SIZE (x.elts); ++i)
> > +    count += popcount_hwi (x.elts[i]);
> > +  return count;
> > +}
> >  #endif
> >
> >  /* Iterator for hard register sets.  */
> > diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc
> > index 233060e1587..4b9296029cc 100644
> > --- a/gcc/ira-color.cc
> > +++ b/gcc/ira-color.cc
> > @@ -1195,10 +1195,16 @@ finish_update_cost_records (void)
> >    update_cost_record_pool.release ();
> >  }
> >
> > +/* True if we have allocated memory, or intend to do so.  */
> > +static bool allocated_memory_p;
> > +
> >  /* Array whose element value is TRUE if the corresponding hard
> >     register was already allocated for an allocno.  */
> >  static bool allocated_hardreg_p[FIRST_PSEUDO_REGISTER];
> >
> > +/* Which callee-saved hard registers we've decided to save.  */
> > +static HARD_REG_SET allocated_callee_save_regs;
> > +
> >  /* Describes one element in a queue of allocnos whose costs need to be
> >     updated.  Each allocno in the queue is known to have an allocno
> >     class.  */
> > @@ -1740,6 +1746,20 @@ check_hard_reg_p (ira_allocno_t a, int hard_regno,
> >    return j == nregs;
> >  }
> >
> > +/* Record that we have allocated NREGS registers starting at HARD_REGNO.  */
> > +
> > +static void
> > +record_allocation (int hard_regno, int nregs)
> > +{
> > +  for (int i = 0; i < nregs; ++i)
> > +    if (!allocated_hardreg_p[hard_regno + i])
> > +      {
> > +       allocated_hardreg_p[hard_regno + i] = true;
> > +       if (!crtl->abi->clobbers_full_reg_p (hard_regno + i))
> > +         SET_HARD_REG_BIT (allocated_callee_save_regs, hard_regno + i);
> > +      }
> > +}
> > +
> >  /* Return number of registers needed to be saved and restored at
> >     function prologue/epilogue if we allocate HARD_REGNO to hold value
> >     of MODE.  */
> > @@ -1961,6 +1981,12 @@ assign_hard_reg (ira_allocno_t a, bool retry_p)
> >  #endif
> >    auto_bitmap allocnos_to_spill;
> >    HARD_REG_SET soft_conflict_regs = {};
> > +  int entry_freq = REG_FREQ_FROM_BB (ENTRY_BLOCK_PTR_FOR_FN (cfun));
> > +  int exit_freq = REG_FREQ_FROM_BB (EXIT_BLOCK_PTR_FOR_FN (cfun));
> > +  int spill_cost = 0;
> > +  /* Whether we have spilled pseudos or used caller-saved registers for values
> > +     that are live across a call.  */
> > +  bool existing_spills_p = allocated_memory_p || caller_save_needed;
> >
> >    ira_assert (! ALLOCNO_ASSIGNED_P (a));
> >    get_conflict_and_start_profitable_regs (a, retry_p,
> > @@ -1979,6 +2005,18 @@ assign_hard_reg (ira_allocno_t a, bool retry_p)
> >      start_update_cost ();
> >    mem_cost += ALLOCNO_UPDATED_MEMORY_COST (a);
> >
> > +  if (!existing_spills_p)
> > +    {
> > +      auto entry_cost = targetm.frame_allocation_cost
> > +       (frame_cost_type::ALLOCATION, allocated_callee_save_regs);
> > +      spill_cost += entry_cost * entry_freq;
> > +
> > +      auto exit_cost = targetm.frame_allocation_cost
> > +       (frame_cost_type::DEALLOCATION, allocated_callee_save_regs);
> > +      spill_cost += exit_cost * exit_freq;
> > +    }
> > +  mem_cost += spill_cost;
> > +
> >    ira_allocate_and_copy_costs (&ALLOCNO_UPDATED_HARD_REG_COSTS (a),
> >                                aclass, ALLOCNO_HARD_REG_COSTS (a));
> >    a_costs = ALLOCNO_UPDATED_HARD_REG_COSTS (a);
> > @@ -2175,16 +2213,37 @@ assign_hard_reg (ira_allocno_t a, bool retry_p)
> >           /* We need to save/restore the hard register in
> >              epilogue/prologue.  Therefore we increase the cost.  */
> >           {
> > +           int nregs = hard_regno_nregs (hard_regno, mode);
> > +           add_cost = 0;
> >             rclass = REGNO_REG_CLASS (hard_regno);
> > -           add_cost = ((ira_memory_move_cost[mode][rclass][0]
> > -                        + ira_memory_move_cost[mode][rclass][1])
> > -                       * saved_nregs / hard_regno_nregs (hard_regno,
> > -                                                         mode) - 1)
> > -                      * targetm.ira_callee_saved_register_cost_scale (hard_regno);
> > +
> > +           auto entry_cost = targetm.callee_save_cost
> > +             (spill_cost_type::SAVE, hard_regno, mode, saved_nregs,
> > +              ira_memory_move_cost[mode][rclass][0] * saved_nregs / nregs,
> > +              allocated_callee_save_regs, existing_spills_p);
> > +           /* In the event of a tie between caller-save and callee-save,
> > +              prefer callee-save.  We apply this to the entry cost rather
> > +              than the exit cost since the entry frequency must be at
> > +              least as high as the exit frequency.  */
> > +           if (entry_cost > 1)
> > +             entry_cost -= 1;
> > +           add_cost += entry_cost * entry_freq;
> > +
> > +           auto exit_cost = targetm.callee_save_cost
> > +             (spill_cost_type::RESTORE, hard_regno, mode, saved_nregs,
> > +              ira_memory_move_cost[mode][rclass][1] * saved_nregs / nregs,
> > +              allocated_callee_save_regs, existing_spills_p);
> > +           add_cost += exit_cost * exit_freq;
> > +
> >             cost += add_cost;
> >             full_cost += add_cost;
> >           }
> >         }
> > +      if (ira_need_caller_save_p (a, hard_regno))
> > +       {
> > +         cost += spill_cost;
> > +         full_cost += spill_cost;
> > +       }
> >        if (min_cost > cost)
> >         min_cost = cost;
> >        if (min_full_cost > full_cost)
> > @@ -2211,11 +2270,13 @@ assign_hard_reg (ira_allocno_t a, bool retry_p)
> >   fail:
> >    if (best_hard_regno >= 0)
> >      {
> > -      for (i = hard_regno_nregs (best_hard_regno, mode) - 1; i >= 0; i--)
> > -       allocated_hardreg_p[best_hard_regno + i] = true;
> > +      record_allocation (best_hard_regno,
> > +                        hard_regno_nregs (best_hard_regno, mode));
> >        spill_soft_conflicts (a, allocnos_to_spill, soft_conflict_regs,
> >                             best_hard_regno);
> >      }
> > +  else
> > +    allocated_memory_p = true;
> >    if (! retry_p)
> >      restore_costs_from_copies (a);
> >    ALLOCNO_HARD_REGNO (a) = best_hard_regno;
> > @@ -3368,8 +3429,7 @@ improve_allocation (void)
> >        /* Assign the best chosen hard register to A.  */
> >        ALLOCNO_HARD_REGNO (a) = best;
> >
> > -      for (j = nregs - 1; j >= 0; j--)
> > -       allocated_hardreg_p[best + j] = true;
> > +      record_allocation (best, nregs);
> >
> >        if (internal_flag_ira_verbose > 2 && ira_dump_file != NULL)
> >         fprintf (ira_dump_file, "Assigning %d to a%dr%d\n",
> > @@ -5199,6 +5259,7 @@ color (void)
> >  {
> >    allocno_stack_vec.create (ira_allocnos_num);
> >    memset (allocated_hardreg_p, 0, sizeof (allocated_hardreg_p));
> > +  CLEAR_HARD_REG_SET (allocated_callee_save_regs);
> >    ira_initiate_assign ();
> >    do_coloring ();
> >    ira_finish_assign ();
> > @@ -5327,10 +5388,14 @@ ira_color (void)
> >    ira_allocno_iterator ai;
> >
> >    /* Setup updated costs.  */
> > +  allocated_memory_p = false;
> >    FOR_EACH_ALLOCNO (a, ai)
> >      {
> >        ALLOCNO_UPDATED_MEMORY_COST (a) = ALLOCNO_MEMORY_COST (a);
> >        ALLOCNO_UPDATED_CLASS_COST (a) = ALLOCNO_CLASS_COST (a);
> > +      if (ALLOCNO_CLASS (a) == NO_REGS
> > +         && !ira_equiv_no_lvalue_p (ALLOCNO_REGNO (a)))
> > +       allocated_memory_p = true;
> >      }
> >    if (ira_conflicts_p)
> >      color ();
> > diff --git a/gcc/target.def b/gcc/target.def
> > index c348b15815a..6c7cdc8126b 100644
> > --- a/gcc/target.def
> > +++ b/gcc/target.def
> > @@ -3775,6 +3775,81 @@ are the same as to this target hook.",
> >   int, (machine_mode mode, reg_class_t rclass, bool in),
> >   default_memory_move_cost)
> >
> > +DEFHOOK
> > +(callee_save_cost,
> > + "Return the one-off cost of saving or restoring callee-saved registers\n\
> > +(also known as call-preserved registers or non-volatile registers).\n\
> > +The parameters are as follows:\n\
> > +\n\
> > +@itemize\n\
> > +@item\n\
> > +@var{cost_type} is @samp{spill_cost_type::SAVE} for saving a register\n\
> > +and @samp{spill_cost_type::RESTORE} for restoring a register.\n\
> > +\n\
> > +@item\n\
> > +@var{hard_regno} and @var{mode} represent the whole register that\n\
> > +the register allocator is considering using; of these,\n\
> > +@var{nregs} registers are fully or partially callee-saved.\n\
> > +\n\
> > +@item\n\
> > +@var{mem_cost} is the normal cost for storing (for saves)\n\
> > +or loading (for restores) the @var{nregs} registers.\n\
> > +\n\
> > +@item\n\
> > +@var{allocated_callee_regs} is the set of callee-saved registers\n\
> > +that are already in use.\n\
> > +\n\
> > +@item\n\
> > +@var{existing_spills_p} is true if the register allocator has\n\
> > +already decided to spill registers to memory.\n\
> > +@end itemize\n\
> > +\n\
> > +If @var{existing_spills_p} is false, the cost of a save should account\n\
> > +for frame allocations in a way that is consistent with\n\
> > +@code{TARGET_FRAME_ALLOCATION_COST}'s handling of allocations for spills.\n\
> > +Similarly, the cost of a restore should then account for frame deallocations\n\
> > +in a way that is consistent with @code{TARGET_FRAME_ALLOCATION_COST}'s\n\
> > +handling of deallocations.\n\
> > +\n\
> > +Note that this hook should not attempt to apply a frequency scale\n\
> > +to the cost: it is the caller's responsibility to do that where\n\
> > +appropriate.\n\
> > +\n\
> > +The default implementation returns @var{mem_cost}, plus the allocation\n\
> > +or deallocation cost returned by @code{TARGET_FRAME_ALLOCATION_COST},\n\
> > +where appropriate.",
> > + int, (spill_cost_type cost_type, unsigned int hard_regno,
> > +       machine_mode mode, unsigned int nregs, int mem_cost,
> > +       const HARD_REG_SET &allocated_callee_regs, bool existing_spills_p),
> > + default_callee_save_cost)
> > +
> > +DEFHOOK
> > +(frame_allocation_cost,
> > + "Return the cost of allocating or deallocating a frame for the sake of\n\
> > +a spill; @var{cost_type} chooses between allocation and deallocation.\n\
> > +The term ``spill'' here includes both forcing a pseudo register to memory\n\
> > +and using caller-saved registers for pseudo registers that are live across\n\
> > +a call.\n\
> > +\n\
> > +This hook is only called if the register allocator has not so far\n\
> > +decided to spill.  The allocator may have decided to use callee-saved\n\
> > +registers; if so, @var{allocated_callee_regs} is the set of callee-saved\n\
> > +registers that the allocator has used.  There might also be other reasons\n\
> > +why a stack frame is already needed; for example, @samp{get_frame_size ()}\n\
> > +might be nonzero, or the target might already require a frame for\n\
> > +target-specific reasons.\n\
> > +\n\
> > +When the register allocator uses this hook to cost spills, it also uses\n\
> > +@code{TARGET_CALLEE_SAVE_COST} to cost new callee-saved registers, passing\n\
> > +@samp{false} as the @var{existing_spills_p} argument.  The intention is to\n\
> > +allow the target to apply an apples-for-apples comparison between the\n\
> > +cost of using callee-saved registers and using spills in cases where the\n\
> > +allocator has not yet committed to using both strategies.\n\
> > +\n\
> > +The default implementation returns 0.",
> > + int, (frame_cost_type cost_type, const HARD_REG_SET &allocated_callee_regs),
> > + default_frame_allocation_cost)
> > +
> >  DEFHOOK
> >  (use_by_pieces_infrastructure_p,
> >   "GCC will attempt several strategies when asked to copy between\n\
> > @@ -5714,18 +5789,6 @@ DEFHOOK
> >   reg_class_t, (int, reg_class_t, reg_class_t),
> >   default_ira_change_pseudo_allocno_class)
> >
> > -/* Scale of callee-saved register cost in epilogue and prologue used by
> > -   IRA.  */
> > -DEFHOOK
> > -(ira_callee_saved_register_cost_scale,
> > - "A target hook which returns the callee-saved register @var{hard_regno}\n\
> > -cost scale in epilogue and prologue used by IRA.\n\
> > -\n\
> > -The default version of this target hook returns 1 if optimizing for\n\
> > -size, otherwise returns the entry block frequency.",
> > - int, (int hard_regno),
> > - default_ira_callee_saved_register_cost_scale)
> > -
> >  /* Return true if we use LRA instead of reload.  */
> >  DEFHOOK
> >  (lra_p,
> > diff --git a/gcc/target.h b/gcc/target.h
> > index 3e1ee68a341..2bf35e2d0ee 100644
> > --- a/gcc/target.h
> > +++ b/gcc/target.h
> > @@ -284,6 +284,18 @@ enum poly_value_estimate_kind
> >    POLY_VALUE_LIKELY
> >  };
> >
> > +enum class spill_cost_type
> > +{
> > +  SAVE,
> > +  RESTORE
> > +};
> > +
> > +enum class frame_cost_type
> > +{
> > +  ALLOCATION,
> > +  DEALLOCATION
> > +};
> > +
> >  typedef void (*emit_support_tinfos_callback) (tree);
> >
> >  extern bool verify_type_context (location_t, type_context_kind, const_tree,
> > diff --git a/gcc/targhooks.cc b/gcc/targhooks.cc
> > index 344075efa41..c79458e374e 100644
> > --- a/gcc/targhooks.cc
> > +++ b/gcc/targhooks.cc
> > @@ -2083,6 +2083,33 @@ default_register_move_cost (machine_mode mode ATTRIBUTE_UNUSED,
> >  #endif
> >  }
> >
> > +/* The default implementation of TARGET_CALLEE_SAVE_COST.  */
> > +
> > +int
> > +default_callee_save_cost (spill_cost_type spill_type, unsigned int,
> > +                         machine_mode, unsigned int, int mem_cost,
> > +                         const HARD_REG_SET &callee_saved_regs,
> > +                         bool existing_spills_p)
> > +{
> > +  if (!existing_spills_p)
> > +    {
> > +      auto frame_type = (spill_type == spill_cost_type::SAVE
> > +                        ? frame_cost_type::ALLOCATION
> > +                        : frame_cost_type::DEALLOCATION);
> > +      mem_cost += targetm.frame_allocation_cost (frame_type,
> > +                                                callee_saved_regs);
> > +    }
> > +  return mem_cost;
> > +}
> > +
> > +/* The default implementation of TARGET_FRAME_ALLOCATION_COST.  */
> > +
> > +int
> > +default_frame_allocation_cost (frame_cost_type, const HARD_REG_SET &)
> > +{
> > +  return 0;
> > +}
> > +
> >  /* The default implementation of TARGET_SLOW_UNALIGNED_ACCESS.  */
> >
> >  bool
> > diff --git a/gcc/targhooks.h b/gcc/targhooks.h
> > index 8871e01430c..f16b58798c2 100644
> > --- a/gcc/targhooks.h
> > +++ b/gcc/targhooks.h
> > @@ -235,6 +235,11 @@ extern tree default_builtin_tm_load_store (tree);
> >  extern int default_memory_move_cost (machine_mode, reg_class_t, bool);
> >  extern int default_register_move_cost (machine_mode, reg_class_t,
> >                                        reg_class_t);
> > +extern int default_callee_save_cost (spill_cost_type, unsigned int,
> > +                                    machine_mode, unsigned int, int,
> > +                                    const HARD_REG_SET &, bool);
> > +extern int default_frame_allocation_cost (frame_cost_type,
> > +                                         const HARD_REG_SET &);
> >  extern bool default_slow_unaligned_access (machine_mode, unsigned int);
> >  extern HOST_WIDE_INT default_estimated_poly_value (poly_int64,
> >                                                    poly_value_estimate_kind);
> > diff --git a/gcc/testsuite/gcc.target/aarch64/callee_save_1.c b/gcc/testsuite/gcc.target/aarch64/callee_save_1.c
> > new file mode 100644
> > index 00000000000..f28486112f4
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/callee_save_1.c
> > @@ -0,0 +1,12 @@
> > +/* { dg-options "-O2" } */
> > +
> > +int test (int x), test2 (int x);
> > +
> > +int foo (int x, int y) {
> > +    test (x);
> > +    int lhs = test2 (y);
> > +    return x + lhs;
> > +}
> > +
> > +/* { dg-final { scan-assembler {\tstp\tx19, x20, \[sp,} } } */
> > +/* { dg-final { scan-assembler {\tldp\tx19, x20, \[sp,} } } */
> > diff --git a/gcc/testsuite/gcc.target/aarch64/callee_save_2.c b/gcc/testsuite/gcc.target/aarch64/callee_save_2.c
> > new file mode 100644
> > index 00000000000..744b464be2f
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/callee_save_2.c
> > @@ -0,0 +1,14 @@
> > +/* { dg-options "-O2 -fomit-frame-pointer" } */
> > +
> > +int test (int x), test2 (int x);
> > +
> > +int foo (int x, int y) {
> > +    test (x);
> > +    int lhs = test2 (y);
> > +    return x + lhs;
> > +}
> > +
> > +/* { dg-final { scan-assembler {\tstp\tx30, x19, \[sp,} } } */
> > +/* { dg-final { scan-assembler {\tldp\tx30, x19, \[sp\],} } } */
> > +/* { dg-final { scan-assembler {\tstr\tw1, \[sp,} } } */
> > +/* { dg-final { scan-assembler {\tldr\tw0, \[sp,} } } */
> > diff --git a/gcc/testsuite/gcc.target/aarch64/callee_save_3.c b/gcc/testsuite/gcc.target/aarch64/callee_save_3.c
> > new file mode 100644
> > index 00000000000..50b6853e4ee
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/callee_save_3.c
> > @@ -0,0 +1,12 @@
> > +/* { dg-options "-O2" } */
> > +
> > +float test ();
> > +float g;
> > +
> > +float foo (float x, float y) {
> > +  g = x + test ();
> > +  return (x + test ()) * y;
> > +}
> > +
> > +/* { dg-final { scan-assembler {\tstp\td14, d15, \[sp,} } } */
> > +/* { dg-final { scan-assembler {\tldp\td14, d15, \[sp,} } } */
> > diff --git a/gcc/testsuite/gcc.target/aarch64/pr103350-1.c b/gcc/testsuite/gcc.target/aarch64/pr103350-1.c
> > index a0e764e8653..129c6ac90e0 100644
> > --- a/gcc/testsuite/gcc.target/aarch64/pr103350-1.c
> > +++ b/gcc/testsuite/gcc.target/aarch64/pr103350-1.c
> > @@ -1,5 +1,5 @@
> >  /* { dg-do run { target le } } */
> > -/* { dg-additional-options "-Os -fno-tree-ter -save-temps -fdump-rtl-ree-all -free -std=c99 -w" } */
> > +/* { dg-additional-options "-Os -fno-tree-ter -save-temps -fdump-rtl-ree-all -free -std=c99 -w -fno-caller-saves" } */
> >
> >  typedef unsigned char u8;
> >  typedef unsigned char __attribute__((__vector_size__ (8))) v64u8;
> > --
> > 2.25.1
> >
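
As a reading aid for the quoted targhooks.cc hunk, here is a rough standalone model of how the default TARGET_CALLEE_SAVE_COST composes with TARGET_FRAME_ALLOCATION_COST. It is only a sketch and not GCC code: hard_reg_set_model, model_frame_allocation_cost and model_callee_save_cost are hypothetical names, std::bitset stands in for HARD_REG_SET, and the frame costs of 2 and 1 are made-up values for an imaginary target (the real default frame hook returns 0).

```cpp
#include <bitset>
#include <cassert>

// Mirrors the two enums the quoted patch adds to gcc/target.h.
enum class spill_cost_type { SAVE, RESTORE };
enum class frame_cost_type { ALLOCATION, DEALLOCATION };

// Hypothetical stand-in for GCC's HARD_REG_SET.
using hard_reg_set_model = std::bitset<64>;

// Hypothetical target frame hook.  Assumption for illustration only:
// a frame costs 2 to allocate and 1 to deallocate, unless some
// callee-saved register is already in use, in which case the frame
// is taken to exist already and the extra cost is 0.
int
model_frame_allocation_cost (frame_cost_type type,
                             const hard_reg_set_model &allocated_callee_regs)
{
  if (allocated_callee_regs.any ())
    return 0;
  return type == frame_cost_type::ALLOCATION ? 2 : 1;
}

// Same shape as default_callee_save_cost in the patch, with the
// unused hard_regno, mode and nregs parameters dropped: start from
// mem_cost and add the frame cost only if nothing has been spilled yet.
int
model_callee_save_cost (spill_cost_type type, int mem_cost,
                        const hard_reg_set_model &allocated_callee_regs,
                        bool existing_spills_p)
{
  if (!existing_spills_p)
    {
      auto frame_type = (type == spill_cost_type::SAVE
                         ? frame_cost_type::ALLOCATION
                         : frame_cost_type::DEALLOCATION);
      mem_cost += model_frame_allocation_cost (frame_type,
                                               allocated_callee_regs);
    }
  return mem_cost;
}
```

With these made-up numbers, the first callee save in a function with no spills pays mem_cost plus the frame allocation cost, while later saves (or any save once spills exist) pay mem_cost alone. As the hook documentation says, frequency scaling is left to the caller.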



-- 
BR,
Hongtao
