On Fri, Mar 7, 2025 at 11:30 AM H.J. Lu <hjl.to...@gmail.com> wrote:
>
> On Tue, Mar 4, 2025 at 6:18 PM Richard Sandiford
> <richard.sandif...@arm.com> wrote:
> >
> > Richard Sandiford <richard.sandif...@arm.com> writes:
> > > Jan Hubicka <hubi...@ucw.cz> writes:
> > >>>
> > >>> Thanks for running these.  I saw poor results for perlbench with my
> > >>> initial aarch64 hooks because the hooks reduced the cost to zero for
> > >>> the entry case:
> > >>>
> > >>>         auto entry_cost = targetm.callee_save_cost
> > >>>           (spill_cost_type::SAVE, hard_regno, mode, saved_nregs,
> > >>>            ira_memory_move_cost[mode][rclass][0] * saved_nregs / nregs,
> > >>>            allocated_callee_save_regs, existing_spills_p);
> > >>>         /* In the event of a tie between caller-save and callee-save,
> > >>>            prefer callee-save.  We apply this to the entry cost rather
> > >>>            than the exit cost since the entry frequency must be at
> > >>>            least as high as the exit frequency.  */
> > >>>         if (entry_cost > 0)
> > >>>           entry_cost -= 1;
> > >>>
> > >>> I "fixed" that by bumping the cost to a minimum of 2, but I was
> > >>> wondering whether the "entry_cost > 0" should instead be "entry_cost > 
> > >>> 1",
> > >>> so that the cost is always greater than not using a callee save for
> > >>> registers that don't cross a call.  WDYT?
> > >>
> > >> For x86 perfomance costs, the push cost should be memory_move_cost which
> > >> is 6, -2 for adjustment in the target hook and -1 for this. So cost
> > >> should not be 0 I think.
> > >>
> > >> For size cost, I currently return 1, so we indeed get 0 after
> > >> adjustment.
> > >>
> > >> I think cost of 0 will make us to pick callee save even if caller save
> > >> is available and there are no function calls, so I guess we do not want
> > >> that....
> > >
> > > OK, here's an updated patch that makes that change.  The x86 parts
> > > should be replaced by your patch.
> > >
> > > Tested on aarch64-linux-gnu.  I also tried to test on 
> > > pwoerpc64el-linux-gnu
> > > (on gcc112), but I keep getting broken pipes during the test runs,
> > > so I'm struggling to get good before/after comparisons.  It does at
> > > least bootstrap though...
> >
> > Here's the patch with Honza's x86 changes.  Boostrapped & regresiion-tested
> > on aarch64-linux-gnu and powerpc64le-linux-gnu (gcc120).  The powerpc64le
> > results regressed:
> >
> > FAIL: gcc.dg/guality/vla-1.c   -Os  -DPREVENT_OPTIMIZATION  line 24 i == 5
> > FAIL: gcc.dg/guality/vla-1.c   -Os  -DPREVENT_OPTIMIZATION  line 24 sizeof 
> > (a) == 17 * sizeof (short)
> >
> > but the same test already failed for -O2 and -O3.
> >
> > OK to install now?  Or, given the lateness in the release cycle,
> > would it be better to wait for GCC 16?
> >
> > Thanks,
> > Richard
> >
> >
> > Following on from the discussion in:
> >
> >   https://gcc.gnu.org/pipermail/gcc-patches/2025-February/675256.html
> >
> > this patch removes TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE and
> > replaces it with two hooks: one that controls the cost of using an
> > extra callee-saved register and one that controls the cost of allocating
> > a frame for the first spill.
> >
> > (The patch does not attempt to address the shrink-wrapping part of
> > the thread above.)
> >
> > On AArch64, this is enough to fix PR117477, as verified by the new tests.
> > The patch does not change the SPEC2017 scores significantly.  (I saw a
> > slight improvement in fotonik3d and roms, but I'm not convinced that
> > the improvements are real.)
> >
> > The patch makes IRA use caller saves for gcc.target/aarch64/pr103350-1.c,
> > which is a scan-dump correctness test that relies on not using
> > caller saves.  The decision to use caller saves looks appropriate,
> > and saves an instruction, so I've just added -fno-caller-saves
> > to the test options.
> >
> > The x86 parts were written by Honza.
> >
> > gcc/
> >         PR rtl-optimization/117477
> >         * config/aarch64/aarch64.cc (aarch64_count_saves): New function.
> >         (aarch64_count_above_hard_fp_saves, aarch64_callee_save_cost)
> >         (aarch64_frame_allocation_cost): Likewise.
> >         (TARGET_CALLEE_SAVE_COST): Define.
> >         (TARGET_FRAME_ALLOCATION_COST): Likewise.
> >         * config/i386/i386.cc (ix86_ira_callee_saved_register_cost_scale):
> >         Replace with...
> >         (ix86_callee_save_cost): ...this new hook.
> >         (TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE): Delete.
> >         (TARGET_CALLEE_SAVE_COST): Define.
> >         * target.h (spill_cost_type, frame_cost_type): New enums.
> >         * target.def (callee_save_cost, frame_allocation_cost): New hooks.
> >         (ira_callee_saved_register_cost_scale): Delete.
> >         * doc/tm.texi.in (TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE): 
> > Delete.
> >         (TARGET_CALLEE_SAVE_COST, TARGET_FRAME_ALLOCATION_COST): New hooks.
> >         * doc/tm.texi: Regenerate.
> >         * hard-reg-set.h (hard_reg_set_popcount): New function.
> >         * ira-color.cc (allocated_memory_p): New variable.
> >         (allocated_callee_save_regs): Likewise.
> >         (record_allocation): New function.
> >         (assign_hard_reg): Use targetm.frame_allocation_cost to model
> >         the cost of the first spill or first caller save.  Use
> >         targetm.callee_save_cost to model the cost of using new callee-saved
> >         registers.  Apply the exit rather than entry frequency to the cost
> >         of restoring a register or deallocating the frame.  Update the
> >         new variables above.
> >         (improve_allocation): Use record_allocation.
> >         (color): Initialize allocated_callee_save_regs.
> >         (ira_color): Initialize allocated_memory_p.
> >         * targhooks.h (default_callee_save_cost): Declare.
> >         (default_frame_allocation_cost): Likewise.
> >         * targhooks.cc (default_callee_save_cost): New function.
> >         (default_frame_allocation_cost): Likewise.
> >
> > gcc/testsuite/
> >         PR rtl-optimization/117477
> >         * gcc.target/aarch64/callee_save_1.c: New test.
> >         * gcc.target/aarch64/callee_save_2.c: Likewise.
> >         * gcc.target/aarch64/callee_save_3.c: Likewise.
> >         * gcc.target/aarch64/pr103350-1.c: Add -fno-caller-saves.
> >
> > Co-authored-by: Jan Hubicka <hubi...@ucw.cz>
>
> Here is the v2 patch.  The only change is
>
>   if (GENERAL_REGNO_P (hard_regno))
>     {
>       /* push is 1 byte while typical spill is 4-5 bytes.
>          ??? We probably should adjust size costs accordingly.
>          Costs are relative to reg-reg move that has 2 bytes for 32bit
>          and 3 bytes otherwise.  Be sure that no cost table sets cost
>          to 2, so we end up with 0.  */
>       if (mem_cost <= 2 || optimize_function_for_size_p (cfun))
>         return 1;
>       return mem_cost - 2;
>     }
>
> in ix86_callee_save_cost.
>
> Tested on x86-64 with
>
> $ make check RUNTESTFLAGS="--target_board='unix{-m32,}'"
>
> OK for master?

Please give x86 maintainers a chance to chime in - when they are
happy the previous approval still holds for the rest.

Richard.

> Thanks.
>
> H.J.

Reply via email to