Ping
Richard Sandiford <richard.sandif...@arm.com> writes: > Steve Ellcey <sell...@marvell.com> writes: >> Richard, >> >> I don't necessarily disagree with anything in your comments and long >> term I think that is the right direction, but I wonder if that level of >> change is appropriate for GCC Stage 4 which is where we are now. Your >> changes would require fixes in shared code, whereas setting >> REG_ALLOC_ORDER only affects Aarch64 and seems like a safer change. > > I guess it's weighing invasiveness in terms of lines of code/location of > code vs. invasiveness in terms of the effect the code has. Changing > REG_ALLOC_ORDER affects all functinos that use significant numbers of > SIMD registers, which could have unexpected knock-on effects like > performance regressions or exposing latent bugs. It would also make > the RA try V24 before V16 for callers of vector PCS functions, > even though they should prefer V16. > > Keeping the change specific to callees that use the new vector PCS > feature seems more conservative from that point of view. > >> I am not sure how long it would take me to implement something along >> the lines of what you are suggesting. > > OK, in that case, I thought I'd give a go. > > This patch adds a new target hook that says which registers a function > can use without saving them in the prologue and restoring them in the > epilogue. By default this is call_used_reg_set. > > The patch only makes IRA use the hook, and in that context it's just > an (important) optimisation. But I think we need the same thing for > passes like regrename, where it'll be a correctness issue. I'll follow > on with patches there if this one is OK. > > Tested on aarch64-linux-gnu (with and without SVE) and x86_64-linux-gnu. > OK to install? > > Richard > > > 2019-03-12 Richard Sandiford <richard.sandif...@arm.com> > > gcc/ > PR target/89628 > * target.def (function_scratch_regs): New hook. > * doc/tm.texi.in (TARGET_FUNCTION_SCRATCH_REGS): New placeholder. > * doc/tm.texi: Regenerate. > * targhooks.h (default_function_scratch_regs): Declare. > * targhooks.c (default_function_scratch_regs): New function. > * emit-rtl.h (rtl_data::scratch_regs): New member variable. > * function.c (init_dummy_function_start): Initialize it. > (init_function_start): Likewise. > * ira-color.c (calculate_saved_nregs): Use it instead of > call_used_reg_set. > * config/aarch64/aarch64.c (aarch64_function_scratch_regs): New > function. > (TARGET_FUNCTION_SCRATCH_REGS): Redefine. > > gcc/testsuite/ > PR target/89628 > * gcc.target/aarch64/pr89628.c: New test. > > Index: gcc/target.def > =================================================================== > --- gcc/target.def 2019-03-08 18:14:26.113008680 +0000 > +++ gcc/target.def 2019-03-12 09:34:25.447913041 +0000 > @@ -5763,6 +5763,18 @@ The default version of this hook always > default_hard_regno_scratch_ok) > > DEFHOOK > +(function_scratch_regs, > + "This hook sets @var{regs} to the set of registers that function @var{fn}\n\ > +can use without having to save them in the prologue and restore them in > the\n\ > +epilogue.\n\ > +\n\ > +By default this set is the same as @code{call_used_reg_set}. Targets only\n\ > +need to override the hook if some functions implement alternative ABIs > that\n\ > +save more registers or fewer registers than normal.", > + void, (struct function *fn, HARD_REG_SET *regs), > + default_function_scratch_regs) > + > +DEFHOOK > (hard_regno_call_part_clobbered, > "This hook should return true if @var{regno} is partly call-saved and\n\ > partly call-clobbered, and if a value of mode @var{mode} would be partly\n\ > Index: gcc/doc/tm.texi.in > =================================================================== > --- gcc/doc/tm.texi.in 2019-03-08 18:14:25.577010718 +0000 > +++ gcc/doc/tm.texi.in 2019-03-12 09:34:25.443913054 +0000 > @@ -1705,6 +1705,11 @@ of @code{CALL_USED_REGISTERS}. > @cindex call-used register > @cindex call-clobbered register > @cindex call-saved register > +@hook TARGET_FUNCTION_SCRATCH_REGS > + > +@cindex call-used register > +@cindex call-clobbered register > +@cindex call-saved register > @hook TARGET_HARD_REGNO_CALL_PART_CLOBBERED > > @hook TARGET_REMOVE_EXTRA_CALL_PRESERVED_REGS > Index: gcc/doc/tm.texi > =================================================================== > --- gcc/doc/tm.texi 2019-03-08 18:14:25.573010734 +0000 > +++ gcc/doc/tm.texi 2019-03-12 09:34:25.443913054 +0000 > @@ -1894,6 +1894,19 @@ of @code{CALL_USED_REGISTERS}. > @cindex call-used register > @cindex call-clobbered register > @cindex call-saved register > +@deftypefn {Target Hook} void TARGET_FUNCTION_SCRATCH_REGS (struct function > *@var{fn}, HARD_REG_SET *@var{regs}) > +This hook sets @var{regs} to the set of registers that function @var{fn} > +can use without having to save them in the prologue and restore them in the > +epilogue. > + > +By default this set is the same as @code{call_used_reg_set}. Targets only > +need to override the hook if some functions implement alternative ABIs that > +save more registers or fewer registers than normal. > +@end deftypefn > + > +@cindex call-used register > +@cindex call-clobbered register > +@cindex call-saved register > @deftypefn {Target Hook} bool TARGET_HARD_REGNO_CALL_PART_CLOBBERED > (rtx_insn *@var{insn}, unsigned int @var{regno}, machine_mode @var{mode}) > This hook should return true if @var{regno} is partly call-saved and > partly call-clobbered, and if a value of mode @var{mode} would be partly > Index: gcc/targhooks.h > =================================================================== > --- gcc/targhooks.h 2019-03-08 18:14:25.849009683 +0000 > +++ gcc/targhooks.h 2019-03-12 09:34:25.447913041 +0000 > @@ -286,5 +286,6 @@ extern bool speculation_safe_value_not_n > extern rtx default_speculation_safe_value (machine_mode, rtx, rtx, rtx); > extern void default_remove_extra_call_preserved_regs (rtx_insn *, > HARD_REG_SET *); > +extern void default_function_scratch_regs (struct function *, HARD_REG_SET > *); > > #endif /* GCC_TARGHOOKS_H */ > Index: gcc/targhooks.c > =================================================================== > --- gcc/targhooks.c 2019-03-08 18:14:25.845009699 +0000 > +++ gcc/targhooks.c 2019-03-12 09:34:25.447913041 +0000 > @@ -2379,4 +2379,10 @@ default_remove_extra_call_preserved_regs > { > } > > +void > +default_function_scratch_regs (struct function *, HARD_REG_SET *regs) > +{ > + COPY_HARD_REG_SET (*regs, call_used_reg_set); > +} > + > #include "gt-targhooks.h" > Index: gcc/emit-rtl.h > =================================================================== > --- gcc/emit-rtl.h 2019-03-08 18:15:33.700751752 +0000 > +++ gcc/emit-rtl.h 2019-03-12 09:34:25.443913054 +0000 > @@ -292,6 +292,9 @@ struct GTY(()) rtl_data { > sets them. */ > HARD_REG_SET asm_clobbers; > > + /* Registers that the function can use without having to save them first. > */ > + HARD_REG_SET scratch_regs; > + > /* The highest address seen during shorten_branches. */ > int max_insn_address; > }; > Index: gcc/function.c > =================================================================== > --- gcc/function.c 2019-03-08 18:15:33.660751905 +0000 > +++ gcc/function.c 2019-03-12 09:34:25.447913041 +0000 > @@ -4875,6 +4875,7 @@ init_dummy_function_start (void) > { > push_dummy_function (false); > prepare_function_start (); > + COPY_HARD_REG_SET (crtl->scratch_regs, call_used_reg_set); > } > > /* Generate RTL for the start of the function SUBR (a FUNCTION_DECL tree > node) > @@ -4888,6 +4889,8 @@ init_function_start (tree subr) > initialize_rtl (); > > prepare_function_start (); > + targetm.function_scratch_regs (DECL_STRUCT_FUNCTION (subr), > + &crtl->scratch_regs); > decide_function_section (subr); > > /* Warn if this value is an aggregate type, > Index: gcc/ira-color.c > =================================================================== > --- gcc/ira-color.c 2019-03-08 18:15:26.824777889 +0000 > +++ gcc/ira-color.c 2019-03-12 09:34:25.447913041 +0000 > @@ -1660,7 +1660,7 @@ calculate_saved_nregs (int hard_regno, m > ira_assert (hard_regno >= 0); > for (i = hard_regno_nregs (hard_regno, mode) - 1; i >= 0; i--) > if (!allocated_hardreg_p[hard_regno + i] > - && !TEST_HARD_REG_BIT (call_used_reg_set, hard_regno + i) > + && !TEST_HARD_REG_BIT (crtl->scratch_regs, hard_regno + i) > && !LOCAL_REGNO (hard_regno + i)) > nregs++; > return nregs; > Index: gcc/config/aarch64/aarch64.c > =================================================================== > --- gcc/config/aarch64/aarch64.c 2019-03-08 18:15:38.228734542 +0000 > +++ gcc/config/aarch64/aarch64.c 2019-03-12 09:34:25.427913117 +0000 > @@ -1706,6 +1706,19 @@ aarch64_simd_call_p (rtx_insn *insn) > return aarch64_simd_decl_p (fndecl); > } > > +/* Implement TARGET_FUNCTION_SCRATCH_REGS. Vector PCS functions have > + fewer available scratch registers than normal functions do. */ > + > +static void > +aarch64_function_scratch_regs (function *fn, HARD_REG_SET *scratch_regs) > +{ > + default_function_scratch_regs (fn, scratch_regs); > + if (aarch64_simd_decl_p (fn->decl)) > + for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++) > + if (FP_SIMD_SAVED_REGNUM_P (regno)) > + CLEAR_HARD_REG_BIT (*scratch_regs, regno); > +} > + > /* Implement TARGET_REMOVE_EXTRA_CALL_PRESERVED_REGS. If INSN calls > a function that uses the SIMD ABI, take advantage of the extra > call-preserved registers that the ABI provides. */ > @@ -19208,6 +19221,9 @@ #define TARGET_MODES_TIEABLE_P aarch64_m > #define TARGET_HARD_REGNO_CALL_PART_CLOBBERED \ > aarch64_hard_regno_call_part_clobbered > > +#undef TARGET_FUNCTION_SCRATCH_REGS > +#define TARGET_FUNCTION_SCRATCH_REGS aarch64_function_scratch_regs > + > #undef TARGET_REMOVE_EXTRA_CALL_PRESERVED_REGS > #define TARGET_REMOVE_EXTRA_CALL_PRESERVED_REGS \ > aarch64_remove_extra_call_preserved_regs > Index: gcc/testsuite/gcc.target/aarch64/pr89628.c > =================================================================== > --- /dev/null 2019-03-08 11:40:14.606883727 +0000 > +++ gcc/testsuite/gcc.target/aarch64/pr89628.c 2019-03-12 > 09:34:25.447913041 +0000 > @@ -0,0 +1,25 @@ > +/* { dg-options "-O2" } */ > + > +typedef __Float32x4_t vec; > + > +__attribute__((aarch64_vector_pcs)) > +vec f(vec a0, vec a1, vec a2, vec a3, vec a4, vec a5, vec a6, vec a7) > +{ > + vec t0, t1, t2, t3, t4, t5, t6, t7, s0, s1, s2, s3; > + t0 = a0 - a7; > + t1 = a1 - a6; > + t2 = a2 - a5; > + t3 = a3 - a4; > + t4 = a4 - a3; > + t5 = a5 - a2; > + t6 = a6 - a1; > + t7 = a7 - a0; > + s0 = t0 * t1; > + s1 = t2 * t3; > + s2 = t4 * t5; > + s3 = t6 * t7; > + return s0 * s1 * s2 * s3 * a0 * a1 * a2 * a3 * a4 * a5 * a6 * a7; > +} > + > +/* { dg-final { scan-assembler-not '\tldr\t' } } */ > +/* { dg-final { scan-assembler-not '\tstr\t' } } */