Steve Ellcey <[email protected]> writes:
> On Thu, 2018-12-06 at 12:25 +0000, Richard Sandiford wrote:
>>
>> Since we're looking at the call insns anyway, we could have a hook that
>> "jousts" two calls and picks the one that preserves *fewer* registers.
>> This would mean that the loop produces a single instruction that conservatively
>> describes the call-preserved registers. We could then stash that
>> instruction in lra_reg instead of the current check_part_clobbered
>> boolean.
>>
>> The hook should by default be a null pointer, so that we can avoid
>> the instruction walk on targets that don't need it.
>>
>> That would mean that LRA would always have a call instruction to hand
>> when asking about call-preserved information. So I think we should
>> add an insn parameter to targetm.hard_regno_call_part_clobbered,
>> with a null insn selecting the default behaviour. I know it's
>> going to be a pain to update all callers and targets, sorry.
>
> Richard, here is an updated version of this patch. It is not
> completely tested yet but I wanted to send this out and make
> sure it is what you had in mind and see if you had any comments about
> the new target function while I am testing it (including building
> some of the other targets).
Yeah, this was the kind of thing I had in mind, thanks.
> /* Implement TARGET_HARD_REGNO_CALL_PART_CLOBBERED. The callee only saves
> the lower 64 bits of a 128-bit register. Tell the compiler the callee
> clobbers the top 64 bits when restoring the bottom 64 bits. */
>
> static bool
> -aarch64_hard_regno_call_part_clobbered (unsigned int regno, machine_mode mode)
> +aarch64_hard_regno_call_part_clobbered (rtx_insn *insn,
> + unsigned int regno,
> + machine_mode mode)
> {
> + if (insn && CALL_P (insn) && aarch64_simd_call_p (insn))
> + return false;
> return FP_REGNUM_P (regno) && maybe_gt (GET_MODE_SIZE (mode), 8);
This should be choosing between 8 and 16 for the maybe_gt rather than
returning false early for SIMD calls, since even SIMD functions clobber
bits 128 and above for SVE.
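I.e. something like this (an untested sketch, reusing aarch64_simd_call_p
from your patch):

  static bool
  aarch64_hard_regno_call_part_clobbered (rtx_insn *insn, unsigned int regno,
                                          machine_mode mode)
  {
    /* SIMD calls preserve the low 128 bits, but even they clobber
       bits 128 and above of the SVE registers.  */
    unsigned int max_size
      = (insn && CALL_P (insn) && aarch64_simd_call_p (insn)) ? 16 : 8;
    return FP_REGNUM_P (regno) && maybe_gt (GET_MODE_SIZE (mode), max_size);
  }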
> +/* Implement TARGET_RETURN_CALL_WITH_MAX_CLOBBERS. */
> +
> +rtx_insn *
> +aarch64_return_call_with_max_clobbers (rtx_insn *call_1, rtx_insn *call_2)
> +{
> + gcc_assert (CALL_P (call_1));
> + if ((call_2 == NULL_RTX) || aarch64_simd_call_p (call_2))
> + return call_1;
> + else
> + return call_2;
Nit: redundant parens in "(call_2 == NULL_RTX)".
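I.e.:

  if (call_2 == NULL_RTX || aarch64_simd_call_p (call_2))
    return call_1;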
> diff --git a/gcc/config/avr/avr.c b/gcc/config/avr/avr.c
> index 023308b..2cf993d 100644
> --- a/gcc/config/avr/avr.c
> +++ b/gcc/config/avr/avr.c
> @@ -12181,7 +12181,9 @@ avr_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
> /* Implement TARGET_HARD_REGNO_CALL_PART_CLOBBERED. */
>
> static bool
> -avr_hard_regno_call_part_clobbered (unsigned regno, machine_mode mode)
> +avr_hard_regno_call_part_clobbered (rtx_insn *insn ATTRIBUTE_UNUSED,
> + unsigned regno,
> + machine_mode mode)
> {
Also very minor, sorry, but: I think it's usual to put the parameters
on the same line when they fit. Same for the other hooks.
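E.g.:

  static bool
  avr_hard_regno_call_part_clobbered (rtx_insn *insn ATTRIBUTE_UNUSED,
                                      unsigned regno, machine_mode mode)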
> @@ -2919,7 +2930,7 @@ the local anchor could be shared by other accesses to
> nearby locations.
>
> The hook returns true if it succeeds, storing the offset of the
> anchor from the base in @var{offset1} and the offset of the final address
> -from the anchor in @var{offset2}. The default implementation returns false.
> +from the anchor in @var{offset2}. ehe defnult implementation returns false.
> @end deftypefn
Stray change here.
> diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
> index 7ffcd35..31a567a 100644
> --- a/gcc/lra-constraints.c
> +++ b/gcc/lra-constraints.c
> @@ -5368,16 +5368,24 @@ inherit_reload_reg (bool def_p, int original_regno,
> static inline bool
> need_for_call_save_p (int regno)
> {
> + machine_mode pmode = PSEUDO_REGNO_MODE (regno);
> + int new_regno = reg_renumber[regno];
> +
> lra_assert (regno >= FIRST_PSEUDO_REGISTER && reg_renumber[regno] >= 0);
> - return (usage_insns[regno].calls_num < calls_num
> - && (overlaps_hard_reg_set_p
> - ((flag_ipa_ra &&
> - ! hard_reg_set_empty_p (lra_reg_info[regno].actual_call_used_reg_set))
> - ? lra_reg_info[regno].actual_call_used_reg_set
> - : call_used_reg_set,
> - PSEUDO_REGNO_MODE (regno), reg_renumber[regno])
> - || (targetm.hard_regno_call_part_clobbered
> - (reg_renumber[regno], PSEUDO_REGNO_MODE (regno)))));
> +
> + if (usage_insns[regno].calls_num >= calls_num)
> + return false;
> +
> + if (flag_ipa_ra
> + && !hard_reg_set_empty_p (lra_reg_info[regno].actual_call_used_reg_set))
> + return (overlaps_hard_reg_set_p
> + (lra_reg_info[regno].actual_call_used_reg_set, pmode, new_regno)
> + || targetm.hard_regno_call_part_clobbered
> + (lra_reg_info[regno].call_insn, new_regno, pmode));
> + else
> + return (overlaps_hard_reg_set_p (call_used_reg_set, pmode, new_regno)
> + || targetm.hard_regno_call_part_clobbered
> + (lra_reg_info[regno].call_insn, new_regno, pmode));
> }
>
I think it'd be safer just to add the new parameter to the existing code,
rather than rework it like this.
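I.e. keep the existing expression and just thread the insn through
(sketch, using the call_insn field your patch adds to lra_reg):

  return (usage_insns[regno].calls_num < calls_num
          && (overlaps_hard_reg_set_p
              ((flag_ipa_ra &&
                ! hard_reg_set_empty_p (lra_reg_info[regno].actual_call_used_reg_set))
               ? lra_reg_info[regno].actual_call_used_reg_set
               : call_used_reg_set,
               PSEUDO_REGNO_MODE (regno), reg_renumber[regno])
              || (targetm.hard_regno_call_part_clobbered
                  (lra_reg_info[regno].call_insn,
                   reg_renumber[regno], PSEUDO_REGNO_MODE (regno)))));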
I'm not sure off-hand why the existing code tests
targetm.hard_regno_call_part_clobbered when we have
actual_call_used_reg_set. Seems like it shouldn't be necessary
if we know exactly which registers the function clobbers.
Changing that should probably be a separate follow-on patch though.
> /* Global registers occurring in the current EBB. */
> @@ -635,6 +637,7 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
> rtx link, *link_loc;
> bool need_curr_point_incr;
> HARD_REG_SET last_call_used_reg_set;
> + rtx_insn *call_insn;
>
> reg_live_out = df_get_live_out (bb);
> sparseset_clear (pseudos_live);
> @@ -658,6 +661,17 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
> if (lra_dump_file != NULL)
> fprintf (lra_dump_file, " BB %d\n", bb->index);
>
> + call_insn = NULL;
> + if (targetm.return_call_with_max_clobbers)
> + {
> + FOR_BB_INSNS_REVERSE_SAFE (bb, curr_insn, next)
> + {
> + if (CALL_P (curr_insn))
> + call_insn = targetm.return_call_with_max_clobbers (curr_insn,
> + call_insn);
> + }
> + }
> +
> /* Scan the code of this basic block, noting which pseudos and hard
> regs are born or die.
>
> @@ -847,7 +861,8 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
> update_pseudo_point (reg->regno, curr_point, USE_POINT);
> mark_regno_live (reg->regno, reg->biggest_mode);
> check_pseudos_live_through_calls (reg->regno,
> - last_call_used_reg_set);
> + last_call_used_reg_set,
> + call_insn);
> }
>
> if (!HARD_REGISTER_NUM_P (reg->regno))
> @@ -912,9 +927,13 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
> {
> IOR_HARD_REG_SET (lra_reg_info[j].actual_call_used_reg_set,
> this_call_used_reg_set);
> +
> + lra_reg_info[j].call_insn = curr_insn;
> +
> if (flush)
> - check_pseudos_live_through_calls
> - (j, last_call_used_reg_set);
> + check_pseudos_live_through_calls (j,
> + last_call_used_reg_set,
> + call_insn);
> }
> COPY_HARD_REG_SET(last_call_used_reg_set, this_call_used_reg_set);
> }
> @@ -956,7 +975,8 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
> update_pseudo_point (reg->regno, curr_point, USE_POINT);
> mark_regno_live (reg->regno, reg->biggest_mode);
> check_pseudos_live_through_calls (reg->regno,
> - last_call_used_reg_set);
> + last_call_used_reg_set,
> + call_insn);
> }
>
> for (reg = curr_static_id->hard_regs; reg != NULL; reg = reg->next)
> @@ -1125,7 +1145,7 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
> if (sparseset_cardinality (pseudos_live_through_calls) == 0)
> break;
> if (sparseset_bit_p (pseudos_live_through_calls, j))
> - check_pseudos_live_through_calls (j, last_call_used_reg_set);
> + check_pseudos_live_through_calls (j, last_call_used_reg_set, call_insn);
> }
>
> for (i = 0; HARD_REGISTER_NUM_P (i); ++i)
I think we can do this more accurately by instead keeping track of the
current call during the main block walk and extending this:
if (! flag_ipa_ra)
COPY_HARD_REG_SET(last_call_used_reg_set, call_used_reg_set);
else
{
so that we use the "else" when targetm.return_call_with_max_clobbers
is nonnull. Then we should extend this:
bool flush = (! hard_reg_set_empty_p (last_call_used_reg_set)
&& ! hard_reg_set_equal_p (last_call_used_reg_set,
this_call_used_reg_set))
so that we flush when the old call insn preserves more registers
than the new one.
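Something like this, perhaps (only a sketch; "last_call_insn" is a
hypothetical new local that remembers the previous call processed in
the walk):

  /* Take the else arm whenever the target can distinguish calls.  */
  if (! flag_ipa_ra && ! targetm.return_call_with_max_clobbers)
    COPY_HARD_REG_SET (last_call_used_reg_set, call_used_reg_set);
  else
    {
      ...
      /* Flush when the register sets differ, or when the previous call
         (last_call_insn, hypothetical) preserves more registers than
         the current one.  A tie between two equal calls can flush
         spuriously here, which is conservative but safe.  */
      bool flush
        = ((! hard_reg_set_empty_p (last_call_used_reg_set)
            && ! hard_reg_set_equal_p (last_call_used_reg_set,
                                       this_call_used_reg_set))
           || (last_call_insn
               && (targetm.return_call_with_max_clobbers
                   (curr_insn, last_call_insn) == curr_insn)));
      ...
    }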
Also, I think:
lra_reg_info[j].call_insn = curr_insn;
should happen in check_pseudos_live_through_calls and should apply
targetm.return_call_with_max_clobbers to the register's existing
call_insn (if any), rather than simply overwrite it. That way the
register info will track the "worst" call for all regions in which
the register is live.
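I.e. roughly this, inside check_pseudos_live_through_calls (sketch):

  if (targetm.return_call_with_max_clobbers
      && lra_reg_info[regno].call_insn)
    call_insn = targetm.return_call_with_max_clobbers
                  (call_insn, lra_reg_info[regno].call_insn);
  lra_reg_info[regno].call_insn = call_insn;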
> diff --git a/gcc/regrename.c b/gcc/regrename.c
> index a180ced..109add0 100644
> --- a/gcc/regrename.c
> +++ b/gcc/regrename.c
> @@ -339,9 +339,9 @@ check_new_reg_p (int reg ATTRIBUTE_UNUSED, int new_reg,
> && ! DEBUG_INSN_P (tmp->insn))
> || (this_head->need_caller_save_reg
> && ! (targetm.hard_regno_call_part_clobbered
> - (reg, GET_MODE (*tmp->loc)))
> + (tmp->insn, reg, GET_MODE (*tmp->loc)))
> && (targetm.hard_regno_call_part_clobbered
> - (new_reg, GET_MODE (*tmp->loc)))))
> + (tmp->insn, new_reg, GET_MODE (*tmp->loc)))))
> return false;
>
> return true;
tmp->insn isn't the call we care about here. I think we should just
pass null.
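I.e.:

  && ! (targetm.hard_regno_call_part_clobbered
        (NULL, reg, GET_MODE (*tmp->loc)))
  && (targetm.hard_regno_call_part_clobbered
      (NULL, new_reg, GET_MODE (*tmp->loc)))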
> diff --git a/gcc/reload1.c b/gcc/reload1.c
> index b703402..5490ae5 100644
> --- a/gcc/reload1.c
> +++ b/gcc/reload1.c
> @@ -8289,7 +8289,8 @@ emit_reload_insns (struct insn_chain *chain)
> : out_regno + k);
> reg_reloaded_insn[regno + k] = insn;
> SET_HARD_REG_BIT (reg_reloaded_valid, regno + k);
> - if (targetm.hard_regno_call_part_clobbered (regno + k,
> + if (targetm.hard_regno_call_part_clobbered (insn,
> + regno + k,
> mode))
> SET_HARD_REG_BIT (reg_reloaded_call_part_clobbered,
> regno + k);
> @@ -8369,7 +8370,8 @@ emit_reload_insns (struct insn_chain *chain)
> : in_regno + k);
> reg_reloaded_insn[regno + k] = insn;
> SET_HARD_REG_BIT (reg_reloaded_valid, regno + k);
> - if (targetm.hard_regno_call_part_clobbered (regno + k,
> + if (targetm.hard_regno_call_part_clobbered (insn,
> + regno + k,
> mode))
> SET_HARD_REG_BIT (reg_reloaded_call_part_clobbered,
> regno + k);
> @@ -8485,7 +8487,7 @@ emit_reload_insns (struct insn_chain *chain)
> CLEAR_HARD_REG_BIT (reg_reloaded_dead, src_regno + k);
> SET_HARD_REG_BIT (reg_reloaded_valid, src_regno + k);
> if (targetm.hard_regno_call_part_clobbered
> - (src_regno + k, mode))
> + (insn, src_regno + k, mode))
> SET_HARD_REG_BIT (reg_reloaded_call_part_clobbered,
> src_regno + k);
> else
The insns in this case might not be call instructions. I think for
the legacy reload.c, reload1.c and caller-save.c we can just pass NULL
rather than try to be optimal.
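E.g.:

  if (targetm.hard_regno_call_part_clobbered (NULL, regno + k, mode))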
> diff --git a/gcc/sched-deps.c b/gcc/sched-deps.c
> index e15cf08..53c2e26 100644
> --- a/gcc/sched-deps.c
> +++ b/gcc/sched-deps.c
> @@ -3728,7 +3728,8 @@ deps_analyze_insn (struct deps_desc *deps, rtx_insn *insn)
> Since we only have a choice between 'might be clobbered'
> and 'definitely not clobbered', we must include all
> partly call-clobbered registers here. */
> - else if (targetm.hard_regno_call_part_clobbered (i,
> + else if (targetm.hard_regno_call_part_clobbered (insn,
> + i,
> reg_raw_mode[i])
> || TEST_HARD_REG_BIT (regs_invalidated_by_call, i))
> SET_REGNO_REG_SET (reg_pending_clobbers, i);
No need to split the line after "insn".
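I.e.:

  else if (targetm.hard_regno_call_part_clobbered (insn, i,
                                                   reg_raw_mode[i])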
> diff --git a/gcc/target.def b/gcc/target.def
> index e8f0f70..ecb0ea7 100644
> --- a/gcc/target.def
> +++ b/gcc/target.def
> @@ -5772,8 +5772,21 @@ return true for a 64-bit mode but false for a 32-bit mode.\n\
> \n\
> The default implementation returns false, which is correct\n\
> for targets that don't have partly call-clobbered registers.",
> - bool, (unsigned int regno, machine_mode mode),
> - hook_bool_uint_mode_false)
> + bool, (rtx_insn *, unsigned int regno, machine_mode mode),
> + hook_bool_insn_uint_mode_false)
Should name the insn parameter and describe it in the docs.
(Realise this is just a first cut.)
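E.g. something like:

  bool, (rtx_insn *insn, unsigned int regno, machine_mode mode),

with a sentence in the hook documentation saying that @var{insn} is the
call in question, or null if the hook should assume the worst case for
any call.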
> +DEFHOOK
> +(return_call_with_max_clobbers,
> + "This hook returns a pointer to the call that partially clobbers the\n\
> +most registers. If a platform supports multiple ABIs where the registers\n\
> +that are partially clobbered may vary, this function compares two\n\
> +two calls and return a pointer to the one that clobbers the most registers.\n\
s/two two calls and return/two calls and returns/
Thanks,
Richard