Steve Ellcey <[email protected]> writes:
> On Thu, 2018-12-06 at 12:25 +0000, Richard Sandiford wrote:
>>
>> Since we're looking at the call insns anyway, we could have a hook that
>> "jousts" two calls and picks the one that preserves *fewer* registers.
>> This would mean that the loop produces a single instruction that conservatively
>> describes the call-preserved registers. We could then stash that
>> instruction in lra_reg instead of the current check_part_clobbered
>> boolean.
>>
>> The hook should by default be a null pointer, so that we can avoid
>> the instruction walk on targets that don't need it.
>>
>> That would mean that LRA would always have a call instruction to hand
>> when asking about call-preserved information. So I think we should
>> add an insn parameter to targetm.hard_regno_call_part_clobbered,
>> with a null insn selecting the default behaviour. I know it's
>> going to be a pain to update all callers and targets, sorry.
>
> Richard, here is an updated version of this patch. It is not
> completely tested yet but I wanted to send this out and make
> sure it is what you had in mind and see if you had any comments about
> the new target function while I am testing it (including building
> some of the other targets).
Yeah, this was the kind of thing I had in mind, thanks.
> /* Implement TARGET_HARD_REGNO_CALL_PART_CLOBBERED. The callee only saves
> the lower 64 bits of a 128-bit register. Tell the compiler the callee
> clobbers the top 64 bits when restoring the bottom 64 bits. */
>
> static bool
> -aarch64_hard_regno_call_part_clobbered (unsigned int regno, machine_mode mode)
> +aarch64_hard_regno_call_part_clobbered (rtx_insn *insn,
> + unsigned int regno,
> + machine_mode mode)
> {
> + if (insn && CALL_P (insn) && aarch64_simd_call_p (insn))
> + return false;
> return FP_REGNUM_P (regno) && maybe_gt (GET_MODE_SIZE (mode), 8);
This should be choosing between 8 and 16 for the maybe_gt rather than
returning false early for SIMD calls, since even SIMD functions clobber
bits 128 and above for SVE.
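I.e. something like this (an untested sketch, reusing aarch64_simd_call_p
from your patch):

  static bool
  aarch64_hard_regno_call_part_clobbered (rtx_insn *insn, unsigned int regno,
                                          machine_mode mode)
  {
    /* SIMD calls preserve the low 128 bits, but even they clobber
       bits 128 and above of the SVE registers.  */
    unsigned int max_size
      = (insn && CALL_P (insn) && aarch64_simd_call_p (insn)) ? 16 : 8;
    return FP_REGNUM_P (regno) && maybe_gt (GET_MODE_SIZE (mode), max_size);
  }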
> +/* Implement TARGET_RETURN_CALL_WITH_MAX_CLOBBERS. */
> +
> +rtx_insn *
> +aarch64_return_call_with_max_clobbers (rtx_insn *call_1, rtx_insn *call_2)
> +{
> + gcc_assert (CALL_P (call_1));
> + if ((call_2 == NULL_RTX) || aarch64_simd_call_p (call_2))
> + return call_1;
> + else
> + return call_2;
Nit: redundant parens in "(call_2 == NULL_RTX)".
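I.e.:

  if (call_2 == NULL_RTX || aarch64_simd_call_p (call_2))
    return call_1;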
> diff --git a/gcc/config/avr/avr.c b/gcc/config/avr/avr.c
> index 023308b..2cf993d 100644
> --- a/gcc/config/avr/avr.c
> +++ b/gcc/config/avr/avr.c
> @@ -12181,7 +12181,9 @@ avr_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
> /* Implement TARGET_HARD_REGNO_CALL_PART_CLOBBERED. */
>
> static bool
> -avr_hard_regno_call_part_clobbered (unsigned regno, machine_mode mode)
> +avr_hard_regno_call_part_clobbered (rtx_insn *insn ATTRIBUTE_UNUSED,
> + unsigned regno,
> + machine_mode mode)
> {
Also very minor, sorry, but: I think it's usual to put the parameters
on the same line when they fit. Same for the other hooks.
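E.g.:

  static bool
  avr_hard_regno_call_part_clobbered (rtx_insn *insn ATTRIBUTE_UNUSED,
                                      unsigned regno, machine_mode mode)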
> @@ -2919,7 +2930,7 @@ the local anchor could be shared by other accesses to
> nearby locations.
>
> The hook returns true if it succeeds, storing the offset of the
> anchor from the base in @var{offset1} and the offset of the final address
> -from the anchor in @var{offset2}. The default implementation returns false.
> +from the anchor in @var{offset2}. ehe defnult implementation returns false.
> @end deftypefn
Stray change here.
> diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
> index 7ffcd35..31a567a 100644
> --- a/gcc/lra-constraints.c
> +++ b/gcc/lra-constraints.c
> @@ -5368,16 +5368,24 @@ inherit_reload_reg (bool def_p, int original_regno,
> static inline bool
> need_for_call_save_p (int regno)
> {
> + machine_mode pmode = PSEUDO_REGNO_MODE (regno);
> + int new_regno = reg_renumber[regno];
> +
> lra_assert (regno >= FIRST_PSEUDO_REGISTER && reg_renumber[regno] >= 0);
> - return (usage_insns[regno].calls_num < calls_num
> - && (overlaps_hard_reg_set_p
> - ((flag_ipa_ra &&
> - ! hard_reg_set_empty_p (lra_reg_info[regno].actual_call_used_reg_set))
> - ? lra_reg_info[regno].actual_call_used_reg_set
> - : call_used_reg_set,
> - PSEUDO_REGNO_MODE (regno), reg_renumber[regno])
> - || (targetm.hard_regno_call_part_clobbered
> - (reg_renumber[regno], PSEUDO_REGNO_MODE (regno)))));
> +
> + if (usage_insns[regno].calls_num >= calls_num)
> + return false;
> +
> + if (flag_ipa_ra
> + && !hard_reg_set_empty_p (lra_reg_info[regno].actual_call_used_reg_set))
> + return (overlaps_hard_reg_set_p
> + (lra_reg_info[regno].actual_call_used_reg_set, pmode, new_regno)
> + || targetm.hard_regno_call_part_clobbered
> + (lra_reg_info[regno].call_insn, new_regno, pmode));
> + else
> + return (overlaps_hard_reg_set_p (call_used_reg_set, pmode, new_regno)
> + || targetm.hard_regno_call_part_clobbered
> + (lra_reg_info[regno].call_insn, new_regno, pmode));
> }
>
I think it'd be safer just to add the new parameter to the existing code,
rather than rework it like this.
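I.e. keep the existing expression and just thread the insn through
(sketch, using the call_insn field your patch adds to lra_reg):

  return (usage_insns[regno].calls_num < calls_num
          && (overlaps_hard_reg_set_p
              ((flag_ipa_ra &&
                ! hard_reg_set_empty_p (lra_reg_info[regno].actual_call_used_reg_set))
               ? lra_reg_info[regno].actual_call_used_reg_set
               : call_used_reg_set,
               PSEUDO_REGNO_MODE (regno), reg_renumber[regno])
              || (targetm.hard_regno_call_part_clobbered
                  (lra_reg_info[regno].call_insn,
                   reg_renumber[regno], PSEUDO_REGNO_MODE (regno)))));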
I'm not sure off-hand why the existing code tests
targetm.hard_regno_call_part_clobbered when we have
actual_call_used_reg_set. Seems like it shouldn't be necessary
if we know exactly which registers the function clobbers.
Changing that should probably be a separate follow-on patch though.
> /* Global registers occurring in the current EBB. */
> @@ -635,6 +637,7 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
> rtx link, *link_loc;
> bool need_curr_point_incr;
> HARD_REG_SET last_call_used_reg_set;
> + rtx_insn *call_insn;
>
> reg_live_out = df_get_live_out (bb);
> sparseset_clear (pseudos_live);
> @@ -658,6 +661,17 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
> if (lra_dump_file != NULL)
> fprintf (lra_dump_file, " BB %d\n", bb->index);
>
> + call_insn = NULL;
> + if (targetm.return_call_with_max_clobbers)
> + {
> + FOR_BB_INSNS_REVERSE_SAFE (bb, curr_insn, next)
> + {
> + if (CALL_P (curr_insn))
> + call_insn = targetm.return_call_with_max_clobbers (curr_insn,
> + call_insn);
> + }
> + }
> +
> /* Scan the code of this basic block, noting which pseudos and hard
> regs are born or die.
>
> @@ -847,7 +861,8 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
> update_pseudo_point (reg->regno, curr_point, USE_POINT);
> mark_regno_live (reg->regno, reg->biggest_mode);
> check_pseudos_live_through_calls (reg->regno,
> - last_call_used_reg_set);
> + last_call_used_reg_set,
> + call_insn);
> }
>
> if (!HARD_REGISTER_NUM_P (reg->regno))
> @@ -912,9 +927,13 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
> {
> IOR_HARD_REG_SET (lra_reg_info[j].actual_call_used_reg_set,
> this_call_used_reg_set);
> +
> + lra_reg_info[j].call_insn = curr_insn;
> +
> if (flush)
> - check_pseudos_live_through_calls
> - (j, last_call_used_reg_set);
> + check_pseudos_live_through_calls (j,
> + last_call_used_reg_set,
> + call_insn);
> }
> COPY_HARD_REG_SET(last_call_used_reg_set, this_call_used_reg_set);
> }
> @@ -956,7 +975,8 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
> update_pseudo_point (reg->regno, curr_point, USE_POINT);
> mark_regno_live (reg->regno, reg->biggest_mode);
> check_pseudos_live_through_calls (reg->regno,
> - last_call_used_reg_set);
> + last_call_used_reg_set,
> + call_insn);
> }
>
> for (reg = curr_static_id->hard_regs; reg != NULL; reg = reg->next)
> @@ -1125,7 +1145,7 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
> if (sparseset_cardinality (pseudos_live_through_calls) == 0)
> break;
> if (sparseset_bit_p (pseudos_live_through_calls, j))
> - check_pseudos_live_through_calls (j, last_call_used_reg_set);
> + check_pseudos_live_through_calls (j, last_call_used_reg_set, call_insn);
> }
>
> for (i = 0; HARD_REGISTER_NUM_P (i); ++i)
I think we can do this more accurately by instead keeping track of the
current call during the main block walk and extending this:
if (! flag_ipa_ra)
COPY_HARD_REG_SET(last_call_used_reg_set, call_used_reg_set);
else
{
so that we use the "else" when targetm.return_call_with_max_clobbers
is nonnull. Then we should extend this:
bool flush = (! hard_reg_set_empty_p (last_call_used_reg_set)
&& ! hard_reg_set_equal_p (last_call_used_reg_set,
this_call_used_reg_set))
so that we flush when the old call insn preserves more registers
than the new one.
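Something like this, perhaps (only a sketch; "last_call_insn" is a
hypothetical new local that remembers the previous call processed in
the walk):

  /* Take the else arm whenever the target can distinguish calls.  */
  if (! flag_ipa_ra && ! targetm.return_call_with_max_clobbers)
    COPY_HARD_REG_SET (last_call_used_reg_set, call_used_reg_set);
  else
    {
      ...
      /* Flush when the register sets differ, or when the previous call
         (last_call_insn, hypothetical) preserves more registers than
         the current one.  A tie between two equal calls can flush
         spuriously here, which is conservative but safe.  */
      bool flush
        = ((! hard_reg_set_empty_p (last_call_used_reg_set)
            && ! hard_reg_set_equal_p (last_call_used_reg_set,
                                       this_call_used_reg_set))
           || (last_call_insn
               && (targetm.return_call_with_max_clobbers
                   (curr_insn, last_call_insn) == curr_insn)));
      ...
    }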
Also, I think:
lra_reg_info[j].call_insn = curr_insn;
should happen in check_pseudos_live_through_calls and should apply
targetm.return_call_with_max_clobbers to the register's existing
call_insn (if any), rather than simply overwrite it. That way the
register info will track the "worst" call for all regions in which
the register is live.
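I.e. roughly this, inside check_pseudos_live_through_calls (sketch):

  if (targetm.return_call_with_max_clobbers
      && lra_reg_info[regno].call_insn)
    call_insn = targetm.return_call_with_max_clobbers
                  (call_insn, lra_reg_info[regno].call_insn);
  lra_reg_info[regno].call_insn = call_insn;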
> diff --git a/gcc/regrename.c b/gcc/regrename.c
> index a180ced..109add0 100644
> --- a/gcc/regrename.c
> +++ b/gcc/regrename.c
> @@ -339,9 +339,9 @@ check_new_reg_p (int reg ATTRIBUTE_UNUSED, int new_reg,
> && ! DEBUG_INSN_P (tmp->insn))
> || (this_head->need_caller_save_reg
> && ! (targetm.hard_regno_call_part_clobbered
> - (reg, GET_MODE (*tmp->loc)))
> + (tmp->insn, reg, GET_MODE (*tmp->loc)))
> && (targetm.hard_regno_call_part_clobbered
> - (new_reg, GET_MODE (*tmp->loc)))))
> + (tmp->insn, new_reg, GET_MODE (*tmp->loc)))))
> return false;
>
> return true;
tmp->insn isn't the call we care about here. I think we should just
pass null.
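I.e.:

  && ! (targetm.hard_regno_call_part_clobbered
        (NULL, reg, GET_MODE (*tmp->loc)))
  && (targetm.hard_regno_call_part_clobbered
      (NULL, new_reg, GET_MODE (*tmp->loc)))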
> diff --git a/gcc/reload1.c b/gcc/reload1.c
> index b703402..5490ae5 100644
> --- a/gcc/reload1.c
> +++ b/gcc/reload1.c
> @@ -8289,7 +8289,8 @@ emit_reload_insns (struct insn_chain *chain)
> : out_regno + k);
> reg_reloaded_insn[regno + k] = insn;
> SET_HARD_REG_BIT (reg_reloaded_valid, regno + k);
> - if (targetm.hard_regno_call_part_clobbered (regno + k,
> + if (targetm.hard_regno_call_part_clobbered (insn,
> + regno + k,
> mode))
> SET_HARD_REG_BIT (reg_reloaded_call_part_clobbered,
> regno + k);
> @@ -8369,7 +8370,8 @@ emit_reload_insns (struct insn_chain *chain)
> : in_regno + k);
> reg_reloaded_insn[regno + k] = insn;
> SET_HARD_REG_BIT (reg_reloaded_valid, regno + k);
> - if (targetm.hard_regno_call_part_clobbered (regno + k,
> + if (targetm.hard_regno_call_part_clobbered (insn,
> + regno + k,
> mode))
> SET_HARD_REG_BIT (reg_reloaded_call_part_clobbered,
> regno + k);
> @@ -8485,7 +8487,7 @@ emit_reload_insns (struct insn_chain *chain)
> CLEAR_HARD_REG_BIT (reg_reloaded_dead, src_regno + k);
> SET_HARD_REG_BIT (reg_reloaded_valid, src_regno + k);
> if (targetm.hard_regno_call_part_clobbered
> - (src_regno + k, mode))
> + (insn, src_regno + k, mode))
> SET_HARD_REG_BIT (reg_reloaded_call_part_clobbered,
> src_regno + k);
> else
The insns in this case might not be call instructions. I think for
the legacy reload.c, reload1.c and caller-save.c we can just pass NULL
rather than try to be optimal.
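E.g.:

  if (targetm.hard_regno_call_part_clobbered (NULL, regno + k, mode))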
> diff --git a/gcc/sched-deps.c b/gcc/sched-deps.c
> index e15cf08..53c2e26 100644
> --- a/gcc/sched-deps.c
> +++ b/gcc/sched-deps.c
> @@ -3728,7 +3728,8 @@ deps_analyze_insn (struct deps_desc *deps, rtx_insn *insn)
> Since we only have a choice between 'might be clobbered'
> and 'definitely not clobbered', we must include all
> partly call-clobbered registers here. */
> - else if (targetm.hard_regno_call_part_clobbered (i,
> + else if (targetm.hard_regno_call_part_clobbered (insn,
> + i,
> reg_raw_mode[i])
> || TEST_HARD_REG_BIT (regs_invalidated_by_call, i))
> SET_REGNO_REG_SET (reg_pending_clobbers, i);
No need to split the line after "insn".
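I.e.:

  else if (targetm.hard_regno_call_part_clobbered (insn, i,
                                                   reg_raw_mode[i])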
> diff --git a/gcc/target.def b/gcc/target.def
> index e8f0f70..ecb0ea7 100644
> --- a/gcc/target.def
> +++ b/gcc/target.def
> @@ -5772,8 +5772,21 @@ return true for a 64-bit mode but false for a 32-bit mode.\n\
> \n\
> The default implementation returns false, which is correct\n\
> for targets that don't have partly call-clobbered registers.",
> - bool, (unsigned int regno, machine_mode mode),
> - hook_bool_uint_mode_false)
> + bool, (rtx_insn *, unsigned int regno, machine_mode mode),
> + hook_bool_insn_uint_mode_false)
Should name the insn parameter and describe it in the docs.
(Realise this is just a first cut.)
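E.g. something like:

  bool, (rtx_insn *insn, unsigned int regno, machine_mode mode),

with a sentence in the hook documentation saying that @var{insn} is the
call in question, or null if the hook should assume the worst case for
any call.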
> +DEFHOOK
> +(return_call_with_max_clobbers,
> + "This hook returns a pointer to the call that partially clobbers the\n\
> +most registers. If a platform supports multiple ABIs where the registers\n\
> +that are partially clobbered may vary, this function compares two\n\
> +two calls and return a pointer to the one that clobbers the most registers.\n\
s/two two calls and return/two calls and returns/
Thanks,
Richard