"Andre Vieira (lists)" <andre.simoesdiasvie...@arm.com> writes:
> This patch introduces a struct to differentiate between different 
> memmove costs to enable a better modeling of memory operations. These 
> have been modelled for 
> -mcpu/-mtune=neoverse-v1/neoverse-n1/neoverse-n2/neoverse-512tvb, for 
> all other tunings all entries are equal to the old single memmove cost 
> to ensure the behaviour remains the same.

Thanks for doing this.  Having the same cost for loads and stores
has been a long-standing wart.

> 2022-03-16  Tamar Christina  <tamar.christ...@arm.com>
>                         Andre Vieira <andre.simoesdiasvie...@arm.com>
>
> gcc/ChangeLog:
>
>          * config/aarch64/aarch64-protos.h (struct cpu_memmov_cost): New 
> struct.
>          (struct tune_params): Change type of memmov_cost to use 
> cpu_memmov_cost.
>          * config/aarch64/aarch64.cc (aarch64_memory_move_cost): Update 
> all tunings
>          to use new cpu_memmov_cost struct.
>
> diff --git a/gcc/config/aarch64/aarch64-protos.h 
> b/gcc/config/aarch64/aarch64-protos.h
> index 
> f2fde35c6eb4989af8736db8fad004171c160282..5190eb8b96ea9af809a28470905b8b85ee720b09
>  100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -508,6 +508,18 @@ struct cpu_prefetch_tune
>    const int default_opt_level;
>  };
>  
> +/* Model the costs for loads/stores for reload so that it can do more

I'd say s/reload/the register allocators/ here, since the costs affect
decisions made by IRA too.

> +   accurate spill heuristics.  */
> +struct cpu_memmov_cost
> +{
> +  int load_int;
> +  int store_int;
> +  int load_fp;
> +  int store_fp;
> +  int load_pred;
> +  int store_pred;
> +};
> +
>  struct tune_params
>  {
>    const struct cpu_cost_table *insn_extra_cost;
> […]
> @@ -14501,12 +14633,41 @@ aarch64_register_move_cost (machine_mode mode,
>    return regmove_cost->FP2FP;
>  }
>  
> +/* Implements TARGET_MEMORY_MOVE_COST.  */
>  static int
> -aarch64_memory_move_cost (machine_mode mode ATTRIBUTE_UNUSED,
> -                       reg_class_t rclass ATTRIBUTE_UNUSED,
> -                       bool in ATTRIBUTE_UNUSED)
> +aarch64_memory_move_cost (machine_mode mode, reg_class_t rclass_i, bool in)
>  {
> -  return aarch64_tune_params.memmov_cost;
> +  enum reg_class rclass = (enum reg_class) rclass_i;
> +  switch (rclass)
> +    {
> +    case PR_LO_REGS:
> +    case PR_HI_REGS:
> +    case PR_REGS:
> +      return in ? aarch64_tune_params.memmov_cost.load_pred
> +             : aarch64_tune_params.memmov_cost.store_pred;
> +    case POINTER_AND_FP_REGS:
> +    case ALL_REGS:
> +      {
> +     if (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL)
> +       return in ? aarch64_tune_params.memmov_cost.load_pred
> +                 : aarch64_tune_params.memmov_cost.store_pred;
> +
> +     if (VECTOR_MODE_P (mode) || FLOAT_MODE_P (mode))
> +       return in ? aarch64_tune_params.memmov_cost.load_fp
> +                 : aarch64_tune_params.memmov_cost.store_fp;
> +
> +     return in ? aarch64_tune_params.memmov_cost.load_int
> +               : aarch64_tune_params.memmov_cost.store_int;
> +      }
> +    case FP_LO8_REGS:
> +    case FP_LO_REGS:
> +    case FP_REGS:
> +      return in ? aarch64_tune_params.memmov_cost.load_fp
> +             : aarch64_tune_params.memmov_cost.store_fp;
> +    default:
> +      return in ? aarch64_tune_params.memmov_cost.load_int
> +             : aarch64_tune_params.memmov_cost.store_int;
> +    }
>  }

It would be good to avoid listing individual subclasses if possible,
since it's easy for the list to get out of date if more subclasses
are added.

An alternative would be:

  if (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL
      ? reg_classes_intersect_p (rclass, PR_REGS)
      : reg_class_subset_p (rclass, PR_REGS))
    return (in
            ? aarch64_tune_params.memmov_cost.load_pred
            : aarch64_tune_params.memmov_cost.store_pred);

  if (VECTOR_MODE_P (mode) || FLOAT_MODE_P (mode)
      ? reg_classes_intersect_p (rclass, FP_REGS)
      : reg_class_subset_p (rclass, FP_REGS))
    return (in
            ? aarch64_tune_params.memmov_cost.load_fp
            : aarch64_tune_params.memmov_cost.store_fp);

  return (in
          ? aarch64_tune_params.memmov_cost.load_int
          : aarch64_tune_params.memmov_cost.store_int);

OK with that change, if it works.

Thanks,
Richard

Reply via email to