"Andre Vieira (lists)" <andre.simoesdiasvie...@arm.com> writes: > This patch introduces a struct to differentiate between different > memmove costs to enable a better modeling of memory operations. These > have been modelled for > -mcpu/-mtune=neoverse-v1/neoverse-n1/neoverse-n2/neoverse-512tvb, for > all other tunings all entries are equal to the old single memmove cost > to ensure the behaviour remains the same.
Thanks for doing this.  Having the same cost for loads and stores
has been a long-standing wart.

> 2022-03-16  Tamar Christina  <tamar.christ...@arm.com>
>             Andre Vieira  <andre.simoesdiasvie...@arm.com>
>
> gcc/ChangeLog:
>
>	* config/aarch64/aarch64-protos.h (struct cpu_memmov_cost): New
>	struct.
>	(struct tune_params): Change type of memmov_cost to use
>	cpu_memmov_cost.
>	* config/aarch64/aarch64.cc (aarch64_memory_move_cost): Update
>	all tunings to use new cpu_memmov_cost struct.
>
> diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
> index f2fde35c6eb4989af8736db8fad004171c160282..5190eb8b96ea9af809a28470905b8b85ee720b09 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -508,6 +508,18 @@ struct cpu_prefetch_tune
>    const int default_opt_level;
>  };
>
> +/* Model the costs for loads/stores for reload so that it can do more

I'd say s/reload/the register allocators/ here, since the costs affect
decisions made by IRA too.

> +   accurate spill heuristics.  */
> +struct cpu_memmov_cost
> +{
> +  int load_int;
> +  int store_int;
> +  int load_fp;
> +  int store_fp;
> +  int load_pred;
> +  int store_pred;
> +};
> +
>  struct tune_params
>  {
>    const struct cpu_cost_table *insn_extra_cost;
> […]
> @@ -14501,12 +14633,41 @@ aarch64_register_move_cost (machine_mode mode,
>      return regmove_cost->FP2FP;
>  }
>
> +/* Implements TARGET_MEMORY_MOVE_COST.  */
>  static int
> -aarch64_memory_move_cost (machine_mode mode ATTRIBUTE_UNUSED,
> -			  reg_class_t rclass ATTRIBUTE_UNUSED,
> -			  bool in ATTRIBUTE_UNUSED)
> +aarch64_memory_move_cost (machine_mode mode, reg_class_t rclass_i, bool in)
>  {
> -  return aarch64_tune_params.memmov_cost;
> +  enum reg_class rclass = (enum reg_class) rclass_i;
> +  switch (rclass)
> +    {
> +    case PR_LO_REGS:
> +    case PR_HI_REGS:
> +    case PR_REGS:
> +      return in ? aarch64_tune_params.memmov_cost.load_pred
> +		: aarch64_tune_params.memmov_cost.store_pred;
> +    case POINTER_AND_FP_REGS:
> +    case ALL_REGS:
> +      {
> +	if (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL)
> +	  return in ? aarch64_tune_params.memmov_cost.load_pred
> +		    : aarch64_tune_params.memmov_cost.store_pred;
> +
> +	if (VECTOR_MODE_P (mode) || FLOAT_MODE_P (mode))
> +	  return in ? aarch64_tune_params.memmov_cost.load_fp
> +		    : aarch64_tune_params.memmov_cost.store_fp;
> +
> +	return in ? aarch64_tune_params.memmov_cost.load_int
> +		  : aarch64_tune_params.memmov_cost.store_int;
> +      }
> +    case FP_LO8_REGS:
> +    case FP_LO_REGS:
> +    case FP_REGS:
> +      return in ? aarch64_tune_params.memmov_cost.load_fp
> +		: aarch64_tune_params.memmov_cost.store_fp;
> +    default:
> +      return in ? aarch64_tune_params.memmov_cost.load_int
> +		: aarch64_tune_params.memmov_cost.store_int;
> +    }
>  }

It would be good to avoid listing individual subclasses if possible,
since it's easy for the list to get out of date if more subclasses
are added.  An alternative would be:

  if (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL
      ? reg_classes_intersect_p (rclass, PR_REGS)
      : reg_class_subset_p (rclass, PR_REGS))
    return (in
	    ? aarch64_tune_params.memmov_cost.load_pred
	    : aarch64_tune_params.memmov_cost.store_pred);

  if (VECTOR_MODE_P (mode) || FLOAT_MODE_P (mode)
      ? reg_classes_intersect_p (rclass, FP_REGS)
      : reg_class_subset_p (rclass, FP_REGS))
    return (in
	    ? aarch64_tune_params.memmov_cost.load_fp
	    : aarch64_tune_params.memmov_cost.store_fp);

  return (in
	  ? aarch64_tune_params.memmov_cost.load_int
	  : aarch64_tune_params.memmov_cost.store_int);

OK with that change, if it works.

Thanks,
Richard
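
P.S. For anyone updating an out-of-tree tuning against this change: as
the patch description says, a tuning that keeps the old behaviour just
repeats its former single memmov_cost in every field of the new struct.
A minimal sketch of such an initialiser follows; the value 4 is purely
illustrative, and the real per-core Neoverse numbers live in the hunks
elided ("[…]") from the quoted diff:

  /* Illustrative only: assumes a legacy single memmov_cost of 4.  */
  { 4, /* load_int.  */
    4, /* store_int.  */
    4, /* load_fp.  */
    4, /* store_fp.  */
    4, /* load_pred.  */
    4 /* store_pred.  */
  }, /* memmov_cost.  */

Cores where loads and stores genuinely differ in cost would instead give
each field its own measured value, which is the point of the patch.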