Pushed to master with the following fixups: - new timevar added - nits addressed - whitespace fixes
Philipp. On Mon, 25 Nov 2024 at 03:30, Jeff Law <jeffreya...@gmail.com> wrote: > > > > On 11/9/24 2:48 AM, Konstantinos Eleftheriou wrote: > > From: kelefth <konstantinos.elefther...@vrull.eu> > > > > This pass detects cases of expensive store forwarding and tries to avoid > > them > > by reordering the stores and using suitable bit insertion sequences. > > For example it can transform this: > > > > strb w2, [x1, 1] > > ldr x0, [x1] # Expensive store forwarding to larger load. > > > > To: > > > > ldr x0, [x1] > > strb w2, [x1] > > bfi x0, x2, 0, 8 > > > > Assembly like this can appear with bitfields or type punning / unions. > > On stress-ng when running the cpu-union microbenchmark the following > > speedups > > have been observed. > > > > Neoverse-N1: +29.4% > > Intel Coffeelake: +13.1% > > AMD 5950X: +17.5% > > > > The transformation is rejected on cases that would cause store_bit_field > > to generate subreg expressions on different register classes. > > Files avoid-store-forwarding-4.c and avoid-store-forwarding-5.c contain > > such cases and have been marked as XFAIL. > > > > There is a special handling for machines with BITS_BIG_ENDIAN != > > BYTES_BIG_ENDIAN. The need for this came up from an issue in H8 > > architecture, which uses big-endian ordering, but BITS_BIG_ENDIAN > > is false. In that case, the START parameter of store_bit_field > > needs to be calculated from the end of the destination register. > > > > gcc/ChangeLog: > > > > * Makefile.in (OBJS): Add avoid-store-forwarding.o. > > * common.opt (favoid-store-forwarding): New option. > > * common.opt.urls: Regenerate. > > * doc/invoke.texi: New param store-forwarding-max-distance. > > * doc/passes.texi: Document new pass. > > * doc/tm.texi: Regenerate. > > * doc/tm.texi.in: Document new pass. > > * params.opt (store-forwarding-max-distance): New param. > > * passes.def: Add pass_rtl_avoid_store_forwarding before > > pass_early_remat. > > * target.def (avoid_store_forwarding_p): New DEFHOOK. > > * target.h (struct store_fwd_info): Declare. > > * targhooks.cc (default_avoid_store_forwarding_p): New function. > > * targhooks.h (default_avoid_store_forwarding_p): Declare. > > * tree-pass.h (make_pass_rtl_avoid_store_forwarding): Declare. > > * avoid-store-forwarding.cc: New file. > > * avoid-store-forwarding.h: New file. > > > > gcc/testsuite/ChangeLog: > > > > * gcc.target/aarch64/avoid-store-forwarding-1.c: New test. > > * gcc.target/aarch64/avoid-store-forwarding-2.c: New test. > > * gcc.target/aarch64/avoid-store-forwarding-3.c: New test. > > * gcc.target/aarch64/avoid-store-forwarding-4.c: New test. > > * gcc.target/aarch64/avoid-store-forwarding-5.c: New test. > > * gcc.target/x86_64/abi/callabi/avoid-store-forwarding-1.c: New test. > > * gcc.target/x86_64/abi/callabi/avoid-store-forwarding-2.c: New > > test. > > > > Signed-off-by: Philipp Tomsich <philipp.toms...@vrull.eu> > > Signed-off-by: Konstantinos Eleftheriou <konstantinos.elefther...@vrull.eu> > > > > Series-version: 8 > > > > Series-changes: 8 > > - Fix store_bit_field call for big-endian targets, where > > BITS_BIG_ENDIAN is false. > > - Handle store_forwarding_max_distance = 0 as a special case that > > disables cost checks for avoid-store-forwarding. > > - Update testcases for AArch64 and add testcases for x86-64. > > > > Series-changes: 7 > > - Fix bug when copying back the load register, in the case that the > > load is eliminated. > > > > Series-changes: 6 > > - Reject the transformation on cases that would cause store_bit_field > > to generate subreg expressions on different register classes. > > Files avoid-store-forwarding-4.c and avoid-store-forwarding-5.c > > contain such cases and have been marked as XFAIL. > > - Use optimize_bb_for_speed_p instead of optimize_insn_for_speed_p. > > - Inline and remove get_load_mem. > > - New implementation for is_store_forwarding. > > - Refactor the main loop in avoid_store_forwarding. > > - Avoid using the word 'forwardings'. > > - Use lowpart_subreg instead of validate_subreg + gen_rtx_subreg. > > - Don't use df_insn_rescan where not needed. > > - Change order of emitting stores and bit insert instructions. > > - Check and reject loads for which the dest register overlaps with > > src. > > - Remove unused variable. > > - Change some gen_mov_insn function calls to gen_rtx_SET. > > - Subtract the cost of eliminated load, instead of 1, for the total > > cost. > > - Use delete_insn instead of set_insn_deleted. > > - Regenerate common.opt.urls. > > - Add some more comments. > > > > Series-changes: 5 > > - Fix bug with BIG_ENDIAN targets. > > - Fix bug with unrecognized instructions. > > - Fix / simplify pass init/fini. > > > > Series-changes: 4 > > - Change pass scheduling to run after sched1. > > - Add target hook to decide whether a store forwarding instance > > should be avoided or not. > > - Fix bugs. > > > > Series-changes: 3 > > - Only emit SUBREG after calling validate_subreg. > > - Fix memory corruption due to vec self-reference. > > - Fix bitmap_bit_in_range_p ICE due to BLKMode. > > - Reject MEM to MEM sets. > > - Add get_load_mem comment. > > - Add new testcase. > > > > Series-changes: 2 > > - Allow modes that are not scalar_int_mode. > > - Introduce simple costing to avoid unprofitable transformations. > > - Reject bit insert sequences that spill to memory. > > - Document new pass. > > - Fix and add testcases. > > --- > > > +namespace { > > + > > +const pass_data pass_data_avoid_store_forwarding = > > +{ > > + RTL_PASS, /* type. */ > > + "avoid_store_forwarding", /* name. */ > > + OPTGROUP_NONE, /* optinfo_flags. */ > > + TV_NONE, /* tv_id. */ > > + 0, /* properties_required. */ > > + 0, /* properties_provided. */ > > + 0, /* properties_destroyed. */ > > + 0, /* todo_flags_start. */ > > + TODO_df_finish /* todo_flags_finish. */ > > +}; > Probably want a TV entry for store forwarding. While it shouldn't be a > big time sink, we've seen other passes that should be efficient go nuts > in some cases (extension elimination being the most recent example). > > > > > > + > > +/* Try to modify BB so that expensive store forwarding cases are avoided. > > */ > > + > > +void store_forwarding_analyzer::avoid_store_forwarding (basic_block bb) > > Formatting nit. Put the return type on its own line so that the > function name always starts on column 0 of its own line. > > > > > + > > +/* Update pass statistics. */ > > + > > +void store_forwarding_analyzer::update_stats (function *fn) > Similarly. > > > OK with the new timevar and two formatting nits. Thanks for your > patience on this. > > Jeff > >