On 11/9/24 2:48 AM, Konstantinos Eleftheriou wrote:
From: kelefth <konstantinos.elefther...@vrull.eu>
This pass detects cases of expensive store forwarding and tries to avoid them
by reordering the stores and using suitable bit insertion sequences.
For example it can transform this:
strb w2, [x1, 1]
ldr x0, [x1] # Expensive store forwarding to larger load.
To:
ldr x0, [x1]
strb w2, [x1]
bfi x0, x2, 0, 8
Assembly like this can appear with bitfields or type punning / unions.
On stress-ng when running the cpu-union microbenchmark the following speedups
have been observed.
Neoverse-N1: +29.4%
Intel Coffeelake: +13.1%
AMD 5950X: +17.5%
The transformation is rejected on cases that would cause store_bit_field
to generate subreg expressions on different register classes.
Files avoid-store-forwarding-4.c and avoid-store-forwarding-5.c contain
such cases and have been marked as XFAIL.
There is a special handling for machines with BITS_BIG_ENDIAN !=
BYTES_BIG_ENDIAN. The need for this came up from an issue in H8
architecture, which uses big-endian ordering, but BITS_BIG_ENDIAN
is false. In that case, the START parameter of store_bit_field
needs to be calculated from the end of the destination register.
gcc/ChangeLog:
* Makefile.in (OBJS): Add avoid-store-forwarding.o.
* common.opt (favoid-store-forwarding): New option.
* common.opt.urls: Regenerate.
* doc/invoke.texi: New param store-forwarding-max-distance.
* doc/passes.texi: Document new pass.
* doc/tm.texi: Regenerate.
* doc/tm.texi.in: Document new pass.
* params.opt (store-forwarding-max-distance): New param.
* passes.def: Add pass_rtl_avoid_store_forwarding before
pass_early_remat.
* target.def (avoid_store_forwarding_p): New DEFHOOK.
* target.h (struct store_fwd_info): Declare.
* targhooks.cc (default_avoid_store_forwarding_p): New function.
* targhooks.h (default_avoid_store_forwarding_p): Declare.
* tree-pass.h (make_pass_rtl_avoid_store_forwarding): Declare.
* avoid-store-forwarding.cc: New file.
* avoid-store-forwarding.h: New file.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/avoid-store-forwarding-1.c: New test.
* gcc.target/aarch64/avoid-store-forwarding-2.c: New test.
* gcc.target/aarch64/avoid-store-forwarding-3.c: New test.
* gcc.target/aarch64/avoid-store-forwarding-4.c: New test.
* gcc.target/aarch64/avoid-store-forwarding-5.c: New test.
* gcc.target/x86_64/abi/callabi/avoid-store-forwarding-1.c: New test.
* gcc.target/x86_64/abi/callabi/avoid-store-forwarding-2.c: New test.
Signed-off-by: Philipp Tomsich <philipp.toms...@vrull.eu>
Signed-off-by: Konstantinos Eleftheriou <konstantinos.elefther...@vrull.eu>
Series-version: 8
Series-changes: 8
- Fix store_bit_field call for big-endian targets, where
BITS_BIG_ENDIAN is false.
- Handle store_forwarding_max_distance = 0 as a special case that
disables cost checks for avoid-store-forwarding.
- Update testcases for AArch64 and add testcases for x86-64.
Series-changes: 7
- Fix bug when copying back the load register, in the case that the
load is eliminated.
Series-changes: 6
- Reject the transformation on cases that would cause store_bit_field
to generate subreg expressions on different register classes.
Files avoid-store-forwarding-4.c and avoid-store-forwarding-5.c
contain such cases and have been marked as XFAIL.
- Use optimize_bb_for_speed_p instead of optimize_insn_for_speed_p.
- Inline and remove get_load_mem.
- New implementation for is_store_forwarding.
- Refactor the main loop in avoid_store_forwarding.
- Avoid using the word 'forwardings'.
- Use lowpart_subreg instead of validate_subreg + gen_rtx_subreg.
- Don't use df_insn_rescan where not needed.
- Change order of emitting stores and bit insert instructions.
- Check and reject loads for which the dest register overlaps with src.
- Remove unused variable.
- Change some gen_mov_insn function calls to gen_rtx_SET.
- Subtract the cost of eliminated load, instead of 1, for the total
cost.
- Use delete_insn instead of set_insn_deleted.
- Regenerate common.opt.urls.
- Add some more comments.
Series-changes: 5
- Fix bug with BIG_ENDIAN targets.
- Fix bug with unrecognized instructions.
- Fix / simplify pass init/fini.
Series-changes: 4
- Change pass scheduling to run after sched1.
- Add target hook to decide whether a store forwarding instance
should be avoided or not.
- Fix bugs.
Series-changes: 3
- Only emit SUBREG after calling validate_subreg.
- Fix memory corruption due to vec self-reference.
- Fix bitmap_bit_in_range_p ICE due to BLKMode.
- Reject MEM to MEM sets.
- Add get_load_mem comment.
- Add new testcase.
Series-changes: 2
- Allow modes that are not scalar_int_mode.
- Introduce simple costing to avoid unprofitable transformations.
- Reject bit insert sequences that spill to memory.
- Document new pass.
- Fix and add testcases.
---
+namespace {
+
+const pass_data pass_data_avoid_store_forwarding =
+{
+ RTL_PASS, /* type. */
+ "avoid_store_forwarding", /* name. */
+ OPTGROUP_NONE, /* optinfo_flags. */
+ TV_NONE, /* tv_id. */
+ 0, /* properties_required. */
+ 0, /* properties_provided. */
+ 0, /* properties_destroyed. */
+ 0, /* todo_flags_start. */
+ TODO_df_finish /* todo_flags_finish. */
+};
Probably want a TV entry for store forwarding. While it shouldn't be a
big time sink, we've seen other passes that should be efficient go nuts
in some cases (extension elimination being the most recent example).
+
+/* Try to modify BB so that expensive store forwarding cases are avoided. */
+
+void store_forwarding_analyzer::avoid_store_forwarding (basic_block bb)
Formatting nit. Put the return type on its own line so that the
function name always starts on column 0 of its own line.
+
+/* Update pass statistics. */
+
+void store_forwarding_analyzer::update_stats (function *fn)
Similarly.
OK with the new timevar and two formatting nits. Thanks for your
patience on this.
Jeff