On 11/9/24 2:48 AM, Konstantinos Eleftheriou wrote:
From: kelefth <konstantinos.elefther...@vrull.eu>

This pass detects cases of expensive store forwarding and tries to avoid them
by reordering the stores and using suitable bit insertion sequences.
For example it can transform this:

      strb    w2, [x1, 1]
      ldr     x0, [x1]      # Expensive store forwarding to larger load.

To:

      ldr     x0, [x1]
      strb    w2, [x1]
      bfi     x0, x2, 0, 8

Assembly like this can appear with bitfields or type punning / unions.
On stress-ng when running the cpu-union microbenchmark the following speedups
have been observed.

   Neoverse-N1:      +29.4%
   Intel Coffeelake: +13.1%
   AMD 5950X:        +17.5%

The transformation is rejected on cases that would cause store_bit_field
to generate subreg expressions on different register classes.
Files avoid-store-forwarding-4.c and avoid-store-forwarding-5.c contain
such cases and have been marked as XFAIL.

There is a special handling for machines with BITS_BIG_ENDIAN !=
BYTES_BIG_ENDIAN. The need for this came up from an issue in H8
architecture, which uses big-endian ordering, but BITS_BIG_ENDIAN
is false. In that case, the START parameter of store_bit_field
needs to be calculated from the end of the destination register.

gcc/ChangeLog:

        * Makefile.in (OBJS): Add avoid-store-forwarding.o.
        * common.opt (favoid-store-forwarding): New option.
        * common.opt.urls: Regenerate.
        * doc/invoke.texi: New param store-forwarding-max-distance.
        * doc/passes.texi: Document new pass.
        * doc/tm.texi: Regenerate.
        * doc/tm.texi.in: Document new pass.
        * params.opt (store-forwarding-max-distance): New param.
        * passes.def: Add pass_rtl_avoid_store_forwarding before
        pass_early_remat.
        * target.def (avoid_store_forwarding_p): New DEFHOOK.
        * target.h (struct store_fwd_info): Declare.
        * targhooks.cc (default_avoid_store_forwarding_p): New function.
        * targhooks.h (default_avoid_store_forwarding_p): Declare.
        * tree-pass.h (make_pass_rtl_avoid_store_forwarding): Declare.
        * avoid-store-forwarding.cc: New file.
        * avoid-store-forwarding.h: New file.

gcc/testsuite/ChangeLog:

        * gcc.target/aarch64/avoid-store-forwarding-1.c: New test.
        * gcc.target/aarch64/avoid-store-forwarding-2.c: New test.
        * gcc.target/aarch64/avoid-store-forwarding-3.c: New test.
        * gcc.target/aarch64/avoid-store-forwarding-4.c: New test.
        * gcc.target/aarch64/avoid-store-forwarding-5.c: New test.
        * gcc.target/x86_64/abi/callabi/avoid-store-forwarding-1.c: New test.
         * gcc.target/x86_64/abi/callabi/avoid-store-forwarding-2.c: New test.

Signed-off-by: Philipp Tomsich <philipp.toms...@vrull.eu>
Signed-off-by: Konstantinos Eleftheriou <konstantinos.elefther...@vrull.eu>

Series-version: 8

Series-changes: 8
        - Fix store_bit_field call for big-endian targets, where
          BITS_BIG_ENDIAN is false.
        - Handle store_forwarding_max_distance = 0 as a special case that
          disables cost checks for avoid-store-forwarding.
        - Update testcases for AArch64 and add testcases for x86-64.

Series-changes: 7
        - Fix bug when copying back the load register, in the case that the
          load is eliminated.

Series-changes: 6
        - Reject the transformation on cases that would cause store_bit_field
          to generate subreg expressions on different register classes.
          Files avoid-store-forwarding-4.c and avoid-store-forwarding-5.c
           contain such cases and have been marked as XFAIL.
        - Use optimize_bb_for_speed_p instead of optimize_insn_for_speed_p.
        - Inline and remove get_load_mem.
        - New implementation for is_store_forwarding.
        - Refactor the main loop in avoid_store_forwarding.
        - Avoid using the word 'forwardings'.
        - Use lowpart_subreg instead of validate_subreg + gen_rtx_subreg.
        - Don't use df_insn_rescan where not needed.
        - Change order of emitting stores and bit insert instructions.
        - Check and reject loads for which the dest register overlaps with src.
        - Remove unused variable.
        - Change some gen_mov_insn function calls to gen_rtx_SET.
        - Subtract the cost of eliminated load, instead of 1, for the total 
cost.
        - Use delete_insn instead of set_insn_deleted.
        - Regenerate common.opt.urls.
        - Add some more comments.

Series-changes: 5
        - Fix bug with BIG_ENDIAN targets.
        - Fix bug with unrecognized instructions.
        - Fix / simplify pass init/fini.

Series-changes: 4
        - Change pass scheduling to run after sched1.
        - Add target hook to decide whether a store forwarding instance
          should be avoided or not.
        - Fix bugs.

Series-changes: 3
        - Only emit SUBREG after calling validate_subreg.
        - Fix memory corruption due to vec self-reference.
        - Fix bitmap_bit_in_range_p ICE due to BLKMode.
        - Reject MEM to MEM sets.
        - Add get_load_mem comment.
        - Add new testcase.

Series-changes: 2
        - Allow modes that are not scalar_int_mode.
        - Introduce simple costing to avoid unprofitable transformations.
        - Reject bit insert sequences that spill to memory.
        - Document new pass.
        - Fix and add testcases.
---

+namespace {
+
+const pass_data pass_data_avoid_store_forwarding =
+{
+  RTL_PASS, /* type.  */
+  "avoid_store_forwarding", /* name.  */
+  OPTGROUP_NONE, /* optinfo_flags.  */
+  TV_NONE, /* tv_id.  */
+  0, /* properties_required.  */
+  0, /* properties_provided.  */
+  0, /* properties_destroyed.  */
+  0, /* todo_flags_start.  */
+  TODO_df_finish /* todo_flags_finish.  */
+};
Probably want a TV entry for store forwarding. While it shouldn't be a big time sink, we've seen other passes that should be efficient go nuts in some cases (extension elimination being the most recent example).




+
+/* Try to modify BB so that expensive store forwarding cases are avoided.  */
+
+void store_forwarding_analyzer::avoid_store_forwarding (basic_block bb)

Formatting nit. Put the return type on its own line so that the function name always starts on column 0 of its own line.



+
+/* Update pass statistics.  */
+
+void store_forwarding_analyzer::update_stats (function *fn)
Similarly.


OK with the new timevar and two formatting nits. Thanks for your patience on this.

Jeff


Reply via email to