On Thu, Aug 29, 2024 at 9:16 PM Richard Sandiford <richard.sandif...@arm.com> wrote: > > Manolis Tsamis <manolis.tsa...@vrull.eu> writes: > > On Fri, Aug 16, 2024 at 5:33 PM Richard Sandiford > > <richard.sandif...@arm.com> wrote: > >> Manolis Tsamis <manolis.tsa...@vrull.eu> writes: > >> > + } > >> > + > >> > + virtual unsigned int execute (function *) override; > >> > +}; // class pass_rtl_avoid_store_forwarding > >> > + > >> > +static unsigned int stats_sf_detected = 0; > >> > +static unsigned int stats_sf_avoided = 0; > >> > >> Could you instead structure the code as a class with these as member > >> variables? I realise it's a bit over-the-top for just 2 variables, > >> but the pass might be expanded in future. > >> > > I could do that but it would require making most of the functions > > member functions (as these are incremented in > > process_store_forwarding) > > Right. > > > and by looking at other passes/GCC code this is like never done. So I > > felt that it would be a bit unconventional and I haven't implemented > > it yet. Thoughts about that? > > Most passes were written while the codebase was still C. I think newer > passes have generally been written in the way I described. Where possible, > it'd be better for new code to avoid using globals to communicate > information between functions, to reduce the work involved in any future > parallelisation effort. (There have been tentative steps towards reducing > existing uses of globals in the past.) > > Using classes also allows pass-level RAII. I realise that doesn't matter > for the current code, but like I say, we should consider future extensions > too. > Ok, sounds good, will do then.
> >> > + > >> > + FOR_EACH_VEC_ELT (stores, i, it) > >> > + { > >> > + fprintf (dump_file, "From: "); > >> > + print_rtl_single (dump_file, it->store_insn); > >> > + } > >> > + > >> > + fprintf (dump_file, "To: "); > >> > + print_rtl_single (dump_file, load_insn); > >> > + > >> > + if (load_elim) > >> > + fprintf (dump_file, "(Load elimination candidate)\n"); > >> > + } > >> > + > >> > + rtx dest; > >> > + if (load_elim) > >> > + dest = gen_reg_rtx (load_inner_mode); > >> > + else > >> > + dest = SET_DEST (load); > >> > + > >> > + int move_to_front = -1; > >> > + int total_cost = 0; > >> > + > >> > + /* Check if we can emit bit insert instructions for all forwarded > >> > stores. */ > >> > + FOR_EACH_VEC_ELT (stores, i, it) > >> > + { > >> > + it->mov_reg = gen_reg_rtx (GET_MODE (it->store_mem)); > >> > + rtx_insn *insns = NULL; > >> > + > >> > + /* If we're eliminating the load then find the store with zero > >> > offset > >> > + and use it as the base register to avoid a bit insert if > >> > possible. */ > >> > + if (load_elim && it->offset == 0 > >> > + && validate_subreg (GET_MODE (dest), GET_MODE (it->mov_reg), > >> > + it->mov_reg, 0)) > >> > + { > >> > + start_sequence (); > >> > + > >> > + /* We can use a paradoxical subreg to force this to a wider > >> > mode, as > >> > + the only use will be inserting the bits (i.e., we don't care > >> > about > >> > + the value of the higher bits). */ > >> > + rtx ext0 = gen_rtx_SUBREG (GET_MODE (dest), it->mov_reg, 0); > >> > >> IMO it'd be safer to call lowpart_subreg and check whether the result > >> is null. This would also remove the need to check validate_subreg > >> directly. > >> > > I did change this to lowpart_subreg, but note that validate_subreg > > looks to still be necessary (for AArch64). Otherwise I get various > > ICEs, e.g. when a SF -> TI case is hit, which was the original reason > > for validate_subreg. > > Do you have a testcase? > > Direct uses of gen_rtx_SUBREG are generally suspicious. We should be > using higher-level generators in most cases. > After going over the code I found it was my mistake in how I was calling lowpart_subreg. Fixed and now it's working fine. > > I could have some help with that, because after the new changes a > > subreg related ICE also happens within store_bit_field when a DI -> > > V4SI case is hit. Afaik store_bit_field should just return NULL if it > > can't handle something so I don't really know how to address this ICE > > in the new version. > > I don't think store_bit_field is expected to fail. But yeah, if you have > a testcase, I can take a look. > Yes, that was my initial reaction too. I don't yet have a reduced testcase, this now only happens with SPEC2017 gcc_r when compiled with `-g -O3 -march=armv8.2-a -flto=32 -favoid-store-forwarding` (doesn't reproduce without flto). I have also attached the work-in-progress patch that implements your changes up to that point and which can be used to reproduce this, at least till we have a smaller testcase. The case that leads to that ICE is this: Store forwarding detected: From: (insn 1700 1714 1701 132 (set (mem/j/c:SI (plus:DI (reg/f:DI 64 sfp) (const_int -160 [0xffffffffffffff60])) [0 +0 S4 A128]) (reg:SI 617)) "real.c":158:9 69 {*movsi_aarch64} (expr_list:REG_DEAD (reg:SI 617) (nil))) From: (insn 1695 1698 1715 132 (set (mem/c:TI (plus:DI (reg/f:DI 64 sfp) (const_int -160 [0xffffffffffffff60])) [0 MEM [(void *)&r]+0 S16 A128]) (const_int 0 [0])) "real.c":157:3 75 {*movti_aarch64} (nil)) To: (insn 1709 1703 1696 132 (set (reg:V4SI 622 [ r ]) (mem/c:V4SI (plus:DI (reg/f:DI 64 sfp) (const_int -160 [0xffffffffffffff60])) [51 r+0 S16 A128])) "builtins.c":9619:13 1270 {*aarch64_simd_movv4si} (nil)) Which is somewhat suspicious; we'll have to first make sure it's not a avoid-store-forwarding bug. In any case the ICE is builtins.c:6684:1: internal compiler error: in simplify_subreg, at simplify-rtx.cc:7680 6684 | } | ^ 0xc41ecb simplify_context::simplify_subreg(machine_mode, rtx_def*, machine_mode, poly_int<2u, unsigned long>) /home/mtsamis/src/gcc/gcc4/gcc/simplify-rtx.cc:7680 0xc42d0f simplify_context::simplify_gen_subreg(machine_mode, rtx_def*, machine_mode, poly_int<2u, unsigned long>) /home/mtsamis/src/gcc/gcc4/gcc/simplify-rtx.cc:8007 0x8afb4b simplify_gen_subreg(machine_mode, rtx_def*, machine_mode, poly_int<2u, unsigned long>) /home/mtsamis/src/gcc/gcc4/gcc/rtl.h:3562 0x8afb4b force_subreg(machine_mode, rtx_def*, machine_mode, poly_int<2u, unsigned long>) /home/mtsamis/src/gcc/gcc4/gcc/explow.cc:755 0x8b9ef3 store_bit_field_1 /home/mtsamis/src/gcc/gcc4/gcc/expmed.cc:810 0x8ba327 store_bit_field(rtx_def*, poly_int<2u, unsigned long>, poly_int<2u, unsigned long>, poly_int<2u, unsigned long>, poly_int<2u, unsigned long>, machine_mode, rtx_def*, bool, bool) /home/mtsamis/src/gcc/gcc4/gcc/expmed.cc:1214 0xc53c7f generate_bit_insert_sequence /home/mtsamis/src/gcc/gcc4/gcc/avoid-store-forwarding.cc:122 0xc53c7f process_store_forwarding /home/mtsamis/src/gcc/gcc4/gcc/avoid-store-forwarding.cc:231 0xc53c7f avoid_store_forwarding /home/mtsamis/src/gcc/gcc4/gcc/avoid-store-forwarding.cc:504 0xc546eb execute /home/mtsamis/src/gcc/gcc4/gcc/avoid-store-forwarding.cc:571 > >> > + rtx_insn *move0 = emit_move_insn (dest, ext0); > >> > + if (recog_memoized (move0) >= 0) > >> > + { > >> > + insns = get_insns (); > >> > + move_to_front = (int) i; > >> > + } > >> > >> Here too I'd expect emit_move_insn to produce only valid insns. > >> (And it can produce multiple insns.) > >> > > The original implementation emitted simple mov instructions without > > recog but then gradually each of these recog calls was added for ICE > > in various more obscure targets that Jeff helped find (even due to > > "simple" moves). > > By "simple mov instructions", do you mean via emit_move_insn, or by > emitting a SET directly? > The implementation used emit_move_insn from the beginning. I only added gen_rtx_SET per your suggestion. > > So, although I don't like it and it complicates things, I had to add > > recog_memoized pretty much everywhere and also make sure to only emit > > if all of these are successful... > > Do you have testcases? emit_move_insn's job is to make the move > legitimate, so it seems like the checks might be papering over an issue > elsewhere. > I had a quick look and couldn't find an existing testcase. I'll have to recheck more thoroughly in order to create a testcase (if possible at all). I remember that in one case the rtx provided was zero_extend and there was no instruction for that so recog would fail. > >> > + } > >> > + else > >> > + { > >> > + rtx reg = SET_DEST (set); > >> > + > >> > + while (GET_CODE (reg) == ZERO_EXTRACT > >> > + || GET_CODE (reg) == STRICT_LOW_PART > >> > + || GET_CODE (reg) == SUBREG) > >> > + reg = XEXP (reg, 0); > >> > + > >> > + /* Drop store forwarding candidates when the address register is > >> > + overwritten. */ > >> > + if (REG_P (reg)) > >> > + { > >> > + bool remove_rest = false; > >> > + unsigned int i; > >> > + store_fwd_info *it; > >> > + FOR_EACH_VEC_ELT_REVERSE (store_exprs, i, it) > >> > + { > >> > + if (remove_rest > >> > + || reg_overlap_mentioned_p (reg, it->store_mem)) > >> > + { > >> > + it->remove = true; > >> > + removed_count++; > >> > + remove_rest = true; > >> > + } > >> > + } > >> > + } > >> > + else > >> > + { > >> > + /* We can't understand INSN. */ > >> > + store_exprs.truncate (0); > >> > + continue; > >> > + } > >> > + } > >> > + > >> > + if (removed_count) > >> > + { > >> > + unsigned int i, j; > >> > + store_fwd_info *it; > >> > + VEC_ORDERED_REMOVE_IF (store_exprs, i, j, it, it->remove); > >> > + } > >> > >> It might be safer (and more general) to structure the code as follows: > >> > >> if (!NONDEBUG_INSN_P (insn)) > >> continue; > >> > >> vec_rtx_properties properties; > >> properties.add_insn (insn, false); > >> > >> rtx set = single_set (insn); > >> bool is_simple = (set > >> && !properties.has_asm > >> && !properties.has_side_effects ()); > >> bool is_simple_store = (is_simple > >> && MEM_P (SET_DEST (set)) > >> && !contains_mem_rtx_p (SET_SRC (set))); > >> rtx load_mem = nullptr; > >> bool is_simple_load = (is_simple > >> && !contains_mem_rtx_p (SET_DEST (set)) > >> && (load_mem = get_load_mem (set))); > >> if (is_simple_store) > >> ...record store... > >> > >> bool reads_mem = false; > >> bool writes_mem = false; > >> for (auto ref : properties.refs ()) > >> if (ref.is_mem ()) > >> { > >> reads_mem |= ref.is_read (); > >> writes_mem |= ref.is_write (); > >> } > >> else if (ref.is_write ()) > >> { > >> // Check for the last store to mention ref.regno (), use > >> // block_remove to remove it and all preceding stores. > >> } > >> > >> if (is_simple_load) > >> { > >> // Try store forwarding. Remove the last store that > >> // load_mem might depend on, and all previous stores. > >> } > >> > >> if ((writes_mem && !is_simple_store) > >> || (reads_mem && !is_simple_load)) > >> store_exprs.truncate (0); > >> > >> It doesn't need to be exactly like that, but the idea is to test > >> directly for all references to memory and registers, without giving > >> up unnecessarily on (say) const calls or arithmetic double-sets. > >> > > Hmm, I have restructured the code based on your other feedback and it > > indeed looks more like this (except for not using vec_rtx_properties). > > I'll evaluate if it makes it better to further make it more like that > > (or use vec_rtx_properties); I'm a bit nervous to make a very large > > change there, but if it improves things I'll do it. > > Thanks. I'd appreciate if you could use vec_rtx_properties, since it > is designed for cases like this. IMO: > Alright, will do. > if (CALL_P (insn) || !set || volatile_refs_p (set) > || GET_MODE (SET_DEST (set)) == BLKmode > || (MEM_P (SET_DEST (set)) && MEM_P (SET_SRC (set)))) > { > store_exprs.truncate (0); > continue; > } > > and the later continue are too conservative. In particular, > we shouldn't treat all non-single_sets as optimisation barriers, > since many of them are fine. > Fair enough. Thanks, Manolis > Richard
From 957ec5e9049a94b6fd81bff2c9f9a11aacfbe067 Mon Sep 17 00:00:00 2001 From: Manolis Tsamis <manolis.tsa...@vrull.eu> Date: Fri, 26 Apr 2024 11:53:23 +0200 Subject: [PATCH] Target-independent store forwarding avoidance. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This pass detects cases of expensive store forwarding and tries to avoid them by reordering the stores and using suitable bit insertion sequences. For example it can transform this: strb w2, [x1, 1] ldr x0, [x1] # Expensive store forwarding to larger load. To: ldr x0, [x1] strb w2, [x1] bfi x0, x2, 0, 8 Assembly like this can appear with bitfields or type punning / unions. On stress-ng when running the cpu-union microbenchmark the following speedups have been observed. Neoverse-N1: +29.4% Intel Coffeelake: +13.1% AMD 5950X: +17.5% gcc/ChangeLog: * Makefile.in: Add avoid-store-forwarding.o * common.opt: New option -favoid-store-forwarding. * doc/invoke.texi: New param store-forwarding-max-distance. * doc/passes.texi: Document new pass. * doc/tm.texi: Regenerate. * doc/tm.texi.in: Document new pass. * params.opt: New param store-forwarding-max-distance. * passes.def: Schedule a new pass. * target.def (HOOK_PREFIX): New target hook avoid_store_forwarding_p. * target.h (struct store_fwd_info): Declare. * tree-pass.h (make_pass_rtl_avoid_store_forwarding): Declare. * avoid-store-forwarding.cc: New file. * avoid-store-forwarding.h: New file. gcc/testsuite/ChangeLog: * gcc.target/aarch64/avoid-store-forwarding-1.c: New test. * gcc.target/aarch64/avoid-store-forwarding-2.c: New test. * gcc.target/aarch64/avoid-store-forwarding-3.c: New test. * gcc.target/aarch64/avoid-store-forwarding-4.c: New test. * gcc.target/aarch64/avoid-store-forwarding-5.c: New test. Series-version: 6 Series-changes: 6 - Use optimize_bb_for_speed_p instead of optimize_insn_for_speed_p. - Inline and remove get_load_mem. - New implementation for is_store_forwarding. - Refactor the main loop in avoid_store_forwarding. - Avoid using the word 'forwardings'. - Use lowpart_subreg instead of validate_subreg + gen_rtx_subreg. - Don't use df_insn_rescan where not needed. - Change order of emitting stores and bit insert instructions. - Check and reject loads for which the dest register overlaps with src. - Add some more comments. Series-changes: 5 - Fix bug with BIG_ENDIAN targets. - Fix bug with unrecognized instructions. - Fix / simplify pass init/fini. Series-changes: 4 - Change pass scheduling to run after sched1. - Add target hook to decide whether a store forwarding instance should be avoided or not. - Fix bugs. Series-changes: 3 - Only emit SUBREG after calling validate_subreg. - Fix memory corruption due to vec self-reference. - Fix bitmap_bit_in_range_p ICE due to BLKMode. - Reject MEM to MEM sets. - Add get_load_mem comment. - Add new testcase. Series-changes: 2 - Allow modes that are not scalar_int_mode. - Introduce simple costing to avoid unprofitable transformations. - Reject bit insert sequences that spill to memory. - Document new pass. - Fix and add testcases. Series-to: gcc-patches@gcc.gnu.org Series-cc: Philipp Tomsich <philipp.toms...@vrull.eu> Series-cc: Jiangning Liu <jiangning....@amperecomputing.com> Series-cc: Jakub Jelinek <ja...@redhat.com> Series-cc: Andrew Pinski <quic_apin...@quicinc.com> Series-cc: Richard Biener <rguent...@suse.de> Series-cc: Christoph Müllner <christoph.muell...@vrull.eu> --- gcc/Makefile.in | 1 + gcc/avoid-store-forwarding.cc | 590 ++++++++++++++++++ gcc/avoid-store-forwarding.h | 56 ++ gcc/common.opt | 4 + gcc/doc/invoke.texi | 9 + gcc/doc/passes.texi | 8 + gcc/doc/tm.texi | 8 + gcc/doc/tm.texi.in | 2 + gcc/params.opt | 4 + gcc/passes.def | 1 + gcc/target.def | 10 + gcc/target.h | 3 + gcc/targhooks.cc | 25 + gcc/targhooks.h | 3 + .../aarch64/avoid-store-forwarding-1.c | 28 + .../aarch64/avoid-store-forwarding-2.c | 39 ++ .../aarch64/avoid-store-forwarding-3.c | 31 + .../aarch64/avoid-store-forwarding-4.c | 23 + .../aarch64/avoid-store-forwarding-5.c | 38 ++ gcc/tree-pass.h | 1 + 20 files changed, 884 insertions(+) create mode 100644 gcc/avoid-store-forwarding.cc create mode 100644 gcc/avoid-store-forwarding.h create mode 100644 gcc/testsuite/gcc.target/aarch64/avoid-store-forwarding-1.c create mode 100644 gcc/testsuite/gcc.target/aarch64/avoid-store-forwarding-2.c create mode 100644 gcc/testsuite/gcc.target/aarch64/avoid-store-forwarding-3.c create mode 100644 gcc/testsuite/gcc.target/aarch64/avoid-store-forwarding-4.c create mode 100644 gcc/testsuite/gcc.target/aarch64/avoid-store-forwarding-5.c diff --git a/gcc/Makefile.in b/gcc/Makefile.in index 68fda1a7591..4e1fa9e94d4 100644 --- a/gcc/Makefile.in +++ b/gcc/Makefile.in @@ -1682,6 +1682,7 @@ OBJS = \ statistics.o \ stmt.o \ stor-layout.o \ + avoid-store-forwarding.o \ store-motion.o \ streamer-hooks.o \ stringpool.o \ diff --git a/gcc/avoid-store-forwarding.cc b/gcc/avoid-store-forwarding.cc new file mode 100644 index 00000000000..2e508647ec5 --- /dev/null +++ b/gcc/avoid-store-forwarding.cc @@ -0,0 +1,590 @@ +/* Avoid store forwarding optimization pass. + Copyright (C) 2024 Free Software Foundation, Inc. + Contributed by VRULL GmbH. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + You should have received a copy of the GNU General Public License + along with GCC; see the file COPYING3. If not see + <http://www.gnu.org/licenses/>. */ + +#include "avoid-store-forwarding.h" +#include "config.h" +#include "system.h" +#include "coretypes.h" +#include "backend.h" +#include "target.h" +#include "rtl.h" +#include "alias.h" +#include "rtlanal.h" +#include "cfgrtl.h" +#include "tree-pass.h" +#include "cselib.h" +#include "predict.h" +#include "insn-config.h" +#include "expmed.h" +#include "recog.h" +#include "regset.h" +#include "df.h" +#include "expr.h" +#include "memmodel.h" +#include "emit-rtl.h" +#include "vec.h" + +/* This pass tries to detect and avoid cases of store forwarding. + On many processors there is a large penalty when smaller stores are + forwarded to larger loads. The idea used to avoid the stall is to move + the store after the load and in addition emit a bit insert sequence so + the load register has the correct value. For example the following: + + strb w2, [x1, 1] + ldr x0, [x1] + + Will be transformed to: + + ldr x0, [x1] + strb w2, [x1] + bfi x0, x2, 0, 8 +*/ + +namespace { + +const pass_data pass_data_avoid_store_forwarding = +{ + RTL_PASS, /* type. */ + "avoid_store_forwarding", /* name. */ + OPTGROUP_NONE, /* optinfo_flags. */ + TV_NONE, /* tv_id. */ + 0, /* properties_required. */ + 0, /* properties_provided. */ + 0, /* properties_destroyed. */ + 0, /* todo_flags_start. */ + TODO_df_finish /* todo_flags_finish. */ +}; + +class pass_rtl_avoid_store_forwarding : public rtl_opt_pass +{ +public: + pass_rtl_avoid_store_forwarding (gcc::context *ctxt) + : rtl_opt_pass (pass_data_avoid_store_forwarding, ctxt) + {} + + /* opt_pass methods: */ + virtual bool gate (function *) + { + return flag_avoid_store_forwarding && optimize >= 1; + } + + virtual unsigned int execute (function *) override; +}; // class pass_rtl_avoid_store_forwarding + +static unsigned int stats_sf_detected = 0; +static unsigned int stats_sf_avoided = 0; + +/* Return true iff a store to STORE_MEM would write to a sub-region of bytes + from what LOAD_MEM would read. If true also store the relative byte offset + of the store within the load to OFF_VAL. */ + +static bool +is_store_forwarding (rtx store_mem, rtx load_mem, HOST_WIDE_INT *off_val) +{ + poly_int64 load_offset, store_offset; + rtx load_base = strip_offset (XEXP (load_mem, 0), &load_offset); + rtx store_base = strip_offset (XEXP (store_mem, 0), &store_offset); + return (MEM_SIZE (load_mem).is_constant () + && rtx_equal_p (load_base, store_base) + && known_subrange_p (store_offset, MEM_SIZE (store_mem), + load_offset, MEM_SIZE (load_mem)) + && (store_offset - load_offset).is_constant (off_val)); +} + +/* Return a bit insertion sequence that would make DEST have the correct value + if the store represented by STORE_INFO were to be moved after DEST. */ + +static rtx_insn * +generate_bit_insert_sequence (store_fwd_info *store_info, rtx dest) +{ + unsigned HOST_WIDE_INT store_size + = MEM_SIZE (store_info->store_mem).to_constant (); + + start_sequence (); + + store_bit_field (dest, store_size * BITS_PER_UNIT, + store_info->offset * BITS_PER_UNIT, 0, 0, + GET_MODE (dest), store_info->mov_reg, + false, false); + + rtx_insn *insns = get_insns (); + unshare_all_rtl_in_chain (insns); + end_sequence (); + + for (rtx_insn *insn = insns; insn; insn = NEXT_INSN (insn)) + if (contains_mem_rtx_p (PATTERN (insn)) + || recog_memoized (insn) < 0) + return NULL; + + return insns; +} + +/* Given a list of small stores that are forwarded to LOAD_INSN, try to + rearrange them so that a store-forwarding penalty doesn't occur. + The stores must be given in reverse program order, starting from the + one closer to LOAD_INSN. */ + +static bool +process_store_forwarding (vec<store_fwd_info> &stores, rtx_insn *load_insn, + rtx load_mem) +{ + machine_mode load_mem_mode = GET_MODE (load_mem); + HOST_WIDE_INT load_size = MEM_SIZE (load_mem).to_constant (); + + /* If the stores cover all the bytes of the load without overlap then we can + eliminate the load entirely and use the computed value instead. */ + + sbitmap forwarded_bytes = sbitmap_alloc (load_size); + bitmap_clear (forwarded_bytes); + + unsigned int i; + store_fwd_info* it; + FOR_EACH_VEC_ELT (stores, i, it) + { + HOST_WIDE_INT store_size = MEM_SIZE (it->store_mem).to_constant (); + if (bitmap_bit_in_range_p (forwarded_bytes, it->offset, + it->offset + store_size - 1)) + break; + bitmap_set_range (forwarded_bytes, it->offset, store_size); + } + + bitmap_not (forwarded_bytes, forwarded_bytes); + bool load_elim = bitmap_empty_p (forwarded_bytes); + + stats_sf_detected++; + + if (dump_file) + { + fprintf (dump_file, "Store forwarding detected:\n"); + + FOR_EACH_VEC_ELT (stores, i, it) + { + fprintf (dump_file, "From: "); + print_rtl_single (dump_file, it->store_insn); + } + + fprintf (dump_file, "To: "); + print_rtl_single (dump_file, load_insn); + + if (load_elim) + fprintf (dump_file, "(Load elimination candidate)\n"); + } + + rtx load = single_set (load_insn); + rtx dest; + + if (load_elim) + dest = gen_reg_rtx (load_mem_mode); + else + dest = SET_DEST (load); + + int move_to_front = -1; + int total_cost = 0; + + /* Check if we can emit bit insert instructions for all forwarded stores. */ + FOR_EACH_VEC_ELT (stores, i, it) + { + it->mov_reg = gen_reg_rtx (GET_MODE (it->store_mem)); + rtx_insn *insns = NULL; + + /* If we're eliminating the load then find the store with zero offset + and use it as the base register to avoid a bit insert if possible. */ + if (load_elim && it->offset == 0) + { + start_sequence (); + + /* We can use a paradoxical subreg to force this to a wider mode, as + the only use will be inserting the bits (i.e., we don't care about + the value of the higher bits). */ + rtx ext0 = lowpart_subreg (GET_MODE (dest), it->mov_reg, + GET_MODE (it->mov_reg)); + if (ext0) + { + rtx_insn *move0 = emit_move_insn (dest, ext0); + if (recog_memoized (move0) >= 0) + { + insns = get_insns (); + move_to_front = (int) i; + } + } + + end_sequence (); + } + + if (!insns) + insns = generate_bit_insert_sequence (&(*it), dest); + + if (!insns) + { + if (dump_file) + { + fprintf (dump_file, "Failed due to: "); + print_rtl_single (dump_file, it->store_insn); + } + return false; + } + + total_cost += seq_cost (insns, true); + it->bits_insert_insns = insns; + + rtx store_set = single_set (it->store_insn); + + /* Create a register move at the store's original position to save the + stored value. */ + start_sequence (); + rtx_insn *insn1 + = emit_insn (gen_rtx_SET (it->mov_reg, SET_SRC (store_set))); + end_sequence (); + + if (recog_memoized (insn1) < 0) + { + if (dump_file) + { + fprintf (dump_file, "Failed due to unrecognizable insn: "); + print_rtl_single (dump_file, insn1); + } + return false; + } + + it->save_store_value_insn = insn1; + + /* Create a new store after the load with the saved original value. + This avoids the forwarding stall. */ + start_sequence (); + rtx_insn *insn2 + = emit_insn (gen_rtx_SET (SET_DEST (store_set), it->mov_reg)); + end_sequence (); + + if (recog_memoized (insn2) < 0) + { + if (dump_file) + { + fprintf (dump_file, "Failed due to unrecognizable insn: "); + print_rtl_single (dump_file, insn2); + } + return false; + } + + it->store_saved_value_insn = insn2; + } + + if (load_elim) + total_cost -= insn_cost (load_insn, true); + + /* Let the target decide if transforming this store forwarding instance is + profitable. */ + if (!targetm.avoid_store_forwarding_p (stores, load_mem, total_cost, + load_elim)) + { + if (dump_file) + fprintf (dump_file, "Not transformed due to target decision.\n"); + + return false; + } + + /* If we have a move instead of bit insert, it needs to be emitted first in + the resulting sequence. */ + if (move_to_front != -1) + { + store_fwd_info copy = stores[move_to_front]; + stores.safe_push (copy); + stores.ordered_remove (move_to_front); + } + + if (load_elim) + { + machine_mode outer_mode = GET_MODE (SET_DEST (load)); + rtx_code extend = ZERO_EXTEND; + if (outer_mode != load_mem_mode) + extend = GET_CODE (SET_SRC (load)); + + start_sequence (); + rtx load_value = simplify_gen_unary (extend, outer_mode, dest, + load_mem_mode); + rtx load_move = gen_move_insn (SET_DEST (load), load_value); + rtx_insn *insn = emit_insn (load_move); + rtx_insn *seq = get_insns (); + end_sequence (); + + if (recog_memoized (insn) < 0) + return false; + + emit_insn_after (seq, load_insn); + } + + if (dump_file) + { + fprintf (dump_file, "Store forwarding avoided with bit inserts:\n"); + + FOR_EACH_VEC_ELT (stores, i, it) + { + if (stores.length () > 1) + { + fprintf (dump_file, "For: "); + print_rtl_single (dump_file, it->store_insn); + } + + fprintf (dump_file, "With sequence:\n"); + + for (rtx_insn *insn = it->bits_insert_insns; insn; + insn = NEXT_INSN (insn)) + { + fprintf (dump_file, " "); + print_rtl_single (dump_file, insn); + } + } + } + + stats_sf_avoided++; + + /* Done, emit all the generated instructions and delete the stores. + Note that STORES are in reverse program order. */ + + FOR_EACH_VEC_ELT (stores, i, it) + { + emit_insn_after (it->bits_insert_insns, load_insn); + emit_insn_after (it->store_saved_value_insn, load_insn); + } + + FOR_EACH_VEC_ELT (stores, i, it) + { + emit_insn_before (it->save_store_value_insn, it->store_insn); + delete_insn (it->store_insn); + } + + df_insn_rescan (load_insn); + + if (load_elim) + delete_insn (load_insn); + + return true; +} + +/* Try to modify BB so that expensive store forwarding cases are avoided. */ + +static void +avoid_store_forwarding (basic_block bb) +{ + if (!optimize_bb_for_speed_p (bb)) + return; + + auto_vec<store_fwd_info, 8> store_exprs; + rtx_insn *insn; + unsigned int insn_cnt = 0; + + FOR_BB_INSNS (bb, insn) + { + if (!NONDEBUG_INSN_P (insn)) + continue; + + rtx set = single_set (insn); + + /* Store forwarding issues are unlikely if we cross a call. + Clear store forwarding candidates if we can't deal with INSN. */ + if (CALL_P (insn) || !set || volatile_refs_p (set) + || GET_MODE (SET_DEST (set)) == BLKmode + || (MEM_P (SET_DEST (set)) && MEM_P (SET_SRC (set)))) + { + store_exprs.truncate (0); + continue; + } + + /* The inner mem RTX if INSN is a load, NULL_RTX otherwise. */ + rtx load_mem = SET_SRC (set); + + if (GET_CODE (load_mem) == ZERO_EXTEND + || GET_CODE (load_mem) == SIGN_EXTEND) + load_mem = XEXP (load_mem, 0); + + if (!MEM_P (load_mem)) + load_mem = NULL_RTX; + + /* The mem RTX if INSN is a store, NULL_RTX otherwise. */ + rtx store_mem = MEM_P (SET_DEST (set)) ? SET_DEST (set) : NULL_RTX; + + /* We cannot analyze memory RTXs that have unknown size. + Bail if a load writes to a register that overlaps with the address + as we cannot handle that later. */ + if ((store_mem && (!MEM_SIZE_KNOWN_P (store_mem) + || !MEM_SIZE (store_mem).is_constant ())) + || (load_mem && (!MEM_SIZE_KNOWN_P (load_mem) + || !MEM_SIZE (load_mem).is_constant () + || reg_overlap_mentioned_p (SET_DEST (set), + load_mem)))) + { + store_exprs.truncate (0); + continue; + } + + int removed_count = 0; + + if (store_mem) + { + /* Record store forwarding candidate. */ + store_fwd_info info; + info.store_insn = insn; + info.store_mem = store_mem; + info.insn_cnt = insn_cnt; + info.remove = false; + info.forwarded = false; + store_exprs.safe_push (info); + } + else if (load_mem) + { + /* Process load for possible store forwarding cases. */ + auto_vec<store_fwd_info> forwardings; + bool partial_forwarding = false; + bool remove_rest = false; + + unsigned int i; + store_fwd_info *it; + FOR_EACH_VEC_ELT_REVERSE (store_exprs, i, it) + { + rtx store_mem = it->store_mem; + HOST_WIDE_INT off_val; + + if (remove_rest) + { + it->remove = true; + removed_count++; + } + else if (is_store_forwarding (store_mem, load_mem, &off_val)) + { + /* Check if moving this store after the load is legal. */ + bool write_dep = false; + for (unsigned int j = store_exprs.length () - 1; j != i; j--) + if (!store_exprs[j].forwarded + && output_dependence (store_mem, + store_exprs[j].store_mem)) + { + write_dep = true; + break; + } + + if (!write_dep) + { + it->forwarded = true; + it->offset = off_val; + forwardings.safe_push (*it); + } + else + partial_forwarding = true; + + it->remove = true; + removed_count++; + } + else if (true_dependence (store_mem, GET_MODE (store_mem), + load_mem)) + { + /* We cannot keep a store forwarding candidate if it possibly + interferes with this load. */ + it->remove = true; + removed_count++; + remove_rest = true; + } + } + + if (!forwardings.is_empty () && !partial_forwarding) + process_store_forwarding (forwardings, insn, load_mem); + } + else + { + rtx reg = SET_DEST (set); + + while (GET_CODE (reg) == ZERO_EXTRACT + || GET_CODE (reg) == STRICT_LOW_PART + || GET_CODE (reg) == SUBREG) + reg = XEXP (reg, 0); + + /* Drop store forwarding candidates when the address register is + overwritten. */ + if (REG_P (reg)) + { + bool remove_rest = false; + unsigned int i; + store_fwd_info *it; + FOR_EACH_VEC_ELT_REVERSE (store_exprs, i, it) + { + if (remove_rest + || reg_overlap_mentioned_p (reg, it->store_mem)) + { + it->remove = true; + removed_count++; + remove_rest = true; + } + } + } + else + { + /* We can't understand INSN. */ + store_exprs.truncate (0); + continue; + } + } + + if (removed_count) + { + unsigned int i, j; + store_fwd_info *it; + VEC_ORDERED_REMOVE_IF (store_exprs, i, j, it, it->remove); + } + + /* Don't consider store forwarding if the RTL instruction distance is + more than PARAM_STORE_FORWARDING_MAX_DISTANCE. */ + if (!store_exprs.is_empty () + && (store_exprs[0].insn_cnt + + param_store_forwarding_max_distance <= insn_cnt)) + store_exprs.ordered_remove (0); + + insn_cnt++; + } +} + +unsigned int +pass_rtl_avoid_store_forwarding::execute (function *fn) +{ + df_set_flags (DF_DEFER_INSN_RESCAN); + + init_alias_analysis (); + + stats_sf_detected = 0; + stats_sf_avoided = 0; + + basic_block bb; + FOR_EACH_BB_FN (bb, fn) + avoid_store_forwarding (bb); + + end_alias_analysis (); + + statistics_counter_event (fn, "Cases of store forwarding detected: ", + stats_sf_detected); + statistics_counter_event (fn, "Cases of store forwarding avoided: ", + stats_sf_detected); + + return 0; +} + +} // anon namespace. + +rtl_opt_pass * +make_pass_rtl_avoid_store_forwarding (gcc::context *ctxt) +{ + return new pass_rtl_avoid_store_forwarding (ctxt); +} diff --git a/gcc/avoid-store-forwarding.h b/gcc/avoid-store-forwarding.h new file mode 100644 index 00000000000..55a0c97f008 --- /dev/null +++ b/gcc/avoid-store-forwarding.h @@ -0,0 +1,56 @@ +/* Avoid store forwarding optimization pass. + Copyright (C) 2024 Free Software Foundation, Inc. + Contributed by VRULL GmbH. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + You should have received a copy of the GNU General Public License + along with GCC; see the file COPYING3. If not see + <http://www.gnu.org/licenses/>. */ + +#ifndef GCC_AVOID_STORE_FORWARDING_H +#define GCC_AVOID_STORE_FORWARDING_H + +#include "config.h" +#include "system.h" +#include "coretypes.h" +#include "backend.h" +#include "rtl.h" + +struct store_fwd_info +{ + /* The store instruction that is a store forwarding candidate. */ + rtx_insn *store_insn; + /* SET_DEST (single_set (store_insn)). */ + rtx store_mem; + /* The temporary that will hold the stored value at the original store + position. */ + rtx mov_reg; + /* The instruction sequence that inserts the stored value's bits at the + appropriate position in the loaded value. */ + rtx_insn *bits_insert_insns; + /* An instruction that saves the store's value in a register temporarily, + (set (reg X) (SET_SRC (store_insn))). */ + rtx_insn *save_store_value_insn; + /* An instruction that stores the saved value back to memory, + (set (SET_DEST (store_insn)) (reg X)). */ + rtx_insn *store_saved_value_insn; + /* The byte offset for the store's position within the load. */ + HOST_WIDE_INT offset; + + unsigned int insn_cnt; + bool remove; + bool forwarded; +}; + +#endif /* GCC_AVOID_STORE_FORWARDING_H */ diff --git a/gcc/common.opt b/gcc/common.opt index ea39f87ae71..7d080fbc513 100644 --- a/gcc/common.opt +++ b/gcc/common.opt @@ -1751,6 +1751,10 @@ fgcse-sm Common Var(flag_gcse_sm) Init(0) Optimization Perform store motion after global common subexpression elimination. +favoid-store-forwarding +Common Var(flag_avoid_store_forwarding) Init(0) Optimization +Try to avoid store forwarding. + fgcse-las Common Var(flag_gcse_las) Init(0) Optimization Perform redundant load after store elimination in global common subexpression diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 1dede8234d3..c09f65455b2 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -12798,6 +12798,15 @@ loop unrolling. This option is enabled by default at optimization levels @option{-O1}, @option{-O2}, @option{-O3}, @option{-Os}. +@opindex favoid-store-forwarding +@item -favoid-store-forwarding +@itemx -fno-avoid-store-forwarding +Many CPUs will stall for many cycles when a load partially depends on previous +smaller stores. This pass tries to detect such cases and avoid the penalty by +changing the order of the load and store and then fixing up the loaded value. + +Disabled by default. + @opindex ffp-contract @item -ffp-contract=@var{style} @option{-ffp-contract=off} disables floating-point expression contraction. diff --git a/gcc/doc/passes.texi b/gcc/doc/passes.texi index 4ac7a2306a1..639f6b325c8 100644 --- a/gcc/doc/passes.texi +++ b/gcc/doc/passes.texi @@ -925,6 +925,14 @@ and addressing mode selection. The pass is run twice, with values being propagated into loops only on the second run. The code is located in @file{fwprop.cc}. +@item Store forwarding avoidance + +This pass attempts to reduce the overhead of store to load forwarding. +It detects when a load reads from one or more previous smaller stores and +then rearranges them so that the stores are done after the load. The loaded +value is adjusted with a series of bit insert instructions so that it stays +the same. The code is located in @file{avoid-store-forwarding.cc}. + @item Common subexpression elimination This pass removes redundant computation within basic blocks, and diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index cc33084ed32..58ce52fab3e 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -7348,6 +7348,14 @@ the @code{POLY_VALUE_MIN}, @code{POLY_VALUE_MAX} and implementation returns the lowest possible value of @var{val}. @end deftypefn +@deftypefn {Target Hook} bool TARGET_AVOID_STORE_FORWARDING_P (vec<store_fwd_info>, @var{rtx}, @var{int}, @var{bool}) +Given a list of stores and a load instruction that reads from the location +of the stores, this hook decides if it's profitable to emit additional code +to avoid a potential store forwarding stall. The additional instructions +needed, the sequence cost and additional relevant information is given in +the arguments so that the target can make an informed decision. +@end deftypefn + @node Scheduling @section Adjusting the Instruction Scheduler diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in index 8af3f414505..73ae1a1b8fe 100644 --- a/gcc/doc/tm.texi.in +++ b/gcc/doc/tm.texi.in @@ -4745,6 +4745,8 @@ Define this macro if a non-short-circuit operation produced by @hook TARGET_ESTIMATED_POLY_VALUE +@hook TARGET_AVOID_STORE_FORWARDING_P + @node Scheduling @section Adjusting the Instruction Scheduler diff --git a/gcc/params.opt b/gcc/params.opt index c17ba17b91b..84a32b28b4a 100644 --- a/gcc/params.opt +++ b/gcc/params.opt @@ -1032,6 +1032,10 @@ Allow the store merging pass to introduce unaligned stores if it is legal to do Common Joined UInteger Var(param_store_merging_max_size) Init(65536) IntegerRange(1, 65536) Param Optimization Maximum size of a single store merging region in bytes. +-param=store-forwarding-max-distance= +Common Joined UInteger Var(param_store_forwarding_max_distance) Init(10) IntegerRange(1, 1000) Param Optimization +Maximum number of instruction distance that a small store forwarded to a larger load may stall. + -param=switch-conversion-max-branch-ratio= Common Joined UInteger Var(param_switch_conversion_branch_ratio) Init(8) IntegerRange(1, 65536) Param Optimization The maximum ratio between array size and switch branches for a switch conversion to take place. diff --git a/gcc/passes.def b/gcc/passes.def index b06d6d45f63..7f7fd96e336 100644 --- a/gcc/passes.def +++ b/gcc/passes.def @@ -506,6 +506,7 @@ along with GCC; see the file COPYING3. If not see NEXT_PASS (pass_sms); NEXT_PASS (pass_live_range_shrinkage); NEXT_PASS (pass_sched); + NEXT_PASS (pass_rtl_avoid_store_forwarding); NEXT_PASS (pass_early_remat); NEXT_PASS (pass_ira); NEXT_PASS (pass_reload); diff --git a/gcc/target.def b/gcc/target.def index 1d0ea6f30ca..60fc725cb56 100644 --- a/gcc/target.def +++ b/gcc/target.def @@ -6959,6 +6959,16 @@ HOOK_VECTOR_END (shrink_wrap) #undef HOOK_PREFIX #define HOOK_PREFIX "TARGET_" +DEFHOOK +(avoid_store_forwarding_p, + "Given a list of stores and a load instruction that reads from the location\n\ +of the stores, this hook decides if it's profitable to emit additional code\n\ +to avoid a potential store forwarding stall. The additional instructions\n\ +needed, the sequence cost and additional relevant information is given in\n\ +the arguments so that the target can make an informed decision.", + bool, (vec<store_fwd_info>, rtx, int, bool), + default_avoid_store_forwarding_p) + /* Determine the type of unwind info to emit for debugging. */ DEFHOOK (debug_unwind_info, diff --git a/gcc/target.h b/gcc/target.h index 837651d273a..2f59f3c6a80 100644 --- a/gcc/target.h +++ b/gcc/target.h @@ -165,6 +165,9 @@ class function_arg_info; /* This is defined in function-abi.h. */ class predefined_function_abi; +/* This is defined in avoid-store-forwarding.h . */ +struct store_fwd_info; + /* These are defined in tree-vect-stmts.cc. */ extern tree stmt_vectype (class _stmt_vec_info *); extern bool stmt_in_inner_loop_p (class vec_info *, class _stmt_vec_info *); diff --git a/gcc/targhooks.cc b/gcc/targhooks.cc index dc040df9fcd..19927815312 100644 --- a/gcc/targhooks.cc +++ b/gcc/targhooks.cc @@ -96,6 +96,7 @@ along with GCC; see the file COPYING3. If not see #include "tree-vectorizer.h" #include "options.h" #include "case-cfn-macros.h" +#include "avoid-store-forwarding.h" bool default_legitimate_address_p (machine_mode mode ATTRIBUTE_UNUSED, @@ -2281,6 +2282,30 @@ default_class_max_nregs (reg_class_t rclass ATTRIBUTE_UNUSED, #endif } +/* The default implementation of TARGET_AVOID_STORE_FORWARDING_P. */ + +bool +default_avoid_store_forwarding_p (vec<store_fwd_info>, rtx, int total_cost, + bool) +{ + /* Use a simple cost heurstic base on param_store_forwarding_max_distance. + In general the distance should be somewhat correlated to the store + forwarding penalty; if the penalty is large then it is justified to + increase the window size. Use this to reject sequences that are clearly + unprofitable. */ + int max_cost = COSTS_N_INSNS (param_store_forwarding_max_distance / 2); + if (total_cost > max_cost) + { + if (dump_file) + fprintf (dump_file, "Not transformed due to cost: %d > %d.\n", + total_cost, max_cost); + + return false; + } + + return true; +} + /* Determine the debugging unwind mechanism for the target. */ enum unwind_info_type diff --git a/gcc/targhooks.h b/gcc/targhooks.h index 2704d6008f1..12889544158 100644 --- a/gcc/targhooks.h +++ b/gcc/targhooks.h @@ -254,6 +254,9 @@ extern reg_class_t default_preferred_rename_class (reg_class_t rclass); extern bool default_class_likely_spilled_p (reg_class_t); extern unsigned char default_class_max_nregs (reg_class_t, machine_mode); +extern bool default_avoid_store_forwarding_p (vec<store_fwd_info>, rtx, int, + bool); + extern enum unwind_info_type default_debug_unwind_info (void); extern void default_canonicalize_comparison (int *, rtx *, rtx *, bool); diff --git a/gcc/testsuite/gcc.target/aarch64/avoid-store-forwarding-1.c b/gcc/testsuite/gcc.target/aarch64/avoid-store-forwarding-1.c new file mode 100644 index 00000000000..4712f101d5e --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/avoid-store-forwarding-1.c @@ -0,0 +1,28 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -favoid-store-forwarding -fdump-rtl-avoid_store_forwarding" } */ + +typedef union { + char arr_8[8]; + long long_value; +} DataUnion; + +long ssll_1 (DataUnion *data, char x) +{ + data->arr_8[0] = x; + return data->long_value; +} + +long ssll_2 (DataUnion *data, char x) +{ + data->arr_8[1] = x; + return data->long_value; +} + +long ssll_3 (DataUnion *data, char x) +{ + data->arr_8[7] = x; + return data->long_value; +} + +/* { dg-final { scan-rtl-dump-times "Store forwarding detected" 3 "avoid_store_forwarding" } } */ +/* { dg-final { scan-rtl-dump-times "Store forwarding avoided" 3 "avoid_store_forwarding" } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/avoid-store-forwarding-2.c b/gcc/testsuite/gcc.target/aarch64/avoid-store-forwarding-2.c new file mode 100644 index 00000000000..b958612173b --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/avoid-store-forwarding-2.c @@ -0,0 +1,39 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -favoid-store-forwarding -fdump-rtl-avoid_store_forwarding" } */ + +typedef union { + char arr_8[8]; + int long_value; +} DataUnion1; + +long no_ssll_1 (DataUnion1 *data, char x) +{ + data->arr_8[4] = x; + return data->long_value; +} + +long no_ssll_2 (DataUnion1 *data, char x) +{ + data->arr_8[5] = x; + return data->long_value; +} + +typedef union { + char arr_8[8]; + short long_value[4]; +} DataUnion2; + +long no_ssll_3 (DataUnion2 *data, char x) +{ + data->arr_8[4] = x; + return data->long_value[1]; +} + +long no_ssll_4 (DataUnion2 *data, char x) +{ + data->arr_8[0] = x; + return data->long_value[1]; +} + +/* { dg-final { scan-rtl-dump-times "Store forwarding detected" 0 "avoid_store_forwarding" } } */ +/* { dg-final { scan-rtl-dump-times "Store forwarding avoided" 0 "avoid_store_forwarding" } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/avoid-store-forwarding-3.c b/gcc/testsuite/gcc.target/aarch64/avoid-store-forwarding-3.c new file mode 100644 index 00000000000..28191f5aa39 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/avoid-store-forwarding-3.c @@ -0,0 +1,31 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -favoid-store-forwarding -fdump-rtl-avoid_store_forwarding" } */ + +typedef union { + char arr_8[8]; + long long_value; +} DataUnion; + +long ssll_multi_1 (DataUnion *data, char x) +{ + data->arr_8[0] = x; + data->arr_8[2] = x; + return data->long_value; +} + +long ssll_multi_2 (DataUnion *data, char x) +{ + data->arr_8[0] = x; + data->arr_8[1] = 11; + return data->long_value; +} + +long ssll_multi_3 (DataUnion *data, char x, short y) +{ + data->arr_8[1] = x; + __builtin_memcpy(data->arr_8 + 4, &y, sizeof(short)); + return data->long_value; +} + +/* { dg-final { scan-rtl-dump-times "Store forwarding detected" 3 "avoid_store_forwarding" } } */ +/* { dg-final { scan-rtl-dump-times "Store forwarding avoided" 3 "avoid_store_forwarding" } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/avoid-store-forwarding-4.c b/gcc/testsuite/gcc.target/aarch64/avoid-store-forwarding-4.c new file mode 100644 index 00000000000..db3c6b6630b --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/avoid-store-forwarding-4.c @@ -0,0 +1,23 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -favoid-store-forwarding -fdump-rtl-avoid_store_forwarding" } */ + +typedef int v4si __attribute__ ((vector_size (16))); + +typedef union { + char arr_16[16]; + v4si vec_value; +} DataUnion; + +v4si ssll_vect_1 (DataUnion *data, char x) +{ + data->arr_16[0] = x; + return data->vec_value; +} + +v4si ssll_vect_2 (DataUnion *data, int x) +{ + __builtin_memcpy(data->arr_16 + 4, &x, sizeof(int)); + return data->vec_value; +} + +/* { dg-final { scan-rtl-dump-times "Store forwarding detected" 2 "avoid_store_forwarding" } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/avoid-store-forwarding-5.c b/gcc/testsuite/gcc.target/aarch64/avoid-store-forwarding-5.c new file mode 100644 index 00000000000..87a7de676d0 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/avoid-store-forwarding-5.c @@ -0,0 +1,38 @@ +/* { dg-do compile { target int128 } } */ +/* { dg-options "-O2 -favoid-store-forwarding -fdump-rtl-avoid_store_forwarding" } */ + +typedef float v4f __attribute__ ((vector_size (16))); + +typedef union { + float arr_2[4]; + long long_value; + __int128 longlong_value; + v4f vec_value; +} DataUnion; + +long ssll_load_elim_1 (DataUnion *data, float x) +{ + data->arr_2[0] = x; + data->arr_2[1] = 0.0f; + return data->long_value; +} + +__int128 ssll_load_elim_2 (DataUnion *data, float x) +{ + data->arr_2[0] = x; + data->arr_2[1] = 0.0f; + data->arr_2[2] = x; + data->arr_2[3] = 0.0f; + return data->longlong_value; +} + +v4f ssll_load_elim_3 (DataUnion *data, float x) +{ + data->arr_2[3] = x; + data->arr_2[2] = x; + data->arr_2[1] = x; + data->arr_2[0] = x; + return data->vec_value; +} + +/* { dg-final { scan-rtl-dump-times "Store forwarding detected" 3 "avoid_store_forwarding" } } */ diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h index 3a0cf13089e..75b1d9cbae4 100644 --- a/gcc/tree-pass.h +++ b/gcc/tree-pass.h @@ -572,6 +572,7 @@ extern rtl_opt_pass *make_pass_rtl_dse3 (gcc::context *ctxt); extern rtl_opt_pass *make_pass_rtl_cprop (gcc::context *ctxt); extern rtl_opt_pass *make_pass_rtl_pre (gcc::context *ctxt); extern rtl_opt_pass *make_pass_rtl_hoist (gcc::context *ctxt); +extern rtl_opt_pass *make_pass_rtl_avoid_store_forwarding (gcc::context *ctxt); extern rtl_opt_pass *make_pass_rtl_store_motion (gcc::context *ctxt); extern rtl_opt_pass *make_pass_cse_after_global_opts (gcc::context *ctxt); extern rtl_opt_pass *make_pass_rtl_ifcvt (gcc::context *ctxt); -- 2.46.0