This patch converts the fold-mem-offsets pass from DF to RTL-SSA.
Along with this conversion, the way the pass collects information
was completely reworked. Instead of visiting each instruction multiple
times, each instruction is now visited only once.
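The core iteration now follows the usual RTL-SSA pattern; a minimal
sketch (mirroring fold_mem_offsets_single_use in the diff below):

  for (auto insn : iterate_safely (crtl->ssa->nondebug_insns ()))
    {
      /* Skip artificial INSNs and INSNs that must not be changed.  */
      if (!insn->is_real () || !insn->can_be_optimized ())
        continue;
      /* ...identify a fold-mem-root and walk its DEF-chain...  */
    }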
The most significant changes are:
* The pass operates mainly on insn_info objects from RTL-SSA.
* A single iteration over all nondebug INSNs identifies the
fold-mem-roots. Then the DEF-chain of each fold-mem-root is walked
to collect foldable constants.
* The class fold_mem_info holds vectors for the DEF-chain of
the to-be-folded INSNs (fold_agnostic_insns, which don't need
to be adjusted, and fold_insns, which need their constant to
be set to zero).
* Introduction of a single-USE mode, which only collects DEFs that
have a single USE and are therefore safe to transform
(the fold-mem-root will be the final USE). This mode is fast
and will always run (unless disabled via -fno-fold-mem-offsets).
* Introduction of a multi-USE mode, which allows DEFs to have
multiple USEs, but all USEs must be part of any fold-mem-root's
DEF-chain. The analysis of all USEs is expensive, so this mode is
disabled for highly connected CFGs. Note that
multi-USE mode will miss some opportunities that the single-USE
mode finds (e.g. multi-USE mode fails for fold-mem-offsets-3.c).
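To illustrate the two modes, consider a small, hypothetical function
in which one add-immediate DEF has two load USEs. Single-USE mode must
give up on it, while multi-USE mode can fold the constant into both
memory offsets (the RISC-V style code in the comments is only a
sketch, shown as before -> after folding):

  long
  sum (long *p)
  {
    /* One DEF with two USEs: addi a5,a0,16 -> addi a5,a0,0 (dead)  */
    long *q = p + 2;
    /* ld a4,0(a5); ld a5,8(a5) -> ld a4,16(a5); ld a5,24(a5)  */
    return q[0] + q[1];
  }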
The following testing was done:
* Bootstrapped and regtested on aarch64-linux and x86-64-linux.
* Regtested on riscv64-linux.
* SPEC CPU 2017 tested on aarch64 and riscv64-linux.
The number of applied optimizations in SPEC CPU 2017 on RISC-V
rv64gc_zba_zbb_zbs, for the upstream pass and for the new modes of
fold-mem-offsets, is as follows:
Benchmark         Upstream  single-USE  multi-USE  single-USE then multi-USE
500.perlbench_r        169          70        186                        192
502.gcc_r              624         263        744                        761
520.omnetpp_r         1301        1100       1187                       1673
523.xalancbmk_r         23          10         22                         22
525.x264_r             705          95        985                        995
531.deepsjeng_r         36          19         21                         21
541.leela_r             19         252         87                        252
548.exchange2           90          13         63                         63
557.xz_r                11          11         23                         23
SUM                   2151        1833       3318                       4002
A compile-time analysis with `/bin/time -v ./install/usr/local/bin/gcc -O2 all.i`
(all.i from PR117922) shows:
* -fno-fold-mem-offsets: 289 s (user time) / 15572436 kBytes (max resident set size)
* -ffold-mem-offsets: 339 s (user time) / 23606516 kBytes (max resident set size)
Adding -fexpensive-optimizations to enable multi-USE mode does not have
an impact on the duration or the memory footprint.
SPEC CPU 2017 showed no significant performance impact on aarch64-linux.
gcc/ChangeLog:
PR rtl-optimization/117922
* fold-mem-offsets.cc (INCLUDE_ALGORITHM): Added definition.
(INCLUDE_FUNCTIONAL): Likewise.
(INCLUDE_ARRAY): Likewise.
(class pass_fold_mem_offsets): Moved to bottom of file.
(get_fold_mem_offset_root): Converted to RTL-SSA.
(get_single_def_in_bb): Converted to RTL-SSA.
(get_uses): New.
(has_foldable_uses_p): Converted to RTL-SSA.
(fold_offsets): Converted to RTL-SSA.
(fold_offsets_1): Converted to RTL-SSA.
(get_fold_mem_root): Removed.
(do_check_validity): New.
(do_analysis): Removed.
(insn_uses_not_in_bitmap): New.
(do_fold_info_calculation): Removed.
(drop_unsafe_candidates): New.
(do_commit_offset): Converted to RTL-SSA.
(compute_validity_closure): Removed.
(do_commit_insn): Changed to change INSN in place.
(fold_mem_offsets_single_use): New.
(fold_mem_offsets_multi_use): New.
(pass_fold_mem_offsets::execute): Moved to bottom of file.
(fold_mem_offsets): New.
Signed-off-by: Christoph Müllner <[email protected]>
---
gcc/fold-mem-offsets.cc | 1138 ++++++++++++++++++++-------------------
1 file changed, 596 insertions(+), 542 deletions(-)
diff --git a/gcc/fold-mem-offsets.cc b/gcc/fold-mem-offsets.cc
index c1c94472a071..0e777c32ee31 100644
--- a/gcc/fold-mem-offsets.cc
+++ b/gcc/fold-mem-offsets.cc
@@ -17,24 +17,34 @@ You should have received a copy of the GNU General Public License
along with GCC; see the file COPYING3. If not see
<http://www.gnu.org/licenses/>. */
+#define INCLUDE_ALGORITHM
+#define INCLUDE_FUNCTIONAL
+#define INCLUDE_ARRAY
#include "config.h"
#include "system.h"
#include "coretypes.h"
-#include "tm.h"
+#include "backend.h"
#include "rtl.h"
+#include "rtlanal.h"
+#include "df.h"
+#include "rtl-ssa.h"
+
+#include "predict.h"
+#include "cfgrtl.h"
+#include "cfgcleanup.h"
+#include "cgraph.h"
+#include "tree-pass.h"
+#include "target.h"
+
+#include "tm.h"
#include "tree.h"
#include "expr.h"
#include "backend.h"
#include "regs.h"
-#include "target.h"
#include "memmodel.h"
#include "emit-rtl.h"
#include "insn-config.h"
#include "recog.h"
-#include "predict.h"
-#include "df.h"
-#include "tree-pass.h"
-#include "cfgrtl.h"
#include "diagnostic-core.h"
/* This pass tries to optimize memory offset calculations by moving constants
@@ -69,214 +79,248 @@ along with GCC; see the file COPYING3. If not see
allocated on the stack can result in unwanted add instructions that
cannot be eliminated easily.
- This pass works on a basic block level and consists of 4 phases:
-
- - Phase 1 (Analysis): Find "foldable" instructions.
- Foldable instructions are those that we know how to propagate
- a constant addition through (add, shift, move, ...) and only have other
- foldable instructions for uses. In that phase a DFS traversal on the
- definition tree is performed and foldable instructions are marked on
- a bitmap. The add immediate instructions that are reachable in this
- DFS are candidates for folding since all the intermediate calculations
- affected by them are also foldable.
-
- - Phase 2 (Validity): Traverse and calculate the offsets that would result
- from folding the add immediate instructions. Check whether the
- calculated offsets result in a valid instruction for the target.
-
- - Phase 3 (Commit offsets): Traverse again. It is now known which folds
- are valid so at this point change the offsets in the memory instructions.
-
- - Phase 4 (Commit instruction deletions): Scan all instructions and delete
- or simplify (reduce to move) all add immediate instructions that were
- folded.
+ The pass differentiates between the following instructions:
+
+ - fold-mem-offset root insn: loads/stores where constants will be folded into
+ the address offset. E.g.:
+ (set (mem:DI (plus:DI (reg:DI sp) (const_int 40))) (reg:DI ra))
+ - fold-agnostic insns: instructions that may have an impact on the offset
+ calculation, but that don't require any fixup when folding. E.g.:
+ (set (reg:DI a0) (ashift:DI (reg:DI s1) (const_int 1)))
+ - fold insns: instructions that provide constants, which will be forwarded
+ into the loads/stores as offset. When folding, the constants will be
+ set to zero. E.g.:
+ (set (reg:DI s0) (plus:DI (reg:DI sp) (const_int 8)))
+
+ The pass utilizes the RTL SSA framework to get the data dependencies
+ and operates in the following phases:
+
+ - Phase 1: Iterate over all instructions to identify fold-mem-offset roots.
+ - Phase 2: Walk back along the def-chain of fold-agnostic or fold insns.
+ When successful a new offset of the fold-mem-offset is calculated
+ and a vec of fold insns that need adjustments is created.
+ - Phase 3: Drop all fold-mem-offset roots that won't accept the updated
+ offset.
+ - Phase 4: Ensure that the defs of all fold insns are used only by
+ fold-mem-offsets insns (only needed if DEFs with multiple USEs
+ are enabled via -fexpensive-optimizations).
+ - Phase 5: Update all fold-mem-offset roots and adjust the fold insns.
+
+ When we walk the DEF-chain we have two modes of operation:
+
+ - single-USE mode: Only DEFs that have exactly one USE (in the instruction
+ that we come from) are allowed. This greatly simplifies the problem, but
+ also misses some cases.
+ - multi-USE mode: DEFs are allowed to have multiple USEs. E.g. a single
+ ADDI may define a value that is used by two LOADs. In this case, we need
+ to ensure that all USE-chains remain correct after we apply the folding.
+ This is done by allowing only USEs that are part of any other
+ fold-mem-offset chain in phase 4 above.
This pass should run before hard register propagation because it creates
register moves that we expect to be eliminated. */
-namespace {
-
-const pass_data pass_data_fold_mem =
-{
- RTL_PASS, /* type */
- "fold_mem_offsets", /* name */
- OPTGROUP_NONE, /* optinfo_flags */
- TV_FOLD_MEM_OFFSETS, /* tv_id */
- 0, /* properties_required */
- 0, /* properties_provided */
- 0, /* properties_destroyed */
- 0, /* todo_flags_start */
- TODO_df_finish, /* todo_flags_finish */
-};
-
-class pass_fold_mem_offsets : public rtl_opt_pass
-{
-public:
- pass_fold_mem_offsets (gcc::context *ctxt)
- : rtl_opt_pass (pass_data_fold_mem, ctxt)
- {}
-
- /* opt_pass methods: */
- virtual bool gate (function *)
- {
- return flag_fold_mem_offsets && optimize >= 2;
- }
-
- virtual unsigned int execute (function *);
-}; // class pass_fold_mem_offsets
+using namespace rtl_ssa;
/* Class that holds in FOLD_INSNS the instructions that if folded the offset
of a memory instruction would increase by ADDED_OFFSET. */
class fold_mem_info {
public:
- auto_bitmap fold_insns;
+ /* fold-mem-offset root details.  */
+ insn_info *insn;
+ rtx mem;
+ rtx reg;
+ HOST_WIDE_INT offset;
+
+ /* Offset that can be folded into the fold-mem-offset root. */
HOST_WIDE_INT added_offset;
+
+ /* DEF-chain INSNs. */
+ auto_vec<insn_info *> fold_agnostic_insns;
+ auto_vec<insn_info *> fold_insns;
};
-typedef hash_map<rtx_insn *, fold_mem_info *> fold_info_map;
+/* Test if INSN is a memory load / store that can have an offset folded to it.
+ Return true iff INSN is such an instruction and return through MEM,
+ REG and OFFSET the RTX that has a MEM code, the register that is
+ used as a base address and the offset accordingly. */
+bool
+get_fold_mem_offset_root (insn_info *insn, rtx *mem, rtx *reg,
+ HOST_WIDE_INT *offset)
+{
+ rtx set = single_set (insn->rtl ());
+ if (set != NULL_RTX)
+ {
+ rtx src = SET_SRC (set);
+ rtx dest = SET_DEST (set);
-/* Tracks which instructions can be reached through instructions that can
- propagate offsets for folding. */
-static bitmap_head can_fold_insns;
+ /* Don't fold when we have unspec / volatile. */
+ if (GET_CODE (src) == UNSPEC
+ || GET_CODE (src) == UNSPEC_VOLATILE
+ || GET_CODE (dest) == UNSPEC
+ || GET_CODE (dest) == UNSPEC_VOLATILE)
+ return false;
-/* Marks instructions that are currently eligible for folding. */
-static bitmap_head candidate_fold_insns;
+ if (MEM_P (src))
+ *mem = src;
+ else if (MEM_P (dest))
+ *mem = dest;
+ else if ((GET_CODE (src) == SIGN_EXTEND
+ || GET_CODE (src) == ZERO_EXTEND)
+ && MEM_P (XEXP (src, 0)))
+ *mem = XEXP (src, 0);
+ else
+ return false;
+ }
+ else
+ return false;
-/* Tracks instructions that cannot be folded because it turned out that
- folding will result in creating an invalid memory instruction.
- An instruction can be in both CANDIDATE_FOLD_INSNS and CANNOT_FOLD_INSNS
- at the same time, in which case it is not legal to fold. */
-static bitmap_head cannot_fold_insns;
+ rtx mem_addr = XEXP (*mem, 0);
+ if (REG_P (mem_addr))
+ {
+ *reg = mem_addr;
+ *offset = 0;
+ }
+ else if (GET_CODE (mem_addr) == PLUS
+ && REG_P (XEXP (mem_addr, 0))
+ && CONST_INT_P (XEXP (mem_addr, 1)))
+ {
+ *reg = XEXP (mem_addr, 0);
+ *offset = INTVAL (XEXP (mem_addr, 1));
+ }
+ else
+ return false;
-/* The number of instructions that were simplified or eliminated. */
-static int stats_fold_count;
+ return true;
+}
-/* Get the single reaching definition of an instruction inside a BB.
- The definition is desired for REG used in INSN.
- Return the definition insn or NULL if there's no definition with
- the desired criteria. */
-static rtx_insn *
-get_single_def_in_bb (rtx_insn *insn, rtx reg)
+/* Get the single reaching definition of REG used in INSN within a BB.
+ Return the definition or NULL if there's no definition with the desired
+ criteria. If SINGLE_USE is set to true the DEF must have exactly one
+ USE resulting in a 1:1 DEF-USE relationship. If set to false, then a
+ 1:n DEF-USE relationship is accepted and the caller must take care to
+ ensure all USEs are safe for folding. */
+static set_info *
+get_single_def_in_bb (insn_info *insn, rtx reg, bool single_use)
{
- df_ref use;
- struct df_link *ref_chain, *ref_link;
-
- FOR_EACH_INSN_USE (use, insn)
+ /* Get the use_info of the base register. */
+ for (use_info *use : insn->uses ())
{
- if (GET_CODE (DF_REF_REG (use)) == SUBREG)
+ /* Other USEs can be ignored and multiple equal USEs are fine. */
+ if (use->regno () != REGNO (reg))
+ continue;
+
+ /* Don't handle subregs for now. */
+ if (use->includes_subregs ())
return NULL;
- if (REGNO (DF_REF_REG (use)) == REGNO (reg))
- break;
- }
- if (!use)
- return NULL;
+ /* Get the DEF of the register. */
+ set_info *def = use->def ();
+ if (!def)
+ return NULL;
- ref_chain = DF_REF_CHAIN (use);
+ /* Limit the number of USEs of DEF to 1. */
+ auto iter = def->nondebug_insn_uses ();
+ if (single_use && ++iter.begin () != iter.end ())
+ return NULL;
- if (!ref_chain)
- return NULL;
+ /* Don't handle multiregs for now. */
+ if (def->includes_multiregs ())
+ return NULL;
- for (ref_link = ref_chain; ref_link; ref_link = ref_link->next)
- {
- /* Problem getting some definition for this instruction. */
- if (ref_link->ref == NULL)
+ /* Only consider uses whose definition comes from a real instruction
+ and has no notes attached. */
+ insn_info *def_insn = def->insn ();
+ if (def_insn->is_artificial () || def_insn->first_note ())
return NULL;
- if (DF_REF_INSN_INFO (ref_link->ref) == NULL)
+
+ /* No parallel expressions or clobbers. */
+ if (def_insn->num_defs () != 1)
return NULL;
- if (global_regs[REGNO (reg)]
- && !set_of (reg, DF_REF_INSN (ref_link->ref)))
+
+ rtx_insn *def_rtl = def_insn->rtl ();
+ if (!NONJUMP_INSN_P (def_rtl) || RTX_FRAME_RELATED_P (def_rtl))
return NULL;
- }
- if (ref_chain->next)
- return NULL;
+ /* Check if the DEF is a SET of the expected form. */
+ rtx def_set = simple_regno_set (PATTERN (def_rtl), def->regno ());
+ if (!def_set)
+ return NULL;
- rtx_insn *def = DF_REF_INSN (ref_chain->ref);
+ /* Ensure DEF and USE are in the same BB. */
+ if (def->bb () != insn->bb ())
+ return NULL;
- if (BLOCK_FOR_INSN (def) != BLOCK_FOR_INSN (insn))
- return NULL;
+ /* Ensure the definition comes before the insn. */
+ if (*use->insn () < *def_insn)
+ return NULL;
- if (DF_INSN_LUID (def) > DF_INSN_LUID (insn))
- return NULL;
+ return def;
+ }
- return def;
+ return NULL;
}
-/* Get all uses of REG which is set in INSN. Return the use list or NULL if a
- use is missing / irregular. If SUCCESS is not NULL then set it to false if
- there are missing / irregular uses and true otherwise. */
-static df_link *
-get_uses (rtx_insn *insn, rtx reg, bool *success)
+/* Test if all USEs of DEF (which defines REG) meet certain criteria to be
+ foldable. Returns true iff all USEs are fine or false otherwise. */
+static bool
+has_foldable_uses_p (set_info *def, rtx reg)
{
- df_ref def;
-
- if (success)
- *success = false;
-
- FOR_EACH_INSN_DEF (def, insn)
- if (REGNO (DF_REF_REG (def)) == REGNO (reg))
- break;
-
- if (!def)
- return NULL;
+ /* We only fold through instructions that are transitively used as
+ memory addresses and do not have other uses. Use the same logic
+ from the offset calculation to visit instructions that can
+ propagate offsets. */
+ for (use_info *use : def->nondebug_insn_uses ())
+ {
+ insn_info *use_insn = use->insn ();
+ if (use_insn->is_artificial ())
+ return false;
- df_link *ref_chain = DF_REF_CHAIN (def);
- int insn_luid = DF_INSN_LUID (insn);
- basic_block insn_bb = BLOCK_FOR_INSN (insn);
+ /* Punt if the use is anything more complicated than a set
+ (clobber, use, etc). */
+ rtx_insn *use_rtl = use_insn->rtl ();
+ if (!NONJUMP_INSN_P (use_rtl) || GET_CODE (PATTERN (use_rtl)) != SET)
+ return false;
- for (df_link *ref_link = ref_chain; ref_link; ref_link = ref_link->next)
- {
- /* Problem getting a use for this instruction. */
- if (ref_link->ref == NULL)
- return NULL;
- if (DF_REF_CLASS (ref_link->ref) != DF_REF_REGULAR)
- return NULL;
+ /* Special case: A foldable memory store is not foldable if it
+ mentions REG outside of the address calculation. */
+ rtx use_set = PATTERN (use_rtl);
+ if (use_set && MEM_P (SET_DEST (use_set))
+ && reg_mentioned_p (reg, SET_SRC (use_set)))
+ return false;
- rtx_insn *use = DF_REF_INSN (ref_link->ref);
- if (DEBUG_INSN_P (use))
- continue;
+ if (use->bb () != def->bb ())
+ return false;
- /* We do not handle REG_EQUIV/REG_EQ notes for now. */
- if (DF_REF_FLAGS (ref_link->ref) & DF_REF_IN_NOTE)
- return NULL;
- if (BLOCK_FOR_INSN (use) != insn_bb)
- return NULL;
/* Punt if use appears before def in the basic block. See PR111601. */
- if (DF_INSN_LUID (use) < insn_luid)
- return NULL;
+ if (*use_insn < *def->insn ())
+ return false;
}
- if (success)
- *success = true;
-
- return ref_chain;
+ return true;
}
static HOST_WIDE_INT
-fold_offsets (rtx_insn *insn, rtx reg, bool analyze, bitmap foldable_insns);
-
-/* Helper function for fold_offsets.
+fold_offsets (insn_info *insn, rtx reg, fold_mem_info *info, bool single_use);
- If DO_RECURSION is false and ANALYZE is true this function returns true iff
- it understands the structure of INSN and knows how to propagate constants
- through it. In this case OFFSET_OUT and FOLDABLE_INSNS are unused.
+/* Helper function for fold_offsets () that analyses the given INSN.
- If DO_RECURSION is true then it also calls fold_offsets for each recognized
- part of INSN with the appropriate arguments.
+ For INSN with known pattern, we calculate the value of the propagated
+ constant and store that in OFFSET_OUT. Foldable INSNs are added to
+ INFO->fold_insns and fold-agnostic INSNs are added to
+ INFO->fold_agnostic_insns. It is possible that some INSNs are added to
+ both lists. In this case the INSN is a fold INSN.
- If DO_RECURSION is true and ANALYZE is false then offset that would result
- from folding is computed and is returned through the pointer OFFSET_OUT.
- The instructions that can be folded are recorded in FOLDABLE_INSNS. */
+ Returns true iff the analysis was successful and false otherwise. */
static bool
-fold_offsets_1 (rtx_insn *insn, bool analyze, bool do_recursion,
- HOST_WIDE_INT *offset_out, bitmap foldable_insns)
+fold_offsets_1 (insn_info *insn, HOST_WIDE_INT *offset_out, fold_mem_info *info,
+ bool single_use)
{
- /* Doesn't make sense if both DO_RECURSION and ANALYZE are false. */
- gcc_checking_assert (do_recursion || analyze);
- gcc_checking_assert (GET_CODE (PATTERN (insn)) == SET);
+ bool fold_agnostic = true;
+ rtx_insn *insn_rtl = insn->rtl ();
+ gcc_checking_assert (GET_CODE (PATTERN (insn_rtl)) == SET);
- rtx src = SET_SRC (PATTERN (insn));
+ rtx src = SET_SRC (PATTERN (insn_rtl));
HOST_WIDE_INT offset = 0;
switch (GET_CODE (src))
@@ -288,35 +332,31 @@ fold_offsets_1 (rtx_insn *insn, bool analyze, bool do_recursion,
rtx arg2 = XEXP (src, 1);
if (REG_P (arg1))
- {
- if (do_recursion)
- offset += fold_offsets (insn, arg1, analyze, foldable_insns);
- }
+ offset += fold_offsets (insn, arg1, info, single_use);
else if (GET_CODE (arg1) == ASHIFT
&& REG_P (XEXP (arg1, 0))
&& CONST_INT_P (XEXP (arg1, 1)))
{
/* Handle R1 = (R2 << C) + ... */
- if (do_recursion)
- {
- HOST_WIDE_INT scale
- = (HOST_WIDE_INT_1U << INTVAL (XEXP (arg1, 1)));
- offset += scale * fold_offsets (insn, XEXP (arg1, 0), analyze,
- foldable_insns);
- }
+ rtx reg = XEXP (arg1, 0);
+ rtx shamt = XEXP (arg1, 1);
+ HOST_WIDE_INT scale = HOST_WIDE_INT_1U << INTVAL (shamt);
+ offset += scale * fold_offsets (insn, reg, info, single_use);
}
else if (GET_CODE (arg1) == PLUS
&& REG_P (XEXP (arg1, 0))
&& REG_P (XEXP (arg1, 1)))
{
/* Handle R1 = (R2 + R3) + ... */
- if (do_recursion)
+ rtx reg1 = XEXP (arg1, 0);
+ rtx reg2 = XEXP (arg1, 1);
+ if (REGNO (reg1) != REGNO (reg2))
{
- offset += fold_offsets (insn, XEXP (arg1, 0), analyze,
- foldable_insns);
- offset += fold_offsets (insn, XEXP (arg1, 1), analyze,
- foldable_insns);
+ offset += fold_offsets (insn, reg1, info, single_use);
+ offset += fold_offsets (insn, reg2, info, single_use);
}
+ else
+ offset += 2 * fold_offsets (insn, reg1, info, single_use);
}
else if (GET_CODE (arg1) == PLUS
&& GET_CODE (XEXP (arg1, 0)) == ASHIFT
@@ -325,32 +365,31 @@ fold_offsets_1 (rtx_insn *insn, bool analyze, bool do_recursion,
&& REG_P (XEXP (arg1, 1)))
{
/* Handle R1 = ((R2 << C) + R3) + ... */
- if (do_recursion)
+ rtx reg1 = XEXP (XEXP (arg1, 0), 0);
+ rtx shamt = XEXP (XEXP (arg1, 0), 1);
+ rtx reg2 = XEXP (arg1, 1);
+ HOST_WIDE_INT scale = HOST_WIDE_INT_1U << INTVAL (shamt);
+ if (REGNO (reg1) != REGNO (reg2))
{
- HOST_WIDE_INT scale
- = (HOST_WIDE_INT_1U << INTVAL (XEXP (XEXP (arg1, 0), 1)));
- offset += scale * fold_offsets (insn, XEXP (XEXP (arg1, 0), 0),
- analyze, foldable_insns);
- offset += fold_offsets (insn, XEXP (arg1, 1), analyze,
- foldable_insns);
+ offset += scale * fold_offsets (insn, reg1, info, single_use);
+ offset += fold_offsets (insn, reg2, info, single_use);
}
+ else
+ offset += (scale + 1) * fold_offsets (insn, reg1, info,
+ single_use);
}
else
return false;
if (REG_P (arg2))
- {
- if (do_recursion)
- offset += fold_offsets (insn, arg2, analyze, foldable_insns);
- }
+ offset += fold_offsets (insn, arg2, info, single_use);
else if (CONST_INT_P (arg2))
{
if (REG_P (arg1))
{
offset += INTVAL (arg2);
/* This is a R1 = R2 + C instruction, candidate for folding. */
- if (!analyze)
- bitmap_set_bit (foldable_insns, INSN_UID (insn));
+ fold_agnostic = false;
}
}
else
@@ -366,26 +405,19 @@ fold_offsets_1 (rtx_insn *insn, bool analyze, bool do_recursion,
rtx arg2 = XEXP (src, 1);
if (REG_P (arg1))
- {
- if (do_recursion)
- offset += fold_offsets (insn, arg1, analyze, foldable_insns);
- }
+ offset += fold_offsets (insn, arg1, info, single_use);
else
return false;
if (REG_P (arg2))
- {
- if (do_recursion)
- offset -= fold_offsets (insn, arg2, analyze, foldable_insns);
- }
+ offset -= fold_offsets (insn, arg2, info, single_use);
else if (CONST_INT_P (arg2))
{
if (REG_P (arg1))
{
offset -= INTVAL (arg2);
/* This is a R1 = R2 - C instruction, candidate for folding. */
- if (!analyze)
- bitmap_set_bit (foldable_insns, INSN_UID (insn));
+ fold_agnostic = false;
}
}
else
@@ -399,10 +431,7 @@ fold_offsets_1 (rtx_insn *insn, bool analyze, bool do_recursion,
/* Propagate through negation. */
rtx arg1 = XEXP (src, 0);
if (REG_P (arg1))
- {
- if (do_recursion)
- offset = -fold_offsets (insn, arg1, analyze, foldable_insns);
- }
+ offset = -fold_offsets (insn, arg1, info, single_use);
else
return false;
@@ -417,12 +446,8 @@ fold_offsets_1 (rtx_insn *insn, bool analyze, bool do_recursion,
if (REG_P (arg1) && CONST_INT_P (arg2))
{
- if (do_recursion)
- {
- HOST_WIDE_INT scale = INTVAL (arg2);
- offset = scale * fold_offsets (insn, arg1, analyze,
- foldable_insns);
- }
+ HOST_WIDE_INT scale = INTVAL (arg2);
+ offset = scale * fold_offsets (insn, arg1, info, single_use);
}
else
return false;
@@ -438,12 +463,8 @@ fold_offsets_1 (rtx_insn *insn, bool analyze, bool do_recursion,
if (REG_P (arg1) && CONST_INT_P (arg2))
{
- if (do_recursion)
- {
- HOST_WIDE_INT scale = (HOST_WIDE_INT_1U << INTVAL (arg2));
- offset = scale * fold_offsets (insn, arg1, analyze,
- foldable_insns);
- }
+ HOST_WIDE_INT scale = (HOST_WIDE_INT_1U << INTVAL (arg2));
+ offset = scale * fold_offsets (insn, arg1, info, single_use);
}
else
return false;
@@ -454,8 +475,7 @@ fold_offsets_1 (rtx_insn *insn, bool analyze, bool do_recursion,
case REG:
{
/* Propagate through register move. */
- if (do_recursion)
- offset = fold_offsets (insn, src, analyze, foldable_insns);
+ offset = fold_offsets (insn, src, info, single_use);
/* Pattern recognized for folding. */
break;
@@ -464,8 +484,7 @@ fold_offsets_1 (rtx_insn *insn, bool analyze, bool do_recursion,
{
offset = INTVAL (src);
/* R1 = C is candidate for folding. */
- if (!analyze)
- bitmap_set_bit (foldable_insns, INSN_UID (insn));
+ fold_agnostic = false;
/* Pattern recognized for folding. */
break;
@@ -475,373 +494,395 @@ fold_offsets_1 (rtx_insn *insn, bool analyze, bool do_recursion,
return false;
}
- if (do_recursion && !analyze)
+ if (offset_out)
*offset_out = offset;
+ if (fold_agnostic)
+ {
+ /* For single-USE mode we don't need to collect fold-agnostic INSNs. */
+ if (!single_use)
+ info->fold_agnostic_insns.safe_push (insn);
+ }
+ else
+ info->fold_insns.safe_push (insn);
+
return true;
}
-/* Function that computes the offset that would have to be added to all uses
- of REG if the instructions marked in FOLDABLE_INSNS were to be eliminated.
+/* Function that calculates the offset for INSN that would have to be added to
+ all its USEs of REG. Foldable INSNs are added to INFO->fold_insns and
+ fold-agnostic INSNs are added to INFO->fold_agnostic_insns.
+ It is possible that some INSNs are added to both lists. In this case the
+ INSN is a fold INSN.
- If ANALYZE is true then mark in CAN_FOLD_INSNS which instructions
- transitively only affect other instructions found in CAN_FOLD_INSNS.
- If ANALYZE is false then compute the offset required for folding. */
+ Returns the offset on success or 0 if the calculation fails. */
static HOST_WIDE_INT
-fold_offsets (rtx_insn *insn, rtx reg, bool analyze, bitmap foldable_insns)
+fold_offsets (insn_info *insn, rtx reg, fold_mem_info *info,
+ bool single_use = true)
{
- rtx_insn *def = get_single_def_in_bb (insn, reg);
-
- if (!def || RTX_FRAME_RELATED_P (def) || GET_CODE (PATTERN (def)) != SET)
+ /* We can only affect the values of GPR registers. */
+ unsigned int regno = REGNO (reg);
+ if (fixed_regs[regno]
+ || !TEST_HARD_REG_BIT (reg_class_contents[GENERAL_REGS], regno))
return 0;
- rtx dest = SET_DEST (PATTERN (def));
-
- if (!REG_P (dest))
+ /* Get the DEF for REG in INSN. */
+ set_info *def = get_single_def_in_bb (insn, reg, single_use);
+ if (!def)
return 0;
- /* We can only affect the values of GPR registers. */
- unsigned int dest_regno = REGNO (dest);
- if (fixed_regs[dest_regno]
- || !TEST_HARD_REG_BIT (reg_class_contents[GENERAL_REGS], dest_regno))
- return 0;
+ insn_info *def_insn = def->insn ();
+ rtx_insn *def_rtl = def_insn->rtl ();
- if (analyze)
+ if (dump_file && (dump_flags & TDF_DETAILS))
{
- /* Check if we know how to handle DEF. */
- if (!fold_offsets_1 (def, true, false, NULL, NULL))
- return 0;
-
- /* We only fold through instructions that are transitively used as
- memory addresses and do not have other uses. Use the same logic
- from offset calculation to visit instructions that can propagate
- offsets and keep track of them in CAN_FOLD_INSNS. */
- bool success;
- struct df_link *uses = get_uses (def, dest, &success), *ref_link;
+ fprintf (dump_file, "For INSN: ");
+ print_rtl_single (dump_file, insn->rtl ());
+ fprintf (dump_file, "...found DEF: ");
+ print_rtl_single (dump_file, def_rtl);
+ }
- if (!success)
- return 0;
+ gcc_assert (REGNO (reg) == REGNO (SET_DEST (PATTERN (def_rtl))));
- for (ref_link = uses; ref_link; ref_link = ref_link->next)
+ /* Check if all USEs of DEF are safe. */
+ if (!has_foldable_uses_p (def, reg))
+ {
+ if (dump_file && (dump_flags & TDF_DETAILS))
{
- rtx_insn *use = DF_REF_INSN (ref_link->ref);
-
- if (DEBUG_INSN_P (use))
- continue;
-
- /* Punt if the use is anything more complicated than a set
- (clobber, use, etc). */
- if (!NONJUMP_INSN_P (use) || GET_CODE (PATTERN (use)) != SET)
- return 0;
-
- /* This use affects instructions outside of CAN_FOLD_INSNS. */
- if (!bitmap_bit_p (&can_fold_insns, INSN_UID (use)))
- return 0;
-
- rtx use_set = PATTERN (use);
-
- /* Special case: A foldable memory store is not foldable if it
- mentions DEST outside of the address calculation. */
- if (use_set && MEM_P (SET_DEST (use_set))
- && reg_mentioned_p (dest, SET_SRC (use_set)))
- return 0;
+ fprintf (dump_file, "has_foldable_uses_p failed for: ");
+ print_rtl_single (dump_file, def_rtl);
}
+ return 0;
+ }
- bitmap_set_bit (&can_fold_insns, INSN_UID (def));
-
+ /* Check if we know how to handle DEF. */
+ HOST_WIDE_INT offset;
+ if (!fold_offsets_1 (def_insn, &offset, info, single_use))
+ {
if (dump_file && (dump_flags & TDF_DETAILS))
{
- fprintf (dump_file, "Instruction marked for propagation: ");
- print_rtl_single (dump_file, def);
+ fprintf (dump_file, "fold_offsets_1 failed for: ");
+ print_rtl_single (dump_file, def_rtl);
}
+ return 0;
}
- else
+
+ if (dump_file && (dump_flags & TDF_DETAILS))
{
- /* We cannot propagate through this instruction. */
- if (!bitmap_bit_p (&can_fold_insns, INSN_UID (def)))
- return 0;
+ fprintf (dump_file, "Instruction marked for propagation: ");
+ print_rtl_single (dump_file, def_rtl);
}
- HOST_WIDE_INT offset = 0;
- bool recognized = fold_offsets_1 (def, analyze, true, &offset,
- foldable_insns);
-
- if (!recognized)
- return 0;
-
return offset;
}
-/* Test if INSN is a memory load / store that can have an offset folded to it.
- Return true iff INSN is such an instruction and return through MEM_OUT,
- REG_OUT and OFFSET_OUT the RTX that has a MEM code, the register that is
- used as a base address and the offset accordingly.
- All of the out pointers may be NULL in which case they will be ignored. */
-bool
-get_fold_mem_root (rtx_insn *insn, rtx *mem_out, rtx *reg_out,
- HOST_WIDE_INT *offset_out)
+/* Test if we can fold the new offset into the given fold-mem-offset root insn.
+ Returns true iff folding would succeed or false otherwise. */
+static bool
+do_check_validity (fold_mem_info *info)
{
- rtx set = single_set (insn);
- rtx mem = NULL_RTX;
-
- if (set != NULL_RTX)
- {
- rtx src = SET_SRC (set);
- rtx dest = SET_DEST (set);
-
- /* Don't fold when we have unspec / volatile. */
- if (GET_CODE (src) == UNSPEC
- || GET_CODE (src) == UNSPEC_VOLATILE
- || GET_CODE (dest) == UNSPEC
- || GET_CODE (dest) == UNSPEC_VOLATILE)
- return false;
-
- if (MEM_P (src))
- mem = src;
- else if (MEM_P (dest))
- mem = dest;
- else if ((GET_CODE (src) == SIGN_EXTEND
- || GET_CODE (src) == ZERO_EXTEND)
- && MEM_P (XEXP (src, 0)))
- mem = XEXP (src, 0);
- }
-
- if (mem == NULL_RTX)
- return false;
+ rtx_insn *insn_rtl = info->insn->rtl ();
+ rtx mem = info->mem;
+ rtx reg = info->reg;
+ HOST_WIDE_INT new_offset = info->offset + info->added_offset;
+ /* Test if it is valid to change MEM's address offset to NEW_OFFSET. */
+ int icode = INSN_CODE (insn_rtl);
+ INSN_CODE (insn_rtl) = -1;
rtx mem_addr = XEXP (mem, 0);
- rtx reg;
- HOST_WIDE_INT offset;
-
- if (REG_P (mem_addr))
- {
- reg = mem_addr;
- offset = 0;
- }
- else if (GET_CODE (mem_addr) == PLUS
- && REG_P (XEXP (mem_addr, 0))
- && CONST_INT_P (XEXP (mem_addr, 1)))
- {
- reg = XEXP (mem_addr, 0);
- offset = INTVAL (XEXP (mem_addr, 1));
- }
+ machine_mode mode = GET_MODE (mem_addr);
+ if (new_offset != 0)
+ XEXP (mem, 0) = gen_rtx_PLUS (mode, reg, gen_int_mode (new_offset, mode));
else
- return false;
+ XEXP (mem, 0) = reg;
- if (mem_out)
- *mem_out = mem;
- if (reg_out)
- *reg_out = reg;
- if (offset_out)
- *offset_out = offset;
+ bool illegal = insn_invalid_p (insn_rtl, false)
+ || !memory_address_addr_space_p (mode, XEXP (mem, 0),
+ MEM_ADDR_SPACE (mem));
- return true;
+ /* Restore the instruction. */
+ XEXP (mem, 0) = mem_addr;
+ INSN_CODE (insn_rtl) = icode;
+
+ return !illegal;
}
-/* If INSN is a root memory instruction then do a DFS traversal on its
- definitions and find folding candidates. */
-static void
-do_analysis (rtx_insn *insn)
+/* Check if any of the provided INSNs in INSN_LIST is not marked in the
+ given bitmap. Return true if at least one INSN is not in the bitmap and
+ false otherwise. */
+static bool
+insn_uses_not_in_bitmap (vec<insn_info *> *insn_list, bitmap bm)
{
- rtx reg;
- if (!get_fold_mem_root (insn, NULL, &reg, NULL))
- return;
-
- if (dump_file && (dump_flags & TDF_DETAILS))
+ for (insn_info *insn : *insn_list)
{
- fprintf (dump_file, "Starting analysis from root: ");
- print_rtl_single (dump_file, insn);
+ gcc_assert (insn->num_defs () == 1);
+ set_info *def = dyn_cast<set_info *>(insn->defs ()[0]);
+ for (use_info *use : def->nondebug_insn_uses ())
+ {
+ if (!bitmap_bit_p (bm, use->insn ()->uid ()))
+ {
+ if (dump_file && (dump_flags & TDF_DETAILS))
+ {
+ fprintf (dump_file, "Cannot ensure correct transformation as "
+ "INSN %u has a USE INSN %u that was not analysed.\n",
+ insn->uid (), use->insn ()->uid ());
+ }
+
+ return true;
+ }
+ }
}
- /* Analyse folding opportunities for this memory instruction. */
- bitmap_set_bit (&can_fold_insns, INSN_UID (insn));
- fold_offsets (insn, reg, true, NULL);
+ return false;
}
+/* Check if all USEs of all instructions have been analysed.
+ If a fold_mem_info is found that has an unknown USE, then
+ drop it from the list. When this function returns, all
+ fold_mem_infos in the worklist reference instructions that
+ have been analysed before and can therefore be committed. */
static void
-do_fold_info_calculation (rtx_insn *insn, fold_info_map *fold_info)
+drop_unsafe_candidates (vec<fold_mem_info *> *worklist)
{
- rtx mem, reg;
- HOST_WIDE_INT cur_offset;
- if (!get_fold_mem_root (insn, &mem, &reg, &cur_offset))
- return;
+ /* First mark all analysed INSNs in a bitmap. */
+ auto_bitmap insn_closure;
+ for (fold_mem_info *info : worklist)
+ {
+ bitmap_set_bit (insn_closure, info->insn->uid ());
+ for (insn_info *insn : info->fold_agnostic_insns)
+ bitmap_set_bit (insn_closure, insn->uid ());
+ for (insn_info *insn : info->fold_insns)
+ bitmap_set_bit (insn_closure, insn->uid ());
+ }
- fold_mem_info *info = new fold_mem_info;
- info->added_offset = fold_offsets (insn, reg, false, info->fold_insns);
+ /* Now check if all uses of fold_insns are marked. */
+ unsigned i;
+ fold_mem_info *info;
+ FOR_EACH_VEC_ELT (*worklist, i, info)
+ {
+ if (insn_uses_not_in_bitmap (&info->fold_agnostic_insns, insn_closure)
+ || insn_uses_not_in_bitmap (&info->fold_insns, insn_closure))
+ {
+ if (dump_file && (dump_flags & TDF_DETAILS))
+ {
+ fprintf (dump_file, "Dropping fold-mem-offset root INSN %u.\n",
+ info->insn->uid ());
+ }
- fold_info->put (insn, info);
+ /* Drop INFO from worklist and start over. */
+ info->fold_agnostic_insns.truncate (0);
+ info->fold_insns.truncate (0);
+ worklist->unordered_remove (i);
+ delete info;
+ drop_unsafe_candidates (worklist);
+ return;
+ }
+ }
}
-/* If INSN is a root memory instruction then compute a potentially new offset
- for it and test if the resulting instruction is valid. */
+/* Fold the new offset into the fold-mem-info root. */
static void
-do_check_validity (rtx_insn *insn, fold_mem_info *info)
+do_commit_offset (fold_mem_info *info)
{
- rtx mem, reg;
- HOST_WIDE_INT cur_offset;
- if (!get_fold_mem_root (insn, &mem, &reg, &cur_offset))
- return;
+ insn_info *insn = info->insn;
+ rtx_insn *insn_rtl = insn->rtl ();
+ rtx mem = info->mem;
+ rtx reg = info->reg;
+ HOST_WIDE_INT new_offset = info->offset + info->added_offset;
- HOST_WIDE_INT new_offset = cur_offset + info->added_offset;
+ if (info->added_offset == 0)
+ return;
- /* Test if it is valid to change MEM's address offset to NEW_OFFSET. */
- int icode = INSN_CODE (insn);
- INSN_CODE (insn) = -1;
- rtx mem_addr = XEXP (mem, 0);
- machine_mode mode = GET_MODE (mem_addr);
+ machine_mode mode = GET_MODE (XEXP (mem, 0));
if (new_offset != 0)
XEXP (mem, 0) = gen_rtx_PLUS (mode, reg, gen_int_mode (new_offset, mode));
else
XEXP (mem, 0) = reg;
+ INSN_CODE (insn_rtl) = recog (PATTERN (insn_rtl), insn_rtl, 0);
+ df_insn_rescan (insn_rtl);
- bool illegal = insn_invalid_p (insn, false)
- || !memory_address_addr_space_p (mode, XEXP (mem, 0),
- MEM_ADDR_SPACE (mem));
-
- /* Restore the instruction. */
- XEXP (mem, 0) = mem_addr;
- INSN_CODE (insn) = icode;
-
- if (illegal)
- bitmap_ior_into (&cannot_fold_insns, info->fold_insns);
- else
- bitmap_ior_into (&candidate_fold_insns, info->fold_insns);
+ if (dump_file)
+ fprintf (dump_file, "INSN %u: Memory offset changed from "
+ HOST_WIDE_INT_PRINT_DEC " to " HOST_WIDE_INT_PRINT_DEC ".\n",
+ insn->uid (), info->offset, new_offset);
}
-static bool
-compute_validity_closure (fold_info_map *fold_info)
+/* Set the constant of the fold INSN to zero. */
+static void
+do_commit_insn (insn_info *insn)
{
- /* Let's say we have an arbitrary chain of foldable instructions xN = xN + C
- and memory operations rN that use xN as shown below. If folding x1 in r1
- turns out to be invalid for whatever reason then it's also invalid to fold
- any of the other xN into any rN. That means that we need the transitive
- closure of validity to determine whether we can fold a xN instruction.
-
- +--------------+ +-------------------+ +-------------------+
- | r1 = mem[x1] | | r2 = mem[x1 + x2] | | r3 = mem[x2 + x3] | ...
- +--------------+ +-------------------+ +-------------------+
- ^ ^ ^ ^ ^
- | / | / | ...
- | / | / |
- +-------------+ / +-------------+ / +-------------+
- | x1 = x1 + 1 |-----+ | x2 = x2 + 1 |-----+ | x3 = x3 + 1 |--- ...
- +-------------+ +-------------+ +-------------+
- ^ ^ ^
- | | |
- ... ... ...
- */
-
- /* In general three iterations should be enough for most cases, but allow up
- to five when -fexpensive-optimizations is used. */
- int max_iters = 3 + 2 * flag_expensive_optimizations;
- for (int pass = 0; pass < max_iters; pass++)
+ rtx_insn *insn_rtl = insn->rtl ();
+
+ /* If we deleted this INSN before, then there is nothing left to do here. */
+ if (insn_rtl->deleted ())
+ return;
+
+ rtx set = single_set (insn_rtl);
+ rtx src = SET_SRC (set);
+
+ /* Emit a move and let subsequent passes eliminate it if possible. */
+ if (GET_CODE (src) == CONST_INT)
{
- bool made_changes = false;
- for (fold_info_map::iterator iter = fold_info->begin ();
- iter != fold_info->end (); ++iter)
+ /* Only change if necessary. */
+ if (INTVAL (src))
{
- fold_mem_info *info = (*iter).second;
- if (bitmap_intersect_p (&cannot_fold_insns, info->fold_insns))
- made_changes |= bitmap_ior_into (&cannot_fold_insns,
- info->fold_insns);
+ /* INSN is R1 = C. Set C to 0 because it was folded. */
+ SET_SRC (set) = CONST0_RTX (GET_MODE (SET_SRC (set)));
+ df_insn_rescan (insn_rtl);
+ }
+ }
+ else
+ {
+ /* Only change if necessary. */
+ if (INTVAL (XEXP (src, 1)))
+ {
+ /* INSN is R1 = R2 + C. Set C to 0 because it was folded. */
+ XEXP (src, 1) = CONST0_RTX (GET_MODE (XEXP (src, 1)));
+ df_insn_rescan (insn_rtl);
+
+ /* Mark self-assignments for deletion. */
+ rtx dest = SET_DEST (set);
+ if (REGNO (dest) == REGNO (XEXP (src, 0)))
+ delete_insn (insn_rtl);
}
-
- if (!made_changes)
- return true;
}
- return false;
+ if (dump_file)
+ fprintf (dump_file, "INSN %u: Constant set to zero.\n", insn->uid ());
}
-/* If INSN is a root memory instruction that was affected by any folding
- then update its offset as necessary. */
-static void
-do_commit_offset (rtx_insn *insn, fold_mem_info *info)
+/* Fold memory offsets by analysing the DEF-USE chain where the DEFs
+ will only have a single USE. */
+static unsigned int
+fold_mem_offsets_single_use ()
{
- rtx mem, reg;
- HOST_WIDE_INT cur_offset;
- if (!get_fold_mem_root (insn, &mem, &reg, &cur_offset))
- return;
+ unsigned int stats_fold_count = 0;
- HOST_WIDE_INT new_offset = cur_offset + info->added_offset;
+ /* Iterate over all nondebug INSNs, get our candidates and fold them. */
+ for (auto insn : iterate_safely (crtl->ssa->nondebug_insns ()))
+ {
+ if (!insn->is_real () || !insn->can_be_optimized ())
+ continue;
- if (new_offset == cur_offset)
- return;
+ rtx mem, reg;
+ HOST_WIDE_INT offset;
+ if (!get_fold_mem_offset_root (insn, &mem, &reg, &offset))
+ continue;
- gcc_assert (!bitmap_empty_p (info->fold_insns));
+ fold_mem_info info;
+ info.insn = insn;
+ info.mem = mem;
+ info.reg = reg;
+ info.offset = offset;
- if (bitmap_intersect_p (&cannot_fold_insns, info->fold_insns))
- return;
+ if (dump_file && (dump_flags & TDF_DETAILS))
+ {
+ fprintf (dump_file, "Starting analysis from root: ");
+ print_rtl_single (dump_file, info.insn->rtl ());
+ }
- if (dump_file)
- {
- fprintf (dump_file, "Memory offset changed from "
- HOST_WIDE_INT_PRINT_DEC " to " HOST_WIDE_INT_PRINT_DEC
- " for instruction:\n", cur_offset, new_offset);
- print_rtl_single (dump_file, insn);
+ /* Walk DEF-chain and collect info.fold_insns and
+ the resulting offset. */
+ info.added_offset = fold_offsets (info.insn, info.reg, &info);
+ if (info.added_offset == 0 || !do_check_validity (&info))
+ continue;
+
+ if (dump_file && (dump_flags & TDF_DETAILS))
+ fprintf (dump_file,
+ "Found root offset delta: " HOST_WIDE_INT_PRINT_DEC "\n",
+ info.added_offset);
+
+ do_commit_offset (&info);
+
+ while (!info.fold_insns.is_empty ())
+ {
+ insn_info *insn = info.fold_insns.pop ();
+ do_commit_insn (insn);
+ stats_fold_count++;
+ }
}
- machine_mode mode = GET_MODE (XEXP (mem, 0));
- if (new_offset != 0)
- XEXP (mem, 0) = gen_rtx_PLUS (mode, reg, gen_int_mode (new_offset, mode));
- else
- XEXP (mem, 0) = reg;
- INSN_CODE (insn) = recog (PATTERN (insn), insn, 0);
- df_insn_rescan (insn);
+ return stats_fold_count;
}
-/* If INSN is a move / add instruction that was folded then replace its
- constant part with zero. */
-static void
-do_commit_insn (rtx_insn *insn)
+/* Fold memory offsets by analysing the DEF-USE chain where the DEFs
+ can have multiple USEs. */
+static unsigned int
+fold_mem_offsets_multi_use ()
{
- if (bitmap_bit_p (&candidate_fold_insns, INSN_UID (insn))
- && !bitmap_bit_p (&cannot_fold_insns, INSN_UID (insn)))
+ unsigned int stats_fold_count = 0;
+
+ /* Iterate over all nondebug INSNs and get our candidates. */
+ auto_vec<fold_mem_info *> worklist;
+ for (auto insn : iterate_safely (crtl->ssa->nondebug_insns ()))
{
- if (dump_file)
- {
- fprintf (dump_file, "Instruction folded:");
- print_rtl_single (dump_file, insn);
- }
+ if (!insn->is_real () || !insn->can_be_optimized ())
+ continue;
- stats_fold_count++;
+ rtx mem, reg;
+ HOST_WIDE_INT offset;
+ if (!get_fold_mem_offset_root (insn, &mem, &reg, &offset))
+ continue;
- rtx set = single_set (insn);
- rtx dest = SET_DEST (set);
- rtx src = SET_SRC (set);
+ fold_mem_info *info = new fold_mem_info;
+ info->insn = insn;
+ info->mem = mem;
+ info->reg = reg;
+ info->offset = offset;
- /* Emit a move and let subsequent passes eliminate it if possible. */
- if (GET_CODE (src) == CONST_INT)
+ if (dump_file && (dump_flags & TDF_DETAILS))
{
- /* INSN is R1 = C.
- Replace it with R1 = 0 because C was folded. */
- rtx mov_rtx
- = gen_move_insn (dest, gen_int_mode (0, GET_MODE (dest)));
- df_insn_rescan (emit_insn_after (mov_rtx, insn));
+ fprintf (dump_file, "Starting analysis from root: ");
+ print_rtl_single (dump_file, info->insn->rtl ());
}
- else
+
+ /* Walk DEF-chain and collect info->fold_insns, info->fold_agnostic_insns
+ and the resulting offset. */
+ info->added_offset = fold_offsets (info->insn, info->reg, info, false);
+ if (info->added_offset == 0 || !do_check_validity (info))
{
- /* INSN is R1 = R2 + C.
- Replace it with R1 = R2 because C was folded. */
- rtx arg1 = XEXP (src, 0);
+ delete info;
+ continue;
+ }
- /* If the DEST == ARG1 then the move is a no-op. */
- if (REGNO (dest) != REGNO (arg1))
- {
- gcc_checking_assert (GET_MODE (dest) == GET_MODE (arg1));
- rtx mov_rtx = gen_move_insn (dest, arg1);
- df_insn_rescan (emit_insn_after (mov_rtx, insn));
- }
+ if (dump_file && (dump_flags & TDF_DETAILS))
+ fprintf (dump_file,
+ "Found root offset delta: " HOST_WIDE_INT_PRINT_DEC "\n",
+ info->added_offset);
+
+ /* Append candidate. */
+ worklist.safe_push (info);
+ }
+
+ /* Now drop all fold_mem_infos that contain INSNs with unknown
+ USEs and are therefore not safe to change. */
+ drop_unsafe_candidates (&worklist);
+
+ while (!worklist.is_empty ())
+ {
+ fold_mem_info *info = worklist.pop ();
+ do_commit_offset (info);
+
+ while (!info->fold_insns.is_empty ())
+ {
+ insn_info *insn = info->fold_insns.pop ();
+ do_commit_insn (insn);
+ stats_fold_count++;
}
- /* Delete the original move / add instruction. */
- delete_insn (insn);
+ info->fold_agnostic_insns.truncate (0);
+ delete info;
}
+
+ return stats_fold_count;
}
-unsigned int
-pass_fold_mem_offsets::execute (function *fn)
+/* Main function of the fold-mem-offsets pass. */
+static unsigned int
+fold_mem_offsets (function *fn)
{
+ bool multi_use_mode = true;
+
/* Computing UD/DU chains for flow graphs which have a high connectivity
will take a long time and is unlikely to be particularly useful.
@@ -856,69 +897,82 @@ pass_fold_mem_offsets::execute (function *fn)
"fold-mem-offsets: %d basic blocks and %d edges/basic block",
n_basic_blocks_for_fn (cfun),
n_edges_for_fn (cfun) / n_basic_blocks_for_fn (cfun));
- return 0;
+ multi_use_mode = false;
}
- df_set_flags (DF_EQ_NOTES + DF_RD_PRUNE_DEAD_DEFS + DF_DEFER_INSN_RESCAN);
- df_chain_add_problem (DF_UD_CHAIN + DF_DU_CHAIN);
+ /* There is a conflict between this pass and RISCV's shorten-memrefs
+ pass. For now disable folding if optimizing for size because
+ otherwise this cancels the effects of shorten-memrefs. */
+ cgraph_node *n = cgraph_node::get (fn->decl);
+ if (n && n->optimize_for_size_p ())
+ return 0;
+
+ /* Initialise RTL SSA. */
+ calculate_dominance_info (CDI_DOMINATORS);
df_analyze ();
+ crtl->ssa = new rtl_ssa::function_info (cfun);
- bitmap_initialize (&can_fold_insns, NULL);
- bitmap_initialize (&candidate_fold_insns, NULL);
- bitmap_initialize (&cannot_fold_insns, NULL);
+ /* The number of instructions that were simplified or eliminated. */
+ int stats_fold_count = 0;
- stats_fold_count = 0;
+ /* Fold mem offsets with DEFs that have a single USE. */
+ stats_fold_count += fold_mem_offsets_single_use ();
- basic_block bb;
- rtx_insn *insn;
- FOR_ALL_BB_FN (bb, fn)
+ /* Fold mem offsets with DEFs that have multiple USEs. */
+ if (multi_use_mode || flag_expensive_optimizations)
{
- /* There is a conflict between this pass and RISCV's shorten-memrefs
- pass. For now disable folding if optimizing for size because
- otherwise this cancels the effects of shorten-memrefs. */
- if (optimize_bb_for_size_p (bb))
- continue;
-
- fold_info_map fold_info;
+ if (dump_file)
+ fprintf (dump_file, "Starting multi-use analysis\n");
+ stats_fold_count += fold_mem_offsets_multi_use ();
+ }
- bitmap_clear (&can_fold_insns);
- bitmap_clear (&candidate_fold_insns);
- bitmap_clear (&cannot_fold_insns);
+ statistics_counter_event (cfun, "Number of folded instructions",
+ stats_fold_count);
- FOR_BB_INSNS (bb, insn)
- do_analysis (insn);
+ free_dominance_info (CDI_DOMINATORS);
+ cleanup_cfg (0);
- FOR_BB_INSNS (bb, insn)
- do_fold_info_calculation (insn, &fold_info);
+ delete crtl->ssa;
+ crtl->ssa = nullptr;
- FOR_BB_INSNS (bb, insn)
- if (fold_mem_info **info = fold_info.get (insn))
- do_check_validity (insn, *info);
+ delete_trivially_dead_insns (get_insns (), max_reg_num ());
- if (compute_validity_closure (&fold_info))
- {
- FOR_BB_INSNS (bb, insn)
- if (fold_mem_info **info = fold_info.get (insn))
- do_commit_offset (insn, *info);
+ return 0;
+}
- FOR_BB_INSNS (bb, insn)
- do_commit_insn (insn);
- }
+namespace {
- for (fold_info_map::iterator iter = fold_info.begin ();
- iter != fold_info.end (); ++iter)
- delete (*iter).second;
- }
+const pass_data pass_data_fold_mem =
+{
+ RTL_PASS, /* type */
+ "fold_mem_offsets", /* name */
+ OPTGROUP_NONE, /* optinfo_flags */
+ TV_FOLD_MEM_OFFSETS, /* tv_id */
+ 0, /* properties_required */
+ 0, /* properties_provided */
+ 0, /* properties_destroyed */
+ 0, /* todo_flags_start */
+ TODO_df_finish, /* todo_flags_finish */
+};
- statistics_counter_event (cfun, "Number of folded instructions",
- stats_fold_count);
+class pass_fold_mem_offsets : public rtl_opt_pass
+{
+public:
+ pass_fold_mem_offsets (gcc::context *ctxt)
+ : rtl_opt_pass (pass_data_fold_mem, ctxt)
+ {}
- bitmap_release (&can_fold_insns);
- bitmap_release (&candidate_fold_insns);
- bitmap_release (&cannot_fold_insns);
+ /* opt_pass methods: */
+ bool gate (function *) final override
+ {
+ return flag_fold_mem_offsets && optimize >= 2;
+ }
- return 0;
-}
+ unsigned int execute (function *fn) final override
+ {
+ return fold_mem_offsets (fn);
+ }
+}; // class pass_fold_mem_offsets
} // anon namespace
--
2.49.0