On Wed, Aug 20, 2025 at 1:05 PM Richard Sandiford <richard.sandif...@arm.com> wrote: > > I'd added the aarch64-specific CC fusion pass to fold a PTEST > instruction into the instruction that feeds the PTEST, in cases > where the latter instruction can set the appropriate flags as a > side-effect. > > Combine does the same optimisation. However, as explained in the > comments, the PTEST case often has: > > A: set predicate P based on inputs X > B: clobber X > C: test P > > and so the fusion is only possible if we move C before B. > That's something that combine currently can't do (for the cases > that we needed). > > The optimisation was never really AArch64-specific. It's just that, > in an all-too-familiar fashion, we needed it in stage 3, when it was > too late to add something target-independent. > > late-combine adds a convenient place to do the optimisation in a > target-independent way, just as combine is a convenient place to > do its related optimisation. > > Tested on aarch64-linux-gnu, powerpc64le-linux-gnu and > x86_64-linux-gnu. OK to install?
OK. Thanks, Richard. > Richard > > > gcc/ > * config.gcc (aarch64*-*-*): Remove aarch64-cc-fusion.o from > extra_objs. > * config/aarch64/aarch64-passes.def (pass_cc_fusion): Delete. > * config/aarch64/aarch64-protos.h (make_pass_cc_fusion): Delete. > * config/aarch64/t-aarch64 (aarch64-cc-fusion.o): Delete. > * config/aarch64/aarch64-cc-fusion.cc: Delete. > * late-combine.cc (late_combine::optimizable_set): Take a set_info * > rather than an insn_info * and move destination tests from... > (late_combine::combine_into_uses): ...here. Take a set_info * rather > an insn_info *. Take the rtx set. > (late_combine::parallelize_insns, late_combine::combine_cc_setter) > (late_combine::combine_insn): New member functions. > (late_combine::m_parallel): New member variable. > * rtlanal.cc (pattern_cost): Handle sets of CC registers in the > same way as comparisons. > --- > gcc/config.gcc | 2 +- > gcc/config/aarch64/aarch64-cc-fusion.cc | 297 ------------------------ > gcc/config/aarch64/aarch64-passes.def | 1 - > gcc/config/aarch64/aarch64-protos.h | 1 - > gcc/config/aarch64/t-aarch64 | 6 - > gcc/late-combine.cc | 243 ++++++++++++++++--- > gcc/rtlanal.cc | 3 +- > 7 files changed, 208 insertions(+), 345 deletions(-) > delete mode 100644 gcc/config/aarch64/aarch64-cc-fusion.cc > > diff --git a/gcc/config.gcc b/gcc/config.gcc > index 56246387ef5..517df40e5de 100644 > --- a/gcc/config.gcc > +++ b/gcc/config.gcc > @@ -351,7 +351,7 @@ aarch64*-*-*) > c_target_objs="aarch64-c.o" > cxx_target_objs="aarch64-c.o" > d_target_objs="aarch64-d.o" > - extra_objs="aarch64-builtins.o aarch-common.o aarch64-elf-metadata.o > aarch64-sve-builtins.o aarch64-sve-builtins-shapes.o > aarch64-sve-builtins-base.o aarch64-sve-builtins-sve2.o > aarch64-sve-builtins-sme.o cortex-a57-fma-steering.o aarch64-speculation.o > aarch-bti-insert.o aarch64-cc-fusion.o aarch64-early-ra.o > aarch64-ldp-fusion.o" > + extra_objs="aarch64-builtins.o aarch-common.o aarch64-elf-metadata.o > aarch64-sve-builtins.o aarch64-sve-builtins-shapes.o > aarch64-sve-builtins-base.o aarch64-sve-builtins-sve2.o > aarch64-sve-builtins-sme.o cortex-a57-fma-steering.o aarch64-speculation.o > aarch-bti-insert.o aarch64-early-ra.o aarch64-ldp-fusion.o" > target_gtfiles="\$(srcdir)/config/aarch64/aarch64-protos.h > \$(srcdir)/config/aarch64/aarch64-builtins.h > \$(srcdir)/config/aarch64/aarch64-builtins.cc > \$(srcdir)/config/aarch64/aarch64-sve-builtins.h > \$(srcdir)/config/aarch64/aarch64-sve-builtins.cc" > target_has_targetm_common=yes > ;; > diff --git a/gcc/config/aarch64/aarch64-cc-fusion.cc > b/gcc/config/aarch64/aarch64-cc-fusion.cc > deleted file mode 100644 > index cea54dee298..00000000000 > --- a/gcc/config/aarch64/aarch64-cc-fusion.cc > +++ /dev/null > @@ -1,297 +0,0 @@ > -// Pass to fuse CC operations with other instructions. > -// Copyright (C) 2021-2025 Free Software Foundation, Inc. > -// > -// This file is part of GCC. > -// > -// GCC is free software; you can redistribute it and/or modify it under > -// the terms of the GNU General Public License as published by the Free > -// Software Foundation; either version 3, or (at your option) any later > -// version. > -// > -// GCC is distributed in the hope that it will be useful, but WITHOUT ANY > -// WARRANTY; without even the implied warranty of MERCHANTABILITY or > -// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License > -// for more details. > -// > -// You should have received a copy of the GNU General Public License > -// along with GCC; see the file COPYING3. If not see > -// <http://www.gnu.org/licenses/>. > - > -// This pass looks for sequences of the form: > -// > -// A: (set (reg R1) X1) > -// B: ...instructions that might change the value of X1... > -// C: (set (reg CC) X2) // X2 uses R1 > -// > -// and tries to change them to: > -// > -// C': [(set (reg CC) X2') > -// (set (reg R1) X1)] > -// B: ...instructions that might change the value of X1... > -// > -// where X2' is the result of replacing R1 with X1 in X2. > -// > -// This sequence occurs in SVE code in two important cases: > -// > -// (a) Sometimes, to deal correctly with overflow, we need to increment > -// an IV after a WHILELO rather than before it. In this case: > -// - A is a WHILELO, > -// - B includes an IV increment and > -// - C is a separate PTEST. > -// > -// (b) ACLE code of the form: > -// > -// svbool_t ok = svrdffr (); > -// if (svptest_last (pg, ok)) > -// ... > -// > -// must, for performance reasons, be code-generated as: > -// > -// RDFFRS Pok.B, Pg/Z > -// ...branch on flags result... > -// > -// without a separate PTEST of Pok. In this case: > -// - A is an aarch64_rdffr > -// - B includes an aarch64_update_ffrt > -// - C is a separate PTEST > -// > -// Combine can handle this optimization if B doesn't exist and if A and > -// C are in the same BB. This pass instead handles cases where B does > -// exist and cases where A and C are in different BBs of the same EBB. > - > -#define IN_TARGET_CODE 1 > - > -#define INCLUDE_ALGORITHM > -#define INCLUDE_FUNCTIONAL > -#define INCLUDE_ARRAY > -#include "config.h" > -#include "system.h" > -#include "coretypes.h" > -#include "backend.h" > -#include "rtl.h" > -#include "df.h" > -#include "rtl-ssa.h" > -#include "tree-pass.h" > - > -using namespace rtl_ssa; > - > -namespace { > -const pass_data pass_data_cc_fusion = > -{ > - RTL_PASS, // type > - "cc_fusion", // name > - OPTGROUP_NONE, // optinfo_flags > - TV_NONE, // tv_id > - 0, // properties_required > - 0, // properties_provided > - 0, // properties_destroyed > - 0, // todo_flags_start > - TODO_df_finish, // todo_flags_finish > -}; > - > -// Class that represents one run of the pass. > -class cc_fusion > -{ > -public: > - cc_fusion () : m_parallel () {} > - void execute (); > - > -private: > - rtx optimizable_set (const insn_info *); > - bool parallelize_insns (def_info *, rtx, def_info *, rtx); > - void optimize_cc_setter (def_info *, rtx); > - > - // A spare PARALLEL rtx, or null if none. > - rtx m_parallel; > -}; > - > -// See whether INSN is a single_set that we can optimize. Return the > -// set if so, otherwise return null. > -rtx > -cc_fusion::optimizable_set (const insn_info *insn) > -{ > - if (!insn->can_be_optimized () > - || insn->is_asm () > - || insn->has_volatile_refs () > - || insn->has_pre_post_modify ()) > - return NULL_RTX; > - > - return single_set (insn->rtl ()); > -} > - > -// CC_SET is a single_set that sets (only) CC_DEF; OTHER_SET is likewise > -// a single_set that sets (only) OTHER_DEF. CC_SET is known to set the > -// CC register and the instruction that contains CC_SET is known to use > -// OTHER_DEF. Try to do CC_SET and OTHER_SET in parallel. > -bool > -cc_fusion::parallelize_insns (def_info *cc_def, rtx cc_set, > - def_info *other_def, rtx other_set) > -{ > - auto attempt = crtl->ssa->new_change_attempt (); > - > - insn_info *cc_insn = cc_def->insn (); > - insn_info *other_insn = other_def->insn (); > - if (dump_file && (dump_flags & TDF_DETAILS)) > - fprintf (dump_file, "trying to parallelize insn %d and insn %d\n", > - other_insn->uid (), cc_insn->uid ()); > - > - // Try to substitute OTHER_SET into CC_INSN. > - insn_change_watermark rtl_watermark; > - rtx_insn *cc_rtl = cc_insn->rtl (); > - insn_propagation prop (cc_rtl, SET_DEST (other_set), > - SET_SRC (other_set)); > - if (!prop.apply_to_pattern (&PATTERN (cc_rtl)) > - || prop.num_replacements == 0) > - { > - if (dump_file && (dump_flags & TDF_DETAILS)) > - fprintf (dump_file, "-- failed to substitute all uses of r%d\n", > - other_def->regno ()); > - return false; > - } > - > - // Restrict the uses to those outside notes. > - use_array cc_uses = remove_note_accesses (attempt, cc_insn->uses ()); > - use_array other_set_uses = remove_note_accesses (attempt, > - other_insn->uses ()); > - > - // Remove the use of the substituted value. > - access_array_builder uses_builder (attempt); > - uses_builder.reserve (cc_uses.size ()); > - for (use_info *use : cc_uses) > - if (use->def () != other_def) > - uses_builder.quick_push (use); > - cc_uses = use_array (uses_builder.finish ()); > - > - // Get the list of uses for the new instruction. > - insn_change cc_change (cc_insn); > - cc_change.new_uses = merge_access_arrays (attempt, other_set_uses, > cc_uses); > - if (!cc_change.new_uses.is_valid ()) > - { > - if (dump_file && (dump_flags & TDF_DETAILS)) > - fprintf (dump_file, "-- cannot merge uses\n"); > - return false; > - } > - > - // The instruction initially defines just two registers. recog can add > - // extra clobbers if necessary. > - auto_vec<access_info *, 2> new_defs; > - new_defs.quick_push (cc_def); > - new_defs.quick_push (other_def); > - sort_accesses (new_defs); > - cc_change.new_defs = def_array (access_array (new_defs)); > - > - // Make sure there is somewhere that the new instruction could live. > - auto other_change = insn_change::delete_insn (other_insn); > - insn_change *changes[] = { &other_change, &cc_change }; > - cc_change.move_range = cc_insn->ebb ()->insn_range (); > - if (!restrict_movement (cc_change, ignore_changing_insns (changes))) > - { > - if (dump_file && (dump_flags & TDF_DETAILS)) > - fprintf (dump_file, "-- cannot satisfy all definitions and uses\n"); > - return false; > - } > - > - // Tentatively install the new pattern. By convention, the CC set > - // must be first. > - if (m_parallel) > - { > - XVECEXP (m_parallel, 0, 0) = cc_set; > - XVECEXP (m_parallel, 0, 1) = other_set; > - } > - else > - { > - rtvec vec = gen_rtvec (2, cc_set, other_set); > - m_parallel = gen_rtx_PARALLEL (VOIDmode, vec); > - } > - validate_change (cc_rtl, &PATTERN (cc_rtl), m_parallel, 1); > - > - // These routines report failures themselves. > - if (!recog (attempt, cc_change, ignore_changing_insns (changes)) > - || !changes_are_worthwhile (changes) > - || !crtl->ssa->verify_insn_changes (changes)) > - return false; > - > - remove_reg_equal_equiv_notes (cc_rtl); > - confirm_change_group (); > - crtl->ssa->change_insns (changes); > - m_parallel = NULL_RTX; > - return true; > -} > - > -// Try to optimize the instruction that contains CC_DEF, where CC_DEF > describes > -// a definition of the CC register by CC_SET. > -void > -cc_fusion::optimize_cc_setter (def_info *cc_def, rtx cc_set) > -{ > - // Search the registers used by the CC setter for an easily-substitutable > - // def-use chain. > - for (use_info *other_use : cc_def->insn ()->uses ()) > - if (def_info *other_def = other_use->def ()) > - if (other_use->regno () != CC_REGNUM > - && other_def->ebb () == cc_def->ebb ()) > - if (rtx other_set = optimizable_set (other_def->insn ())) > - { > - rtx dest = SET_DEST (other_set); > - if (REG_P (dest) > - && REGNO (dest) == other_def->regno () > - && REG_NREGS (dest) == 1 > - && parallelize_insns (cc_def, cc_set, other_def, other_set)) > - return; > - } > -} > - > -// Run the pass on the current function. > -void > -cc_fusion::execute () > -{ > - // Initialization. > - calculate_dominance_info (CDI_DOMINATORS); > - df_analyze (); > - crtl->ssa = new rtl_ssa::function_info (cfun); > - > - // Walk through all instructions that set CC. Look for a PTEST instruction > - // that we can optimize. > - // > - // ??? The PTEST test isn't needed for correctness, but it ensures that the > - // pass no effect on non-SVE code. > - for (def_info *def : crtl->ssa->reg_defs (CC_REGNUM)) > - if (rtx cc_set = optimizable_set (def->insn ())) > - if (REG_P (SET_DEST (cc_set)) > - && REGNO (SET_DEST (cc_set)) == CC_REGNUM > - && GET_CODE (SET_SRC (cc_set)) == UNSPEC > - && XINT (SET_SRC (cc_set), 1) == UNSPEC_PTEST) > - optimize_cc_setter (def, cc_set); > - > - // Finalization. > - crtl->ssa->perform_pending_updates (); > - free_dominance_info (CDI_DOMINATORS); > -} > - > -class pass_cc_fusion : public rtl_opt_pass > -{ > -public: > - pass_cc_fusion (gcc::context *ctxt) > - : rtl_opt_pass (pass_data_cc_fusion, ctxt) > - {} > - > - // opt_pass methods: > - virtual bool gate (function *) { return TARGET_SVE && optimize >= 2; } > - virtual unsigned int execute (function *); > -}; > - > -unsigned int > -pass_cc_fusion::execute (function *) > -{ > - cc_fusion ().execute (); > - return 0; > -} > - > -} // end namespace > - > -// Create a new CC fusion pass instance. > - > -rtl_opt_pass * > -make_pass_cc_fusion (gcc::context *ctxt) > -{ > - return new pass_cc_fusion (ctxt); > -} > diff --git a/gcc/config/aarch64/aarch64-passes.def > b/gcc/config/aarch64/aarch64-passes.def > index 9cf9d3e13b2..6a53ff35591 100644 > --- a/gcc/config/aarch64/aarch64-passes.def > +++ b/gcc/config/aarch64/aarch64-passes.def > @@ -24,6 +24,5 @@ INSERT_PASS_BEFORE (pass_reorder_blocks, 1, > pass_track_speculation); > INSERT_PASS_BEFORE (pass_late_thread_prologue_and_epilogue, 1, > pass_switch_pstate_sm); > INSERT_PASS_BEFORE (pass_late_thread_prologue_and_epilogue, 1, > pass_late_track_speculation); > INSERT_PASS_BEFORE (pass_shorten_branches, 1, pass_insert_bti); > -INSERT_PASS_AFTER (pass_if_after_combine, 1, pass_cc_fusion); > INSERT_PASS_BEFORE (pass_early_remat, 1, pass_ldp_fusion); > INSERT_PASS_BEFORE (pass_peephole2, 1, pass_ldp_fusion); > diff --git a/gcc/config/aarch64/aarch64-protos.h > b/gcc/config/aarch64/aarch64-protos.h > index d26e1d5642e..56efcf2c7f2 100644 > --- a/gcc/config/aarch64/aarch64-protos.h > +++ b/gcc/config/aarch64/aarch64-protos.h > @@ -1237,7 +1237,6 @@ rtl_opt_pass *make_pass_fma_steering (gcc::context *); > rtl_opt_pass *make_pass_track_speculation (gcc::context *); > rtl_opt_pass *make_pass_late_track_speculation (gcc::context *); > rtl_opt_pass *make_pass_insert_bti (gcc::context *ctxt); > -rtl_opt_pass *make_pass_cc_fusion (gcc::context *ctxt); > rtl_opt_pass *make_pass_switch_pstate_sm (gcc::context *ctxt); > rtl_opt_pass *make_pass_ldp_fusion (gcc::context *); > > diff --git a/gcc/config/aarch64/t-aarch64 b/gcc/config/aarch64/t-aarch64 > index 38a8c063725..63ca8e90c9d 100644 > --- a/gcc/config/aarch64/t-aarch64 > +++ b/gcc/config/aarch64/t-aarch64 > @@ -190,12 +190,6 @@ aarch-bti-insert.o: > $(srcdir)/config/arm/aarch-bti-insert.cc \ > $(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \ > $(srcdir)/config/arm/aarch-bti-insert.cc > > -aarch64-cc-fusion.o: $(srcdir)/config/aarch64/aarch64-cc-fusion.cc \ > - $(CONFIG_H) $(SYSTEM_H) $(CORETYPES_H) $(BACKEND_H) $(RTL_H) $(DF_H) \ > - $(RTL_SSA_H) tree-pass.h > - $(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \ > - $(srcdir)/config/aarch64/aarch64-cc-fusion.cc > - > aarch64-early-ra.o: $(srcdir)/config/aarch64/aarch64-early-ra.cc \ > $(CONFIG_H) $(SYSTEM_H) $(CORETYPES_H) $(BACKEND_H) $(RTL_H) $(DF_H) \ > $(RTL_SSA_H) tree-pass.h > diff --git a/gcc/late-combine.cc b/gcc/late-combine.cc > index 90d7ef09583..770780eb04d 100644 > --- a/gcc/late-combine.cc > +++ b/gcc/late-combine.cc > @@ -17,9 +17,16 @@ > // along with GCC; see the file COPYING3. If not see > // <http://www.gnu.org/licenses/>. > > -// The current purpose of this pass is to substitute definitions into > -// all uses, so that the definition can be removed. However, it could > -// be extended to handle other combination-related optimizations in future. > +// This pass currently has two purposes: > +// > +// - to substitute definitions into all uses, so that the definition > +// can be removed. > +// > +// - to try to parallelise sets of condition-code registers with a > +// related instruction (see combine_cc_setter for details). > +// > +// However, it could be extended to handle other combination-related > +// optimizations in future. > // > // The pass can run before or after register allocation. When running > // before register allocation, it tries to avoid cases that are likely > @@ -111,12 +118,18 @@ public: > unsigned int execute (function *); > > private: > - rtx optimizable_set (insn_info *); > + rtx optimizable_set (set_info *); > bool check_register_pressure (insn_info *, rtx); > bool check_uses (set_info *, rtx); > - bool combine_into_uses (insn_info *, insn_info *); > + bool combine_into_uses (set_info *, rtx, insn_info *); > + bool parallelize_insns (set_info *, rtx, set_info *, rtx); > + bool combine_cc_setter (set_info *, rtx); > + bool combine_insn (insn_info *, insn_info *); > > auto_vec<insn_info *> m_worklist; > + > + // A spare PARALLEL rtx, or null if none. > + rtx m_parallel = NULL_RTX; > }; > > insn_combination::insn_combination (set_info *def, rtx dest, rtx src) > @@ -454,11 +467,26 @@ insn_combination::run () > return true; > } > > -// See whether INSN is a single_set that we can optimize. Return the > -// set if so, otherwise return null. > +// DEF is the result of calling single_set_info on its instruction. > +// See whether that instruction is a single_set that we can optimize. > +// Return the set if so, otherwise return null. > rtx > -late_combine::optimizable_set (insn_info *insn) > +late_combine::optimizable_set (set_info *def) > { > + // For simplicity, don't try to handle sets of multiple hard registers. > + // And for correctness, don't remove any assignments to the stack or > + // frame pointers, since that would implicitly change the set of valid > + // memory locations between this assignment and the next. > + // > + // Removing assignments to the hard frame pointer would invalidate > + // backtraces. > + if (!def->is_reg () > + || def->regno () == STACK_POINTER_REGNUM > + || def->regno () == FRAME_POINTER_REGNUM > + || def->regno () == HARD_FRAME_POINTER_REGNUM) > + return NULL_RTX; > + > + auto *insn = def->insn (); > if (!insn->can_be_optimized () > || insn->is_asm () > || insn->is_call () > @@ -467,7 +495,16 @@ late_combine::optimizable_set (insn_info *insn) > || !can_move_insn_p (insn)) > return NULL_RTX; > > - return single_set (insn->rtl ()); > + rtx set = single_set (insn->rtl ()); > + if (!set) > + return NULL_RTX; > + > + // For simplicity, don't try to handle subreg destinations. > + rtx dest = SET_DEST (set); > + if (!REG_P (dest) || REG_NREGS (dest) != 1 || def->regno () != REGNO > (dest)) > + return NULL_RTX; > + > + return set; > } > > // Suppose that we can replace all uses of SET_DEST (SET) with SET_SRC (SET), > @@ -643,35 +680,13 @@ late_combine::check_uses (set_info *def, rtx set) > return true; > } > > -// Try to remove INSN by substituting a definition into all uses. > -// If the optimization moves any instructions before CURSOR, add those > -// instructions to the end of m_worklist. > +// Try to remove DEF's instruction by substituting DEF into all uses. > +// SET is the rtx set associated with DEF. If the optimization moves any > +// instructions before CURSOR, add those instructions to the end of > m_worklist. > bool > -late_combine::combine_into_uses (insn_info *insn, insn_info *cursor) > +late_combine::combine_into_uses (set_info *def, rtx set, insn_info *cursor) > { > - // For simplicity, don't try to handle sets of multiple hard registers. > - // And for correctness, don't remove any assignments to the stack or > - // frame pointers, since that would implicitly change the set of valid > - // memory locations between this assignment and the next. > - // > - // Removing assignments to the hard frame pointer would invalidate > - // backtraces. > - set_info *def = single_set_info (insn); > - if (!def > - || !def->is_reg () > - || def->regno () == STACK_POINTER_REGNUM > - || def->regno () == FRAME_POINTER_REGNUM > - || def->regno () == HARD_FRAME_POINTER_REGNUM) > - return false; > - > - rtx set = optimizable_set (insn); > - if (!set) > - return false; > - > - // For simplicity, don't try to handle subreg destinations. > - rtx dest = SET_DEST (set); > - if (!REG_P (dest) || def->regno () != REGNO (dest)) > - return false; > + auto *insn = def->insn (); > > // Don't prolong the live ranges of allocatable hard registers, or put > // them into more complicated instructions. Failing to prevent this > @@ -698,6 +713,158 @@ late_combine::combine_into_uses (insn_info *insn, > insn_info *cursor) > return true; > } > > +// CC_SET is a single_set that sets (only) CC_DEF; OTHER_SET is likewise > +// a single_set that sets (only) OTHER_DEF. CC_SET is known to set a > +// condition-code register and the instruction that contains CC_SET is > +// known to use OTHER_DEF. Try to do CC_SET and OTHER_SET in parallel. > +bool > +late_combine::parallelize_insns (set_info *cc_def, rtx cc_set, > + set_info *other_def, rtx other_set) > +{ > + auto attempt = crtl->ssa->new_change_attempt (); > + > + insn_info *cc_insn = cc_def->insn (); > + insn_info *other_insn = other_def->insn (); > + if (dump_file && (dump_flags & TDF_DETAILS)) > + fprintf (dump_file, "trying to parallelize insn %d and insn %d\n", > + other_insn->uid (), cc_insn->uid ()); > + > + // Try to substitute OTHER_SET into CC_INSN. > + insn_change_watermark rtl_watermark; > + rtx_insn *cc_rtl = cc_insn->rtl (); > + insn_propagation prop (cc_rtl, SET_DEST (other_set), SET_SRC (other_set)); > + if (!prop.apply_to_pattern (&PATTERN (cc_rtl)) > + || prop.num_replacements == 0) > + { > + if (dump_file && (dump_flags & TDF_DETAILS)) > + fprintf (dump_file, "-- failed to substitute all uses of r%d\n", > + other_def->regno ()); > + return false; > + } > + > + // Restrict the uses to those outside notes. > + use_array cc_uses = remove_note_accesses (attempt, cc_insn->uses ()); > + use_array other_set_uses = remove_note_accesses (attempt, > + other_insn->uses ()); > + > + // Remove the use of the substituted value. > + cc_uses = remove_uses_of_def (attempt, cc_uses, other_def); > + > + // Get the list of uses for the new instruction. > + insn_change cc_change (cc_insn); > + cc_change.new_uses = merge_access_arrays (attempt, other_set_uses, > cc_uses); > + if (!cc_change.new_uses.is_valid ()) > + { > + if (dump_file && (dump_flags & TDF_DETAILS)) > + fprintf (dump_file, "-- cannot merge uses\n"); > + return false; > + } > + > + // The instruction initially defines just two registers. recog can add > + // extra clobbers if necessary. > + auto_vec<access_info *, 2> new_defs; > + new_defs.quick_push (cc_def); > + new_defs.quick_push (other_def); > + sort_accesses (new_defs); > + cc_change.new_defs = def_array (access_array (new_defs)); > + > + // Make sure there is somewhere that the new instruction could live. > + auto other_change = insn_change::delete_insn (other_insn); > + insn_change *changes[] = { &other_change, &cc_change }; > + cc_change.move_range = cc_insn->ebb ()->insn_range (); > + if (!restrict_movement (cc_change, ignore_changing_insns (changes))) > + { > + if (dump_file && (dump_flags & TDF_DETAILS)) > + fprintf (dump_file, "-- cannot satisfy all definitions and uses\n"); > + return false; > + } > + > + // Tentatively install the new pattern. By convention, the CC set > + // must be first. > + if (m_parallel) > + { > + XVECEXP (m_parallel, 0, 0) = cc_set; > + XVECEXP (m_parallel, 0, 1) = other_set; > + } > + else > + { > + rtvec vec = gen_rtvec (2, cc_set, other_set); > + m_parallel = gen_rtx_PARALLEL (VOIDmode, vec); > + } > + validate_change (cc_rtl, &PATTERN (cc_rtl), m_parallel, 1); > + > + // These routines report failures themselves. > + if (!recog (attempt, cc_change, ignore_changing_insns (changes)) > + || !changes_are_worthwhile (changes) > + || !crtl->ssa->verify_insn_changes (changes)) > + return false; > + > + remove_reg_equal_equiv_notes (cc_rtl); > + confirm_change_group (); > + crtl->ssa->change_insns (changes); > + m_parallel = NULL_RTX; > + return true; > +} > + > +// CC_SET is a single_set that sets (only) CC_DEF. See whether CC_DEF > +// is a definition of a condition-code register and try to optimize it > +// with related instructions if so. Return true if something changed. > +// > +// This function looks for sequences of the form: > +// > +// A: (set (reg R1) X1) > +// B: ...instructions that might change the value of X1... > +// C: (set (reg CC) X2) // X2 uses R1 > +// > +// and tries to change them to: > +// > +// C': [(set (reg CC) X2') > +// (set (reg R1) X1)] > +// B: ...instructions that might change the value of X1... > +// > +// where X2' is the result of replacing R1 with X1 in X2. > +// > +// Combine can handle this optimization if B doesn't exist and if A and > +// C are in the same BB. This pass instead handles cases where B does > +// exist and cases where A and C are in different BBs of the same EBB. > +bool > +late_combine::combine_cc_setter (set_info *cc_def, rtx cc_set) > +{ > + // Check for a set of a CC register. This isn't needed for correctness; > + // it's just a way of narrowing the search space. It could be relaxed if > + // there are other situations that would benefit from the same > optimization. > + if (!HARD_REGISTER_NUM_P (cc_def->regno ()) > + || GET_MODE_CLASS (cc_def->mode()) != MODE_CC) > + return false; > + > + // Search the registers used by the CC setter for an easily-substitutable > + // def-use chain. > + for (use_info *other_use : cc_def->insn ()->uses ()) > + if (auto *other_def = other_use->def ()) > + if (other_use->regno () != cc_def->regno () > + && other_def->ebb () == cc_def->ebb () > + && other_def == single_set_info (other_def->insn ())) > + if (rtx other_set = optimizable_set (other_def)) > + if (parallelize_insns (cc_def, cc_set, other_def, other_set)) > + return true; > + > + return false; > +} > + > +// Try to optimize INSN in some way. If the optimization moves any > +// instructions before CURSOR, and if further optimizations might be > +// possible on those instructions, add them to the end of m_worklist. > +bool > +late_combine::combine_insn (insn_info *insn, insn_info *cursor) > +{ > + if (set_info *def = single_set_info (insn)) > + if (rtx set = optimizable_set (def)) > + return (combine_into_uses (def, set, cursor) > + || combine_cc_setter (def, set)); > + > + return false; > +} > + > // Run the pass on function FN. > unsigned int > late_combine::execute (function *fn) > @@ -715,7 +882,7 @@ late_combine::execute (function *fn) > if (!insn->is_artificial ()) > { > insn_info *prev = insn->prev_nondebug_insn (); > - if (combine_into_uses (insn, prev)) > + if (combine_insn (insn, prev)) > { > // Any instructions that get added to the worklist were > // previously after PREV. Thus if we were able to move > @@ -725,7 +892,7 @@ late_combine::execute (function *fn) > // the worklist should be free of backwards dependencies, > // even if it isn't necessarily in RPO. > for (unsigned int i = 0; i < m_worklist.length (); ++i) > - combine_into_uses (m_worklist[i], prev); > + combine_insn (m_worklist[i], prev); > m_worklist.truncate (0); > insn = prev; > } > diff --git a/gcc/rtlanal.cc b/gcc/rtlanal.cc > index 19b66456e45..63a1d08c46c 100644 > --- a/gcc/rtlanal.cc > +++ b/gcc/rtlanal.cc > @@ -5740,7 +5740,8 @@ pattern_cost (rtx pat, bool speed) > rtx x = XVECEXP (pat, 0, i); > if (GET_CODE (x) == SET) > { > - if (GET_CODE (SET_SRC (x)) == COMPARE) > + if (GET_CODE (SET_SRC (x)) == COMPARE > + || GET_MODE_CLASS (GET_MODE (SET_DEST (x))) == MODE_CC) > { > if (comparison) > return 0; > -- > 2.43.0 >