Re: [PATCH] Merge aarch64-cc-fusion into late-combine

Richard Biener Wed, 20 Aug 2025 04:33:24 -0700

On Wed, Aug 20, 2025 at 1:05 PM Richard Sandiford
<richard.sandif...@arm.com> wrote:
>
> I'd added the aarch64-specific CC fusion pass to fold a PTEST
> instruction into the instruction that feeds the PTEST, in cases
> where the latter instruction can set the appropriate flags as a
> side-effect.
>
> Combine does the same optimisation.  However, as explained in the
> comments, the PTEST case often has:
>
>   A: set predicate P based on inputs X
>   B: clobber X
>   C: test P
>
> and so the fusion is only possible if we move C before B.
> That's something that combine currently can't do (for the cases
> that we needed).
>
> The optimisation was never really AArch64-specific.  It's just that,
> in an all-too-familiar fashion, we needed it in stage 3, when it was
> too late to add something target-independent.
>
> late-combine adds a convenient place to do the optimisation in a
> target-independent way, just as combine is a convenient place to
> do its related optimisation.
>
> Tested on aarch64-linux-gnu, powerpc64le-linux-gnu and
> x86_64-linux-gnu.  OK to install?


OK.

Thanks,
Richard.

> Richard
>
>
> gcc/
>         * config.gcc (aarch64*-*-*): Remove aarch64-cc-fusion.o from
>         extra_objs.
>         * config/aarch64/aarch64-passes.def (pass_cc_fusion): Delete.
>         * config/aarch64/aarch64-protos.h (make_pass_cc_fusion): Delete.
>         * config/aarch64/t-aarch64 (aarch64-cc-fusion.o): Delete.
>         * config/aarch64/aarch64-cc-fusion.cc: Delete.
>         * late-combine.cc (late_combine::optimizable_set): Take a set_info *
>         rather than an insn_info * and move destination tests from...
>         (late_combine::combine_into_uses): ...here. Take a set_info * rather
>         an insn_info *.  Take the rtx set.
>         (late_combine::parallelize_insns, late_combine::combine_cc_setter)
>         (late_combine::combine_insn): New member functions.
>         (late_combine::m_parallel): New member variable.
>         * rtlanal.cc (pattern_cost): Handle sets of CC registers in the
>         same way as comparisons.
> ---
>  gcc/config.gcc                          |   2 +-
>  gcc/config/aarch64/aarch64-cc-fusion.cc | 297 ------------------------
>  gcc/config/aarch64/aarch64-passes.def   |   1 -
>  gcc/config/aarch64/aarch64-protos.h     |   1 -
>  gcc/config/aarch64/t-aarch64            |   6 -
>  gcc/late-combine.cc                     | 243 ++++++++++++++++---
>  gcc/rtlanal.cc                          |   3 +-
>  7 files changed, 208 insertions(+), 345 deletions(-)
>  delete mode 100644 gcc/config/aarch64/aarch64-cc-fusion.cc
>
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index 56246387ef5..517df40e5de 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -351,7 +351,7 @@ aarch64*-*-*)
>         c_target_objs="aarch64-c.o"
>         cxx_target_objs="aarch64-c.o"
>         d_target_objs="aarch64-d.o"
> -       extra_objs="aarch64-builtins.o aarch-common.o aarch64-elf-metadata.o 
> aarch64-sve-builtins.o aarch64-sve-builtins-shapes.o 
> aarch64-sve-builtins-base.o aarch64-sve-builtins-sve2.o 
> aarch64-sve-builtins-sme.o cortex-a57-fma-steering.o aarch64-speculation.o 
> aarch-bti-insert.o aarch64-cc-fusion.o aarch64-early-ra.o 
> aarch64-ldp-fusion.o"
> +       extra_objs="aarch64-builtins.o aarch-common.o aarch64-elf-metadata.o 
> aarch64-sve-builtins.o aarch64-sve-builtins-shapes.o 
> aarch64-sve-builtins-base.o aarch64-sve-builtins-sve2.o 
> aarch64-sve-builtins-sme.o cortex-a57-fma-steering.o aarch64-speculation.o 
> aarch-bti-insert.o aarch64-early-ra.o aarch64-ldp-fusion.o"
>         target_gtfiles="\$(srcdir)/config/aarch64/aarch64-protos.h 
> \$(srcdir)/config/aarch64/aarch64-builtins.h 
> \$(srcdir)/config/aarch64/aarch64-builtins.cc 
> \$(srcdir)/config/aarch64/aarch64-sve-builtins.h 
> \$(srcdir)/config/aarch64/aarch64-sve-builtins.cc"
>         target_has_targetm_common=yes
>         ;;
> diff --git a/gcc/config/aarch64/aarch64-cc-fusion.cc 
> b/gcc/config/aarch64/aarch64-cc-fusion.cc
> deleted file mode 100644
> index cea54dee298..00000000000
> --- a/gcc/config/aarch64/aarch64-cc-fusion.cc
> +++ /dev/null
> @@ -1,297 +0,0 @@
> -// Pass to fuse CC operations with other instructions.
> -// Copyright (C) 2021-2025 Free Software Foundation, Inc.
> -//
> -// This file is part of GCC.
> -//
> -// GCC is free software; you can redistribute it and/or modify it under
> -// the terms of the GNU General Public License as published by the Free
> -// Software Foundation; either version 3, or (at your option) any later
> -// version.
> -//
> -// GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> -// WARRANTY; without even the implied warranty of MERCHANTABILITY or
> -// FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> -// for more details.
> -//
> -// You should have received a copy of the GNU General Public License
> -// along with GCC; see the file COPYING3.  If not see
> -// <http://www.gnu.org/licenses/>.
> -
> -// This pass looks for sequences of the form:
> -//
> -//    A: (set (reg R1) X1)
> -//    B: ...instructions that might change the value of X1...
> -//    C: (set (reg CC) X2) // X2 uses R1
> -//
> -// and tries to change them to:
> -//
> -//    C': [(set (reg CC) X2')
> -//         (set (reg R1) X1)]
> -//    B: ...instructions that might change the value of X1...
> -//
> -// where X2' is the result of replacing R1 with X1 in X2.
> -//
> -// This sequence occurs in SVE code in two important cases:
> -//
> -// (a) Sometimes, to deal correctly with overflow, we need to increment
> -//     an IV after a WHILELO rather than before it.  In this case:
> -//     - A is a WHILELO,
> -//     - B includes an IV increment and
> -//     - C is a separate PTEST.
> -//
> -// (b) ACLE code of the form:
> -//
> -//       svbool_t ok = svrdffr ();
> -//       if (svptest_last (pg, ok))
> -//         ...
> -//
> -//     must, for performance reasons, be code-generated as:
> -//
> -//       RDFFRS Pok.B, Pg/Z
> -//       ...branch on flags result...
> -//
> -//     without a separate PTEST of Pok.  In this case:
> -//     - A is an aarch64_rdffr
> -//     - B includes an aarch64_update_ffrt
> -//     - C is a separate PTEST
> -//
> -// Combine can handle this optimization if B doesn't exist and if A and
> -// C are in the same BB.  This pass instead handles cases where B does
> -// exist and cases where A and C are in different BBs of the same EBB.
> -
> -#define IN_TARGET_CODE 1
> -
> -#define INCLUDE_ALGORITHM
> -#define INCLUDE_FUNCTIONAL
> -#define INCLUDE_ARRAY
> -#include "config.h"
> -#include "system.h"
> -#include "coretypes.h"
> -#include "backend.h"
> -#include "rtl.h"
> -#include "df.h"
> -#include "rtl-ssa.h"
> -#include "tree-pass.h"
> -
> -using namespace rtl_ssa;
> -
> -namespace {
> -const pass_data pass_data_cc_fusion =
> -{
> -  RTL_PASS, // type
> -  "cc_fusion", // name
> -  OPTGROUP_NONE, // optinfo_flags
> -  TV_NONE, // tv_id
> -  0, // properties_required
> -  0, // properties_provided
> -  0, // properties_destroyed
> -  0, // todo_flags_start
> -  TODO_df_finish, // todo_flags_finish
> -};
> -
> -// Class that represents one run of the pass.
> -class cc_fusion
> -{
> -public:
> -  cc_fusion ()  : m_parallel () {}
> -  void execute ();
> -
> -private:
> -  rtx optimizable_set (const insn_info *);
> -  bool parallelize_insns (def_info *, rtx, def_info *, rtx);
> -  void optimize_cc_setter (def_info *, rtx);
> -
> -  // A spare PARALLEL rtx, or null if none.
> -  rtx m_parallel;
> -};
> -
> -// See whether INSN is a single_set that we can optimize.  Return the
> -// set if so, otherwise return null.
> -rtx
> -cc_fusion::optimizable_set (const insn_info *insn)
> -{
> -  if (!insn->can_be_optimized ()
> -      || insn->is_asm ()
> -      || insn->has_volatile_refs ()
> -      || insn->has_pre_post_modify ())
> -    return NULL_RTX;
> -
> -  return single_set (insn->rtl ());
> -}
> -
> -// CC_SET is a single_set that sets (only) CC_DEF; OTHER_SET is likewise
> -// a single_set that sets (only) OTHER_DEF.  CC_SET is known to set the
> -// CC register and the instruction that contains CC_SET is known to use
> -// OTHER_DEF.  Try to do CC_SET and OTHER_SET in parallel.
> -bool
> -cc_fusion::parallelize_insns (def_info *cc_def, rtx cc_set,
> -                             def_info *other_def, rtx other_set)
> -{
> -  auto attempt = crtl->ssa->new_change_attempt ();
> -
> -  insn_info *cc_insn = cc_def->insn ();
> -  insn_info *other_insn = other_def->insn ();
> -  if (dump_file && (dump_flags & TDF_DETAILS))
> -    fprintf (dump_file, "trying to parallelize insn %d and insn %d\n",
> -            other_insn->uid (), cc_insn->uid ());
> -
> -  // Try to substitute OTHER_SET into CC_INSN.
> -  insn_change_watermark rtl_watermark;
> -  rtx_insn *cc_rtl = cc_insn->rtl ();
> -  insn_propagation prop (cc_rtl, SET_DEST (other_set),
> -                        SET_SRC (other_set));
> -  if (!prop.apply_to_pattern (&PATTERN (cc_rtl))
> -      || prop.num_replacements == 0)
> -    {
> -      if (dump_file && (dump_flags & TDF_DETAILS))
> -       fprintf (dump_file, "-- failed to substitute all uses of r%d\n",
> -                other_def->regno ());
> -      return false;
> -    }
> -
> -  // Restrict the uses to those outside notes.
> -  use_array cc_uses = remove_note_accesses (attempt, cc_insn->uses ());
> -  use_array other_set_uses = remove_note_accesses (attempt,
> -                                                  other_insn->uses ());
> -
> -  // Remove the use of the substituted value.
> -  access_array_builder uses_builder (attempt);
> -  uses_builder.reserve (cc_uses.size ());
> -  for (use_info *use : cc_uses)
> -    if (use->def () != other_def)
> -      uses_builder.quick_push (use);
> -  cc_uses = use_array (uses_builder.finish ());
> -
> -  // Get the list of uses for the new instruction.
> -  insn_change cc_change (cc_insn);
> -  cc_change.new_uses = merge_access_arrays (attempt, other_set_uses, 
> cc_uses);
> -  if (!cc_change.new_uses.is_valid ())
> -    {
> -      if (dump_file && (dump_flags & TDF_DETAILS))
> -       fprintf (dump_file, "-- cannot merge uses\n");
> -      return false;
> -    }
> -
> -  // The instruction initially defines just two registers.  recog can add
> -  // extra clobbers if necessary.
> -  auto_vec<access_info *, 2> new_defs;
> -  new_defs.quick_push (cc_def);
> -  new_defs.quick_push (other_def);
> -  sort_accesses (new_defs);
> -  cc_change.new_defs = def_array (access_array (new_defs));
> -
> -  // Make sure there is somewhere that the new instruction could live.
> -  auto other_change = insn_change::delete_insn (other_insn);
> -  insn_change *changes[] = { &other_change, &cc_change };
> -  cc_change.move_range = cc_insn->ebb ()->insn_range ();
> -  if (!restrict_movement (cc_change, ignore_changing_insns (changes)))
> -    {
> -      if (dump_file && (dump_flags & TDF_DETAILS))
> -       fprintf (dump_file, "-- cannot satisfy all definitions and uses\n");
> -      return false;
> -    }
> -
> -  // Tentatively install the new pattern.  By convention, the CC set
> -  // must be first.
> -  if (m_parallel)
> -    {
> -      XVECEXP (m_parallel, 0, 0) = cc_set;
> -      XVECEXP (m_parallel, 0, 1) = other_set;
> -    }
> -  else
> -    {
> -      rtvec vec = gen_rtvec (2, cc_set, other_set);
> -      m_parallel = gen_rtx_PARALLEL (VOIDmode, vec);
> -    }
> -  validate_change (cc_rtl, &PATTERN (cc_rtl), m_parallel, 1);
> -
> -  // These routines report failures themselves.
> -  if (!recog (attempt, cc_change, ignore_changing_insns (changes))
> -      || !changes_are_worthwhile (changes)
> -      || !crtl->ssa->verify_insn_changes (changes))
> -    return false;
> -
> -  remove_reg_equal_equiv_notes (cc_rtl);
> -  confirm_change_group ();
> -  crtl->ssa->change_insns (changes);
> -  m_parallel = NULL_RTX;
> -  return true;
> -}
> -
> -// Try to optimize the instruction that contains CC_DEF, where CC_DEF 
> describes
> -// a definition of the CC register by CC_SET.
> -void
> -cc_fusion::optimize_cc_setter (def_info *cc_def, rtx cc_set)
> -{
> -  // Search the registers used by the CC setter for an easily-substitutable
> -  // def-use chain.
> -  for (use_info *other_use : cc_def->insn ()->uses ())
> -    if (def_info *other_def = other_use->def ())
> -      if (other_use->regno () != CC_REGNUM
> -         && other_def->ebb () == cc_def->ebb ())
> -       if (rtx other_set = optimizable_set (other_def->insn ()))
> -         {
> -           rtx dest = SET_DEST (other_set);
> -           if (REG_P (dest)
> -               && REGNO (dest) == other_def->regno ()
> -               && REG_NREGS (dest) == 1
> -               && parallelize_insns (cc_def, cc_set, other_def, other_set))
> -             return;
> -         }
> -}
> -
> -// Run the pass on the current function.
> -void
> -cc_fusion::execute ()
> -{
> -  // Initialization.
> -  calculate_dominance_info (CDI_DOMINATORS);
> -  df_analyze ();
> -  crtl->ssa = new rtl_ssa::function_info (cfun);
> -
> -  // Walk through all instructions that set CC.  Look for a PTEST instruction
> -  // that we can optimize.
> -  //
> -  // ??? The PTEST test isn't needed for correctness, but it ensures that the
> -  // pass no effect on non-SVE code.
> -  for (def_info *def : crtl->ssa->reg_defs (CC_REGNUM))
> -    if (rtx cc_set = optimizable_set (def->insn ()))
> -      if (REG_P (SET_DEST (cc_set))
> -         && REGNO (SET_DEST (cc_set)) == CC_REGNUM
> -         && GET_CODE (SET_SRC (cc_set)) == UNSPEC
> -         && XINT (SET_SRC (cc_set), 1) == UNSPEC_PTEST)
> -       optimize_cc_setter (def, cc_set);
> -
> -  // Finalization.
> -  crtl->ssa->perform_pending_updates ();
> -  free_dominance_info (CDI_DOMINATORS);
> -}
> -
> -class pass_cc_fusion : public rtl_opt_pass
> -{
> -public:
> -  pass_cc_fusion (gcc::context *ctxt)
> -    : rtl_opt_pass (pass_data_cc_fusion, ctxt)
> -  {}
> -
> -  // opt_pass methods:
> -  virtual bool gate (function *) { return TARGET_SVE && optimize >= 2; }
> -  virtual unsigned int execute (function *);
> -};
> -
> -unsigned int
> -pass_cc_fusion::execute (function *)
> -{
> -  cc_fusion ().execute ();
> -  return 0;
> -}
> -
> -} // end namespace
> -
> -// Create a new CC fusion pass instance.
> -
> -rtl_opt_pass *
> -make_pass_cc_fusion (gcc::context *ctxt)
> -{
> -  return new pass_cc_fusion (ctxt);
> -}
> diff --git a/gcc/config/aarch64/aarch64-passes.def 
> b/gcc/config/aarch64/aarch64-passes.def
> index 9cf9d3e13b2..6a53ff35591 100644
> --- a/gcc/config/aarch64/aarch64-passes.def
> +++ b/gcc/config/aarch64/aarch64-passes.def
> @@ -24,6 +24,5 @@ INSERT_PASS_BEFORE (pass_reorder_blocks, 1, 
> pass_track_speculation);
>  INSERT_PASS_BEFORE (pass_late_thread_prologue_and_epilogue, 1, 
> pass_switch_pstate_sm);
>  INSERT_PASS_BEFORE (pass_late_thread_prologue_and_epilogue, 1, 
> pass_late_track_speculation);
>  INSERT_PASS_BEFORE (pass_shorten_branches, 1, pass_insert_bti);
> -INSERT_PASS_AFTER (pass_if_after_combine, 1, pass_cc_fusion);
>  INSERT_PASS_BEFORE (pass_early_remat, 1, pass_ldp_fusion);
>  INSERT_PASS_BEFORE (pass_peephole2, 1, pass_ldp_fusion);
> diff --git a/gcc/config/aarch64/aarch64-protos.h 
> b/gcc/config/aarch64/aarch64-protos.h
> index d26e1d5642e..56efcf2c7f2 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -1237,7 +1237,6 @@ rtl_opt_pass *make_pass_fma_steering (gcc::context *);
>  rtl_opt_pass *make_pass_track_speculation (gcc::context *);
>  rtl_opt_pass *make_pass_late_track_speculation (gcc::context *);
>  rtl_opt_pass *make_pass_insert_bti (gcc::context *ctxt);
> -rtl_opt_pass *make_pass_cc_fusion (gcc::context *ctxt);
>  rtl_opt_pass *make_pass_switch_pstate_sm (gcc::context *ctxt);
>  rtl_opt_pass *make_pass_ldp_fusion (gcc::context *);
>
> diff --git a/gcc/config/aarch64/t-aarch64 b/gcc/config/aarch64/t-aarch64
> index 38a8c063725..63ca8e90c9d 100644
> --- a/gcc/config/aarch64/t-aarch64
> +++ b/gcc/config/aarch64/t-aarch64
> @@ -190,12 +190,6 @@ aarch-bti-insert.o: 
> $(srcdir)/config/arm/aarch-bti-insert.cc \
>         $(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
>                 $(srcdir)/config/arm/aarch-bti-insert.cc
>
> -aarch64-cc-fusion.o: $(srcdir)/config/aarch64/aarch64-cc-fusion.cc \
> -    $(CONFIG_H) $(SYSTEM_H) $(CORETYPES_H) $(BACKEND_H) $(RTL_H) $(DF_H) \
> -    $(RTL_SSA_H) tree-pass.h
> -       $(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
> -               $(srcdir)/config/aarch64/aarch64-cc-fusion.cc
> -
>  aarch64-early-ra.o: $(srcdir)/config/aarch64/aarch64-early-ra.cc \
>      $(CONFIG_H) $(SYSTEM_H) $(CORETYPES_H) $(BACKEND_H) $(RTL_H) $(DF_H) \
>      $(RTL_SSA_H) tree-pass.h
> diff --git a/gcc/late-combine.cc b/gcc/late-combine.cc
> index 90d7ef09583..770780eb04d 100644
> --- a/gcc/late-combine.cc
> +++ b/gcc/late-combine.cc
> @@ -17,9 +17,16 @@
>  // along with GCC; see the file COPYING3.  If not see
>  // <http://www.gnu.org/licenses/>.
>
> -// The current purpose of this pass is to substitute definitions into
> -// all uses, so that the definition can be removed.  However, it could
> -// be extended to handle other combination-related optimizations in future.
> +// This pass currently has two purposes:
> +//
> +// - to substitute definitions into all uses, so that the definition
> +//   can be removed.
> +//
> +// - to try to parallelise sets of condition-code registers with a
> +//   related instruction (see combine_cc_setter for details).
> +//
> +// However, it could be extended to handle other combination-related
> +// optimizations in future.
>  //
>  // The pass can run before or after register allocation.  When running
>  // before register allocation, it tries to avoid cases that are likely
> @@ -111,12 +118,18 @@ public:
>    unsigned int execute (function *);
>
>  private:
> -  rtx optimizable_set (insn_info *);
> +  rtx optimizable_set (set_info *);
>    bool check_register_pressure (insn_info *, rtx);
>    bool check_uses (set_info *, rtx);
> -  bool combine_into_uses (insn_info *, insn_info *);
> +  bool combine_into_uses (set_info *, rtx, insn_info *);
> +  bool parallelize_insns (set_info *, rtx, set_info *, rtx);
> +  bool combine_cc_setter (set_info *, rtx);
> +  bool combine_insn (insn_info *, insn_info *);
>
>    auto_vec<insn_info *> m_worklist;
> +
> +  // A spare PARALLEL rtx, or null if none.
> +  rtx m_parallel = NULL_RTX;
>  };
>
>  insn_combination::insn_combination (set_info *def, rtx dest, rtx src)
> @@ -454,11 +467,26 @@ insn_combination::run ()
>    return true;
>  }
>
> -// See whether INSN is a single_set that we can optimize.  Return the
> -// set if so, otherwise return null.
> +// DEF is the result of calling single_set_info on its instruction.
> +// See whether that instruction is a single_set that we can optimize.
> +// Return the set if so, otherwise return null.
>  rtx
> -late_combine::optimizable_set (insn_info *insn)
> +late_combine::optimizable_set (set_info *def)
>  {
> +  // For simplicity, don't try to handle sets of multiple hard registers.
> +  // And for correctness, don't remove any assignments to the stack or
> +  // frame pointers, since that would implicitly change the set of valid
> +  // memory locations between this assignment and the next.
> +  //
> +  // Removing assignments to the hard frame pointer would invalidate
> +  // backtraces.
> +  if (!def->is_reg ()
> +      || def->regno () == STACK_POINTER_REGNUM
> +      || def->regno () == FRAME_POINTER_REGNUM
> +      || def->regno () == HARD_FRAME_POINTER_REGNUM)
> +    return NULL_RTX;
> +
> +  auto *insn = def->insn ();
>    if (!insn->can_be_optimized ()
>        || insn->is_asm ()
>        || insn->is_call ()
> @@ -467,7 +495,16 @@ late_combine::optimizable_set (insn_info *insn)
>        || !can_move_insn_p (insn))
>      return NULL_RTX;
>
> -  return single_set (insn->rtl ());
> +  rtx set = single_set (insn->rtl ());
> +  if (!set)
> +    return NULL_RTX;
> +
> +  // For simplicity, don't try to handle subreg destinations.
> +  rtx dest = SET_DEST (set);
> +  if (!REG_P (dest) || REG_NREGS (dest) != 1 || def->regno () != REGNO 
> (dest))
> +    return NULL_RTX;
> +
> +  return set;
>  }
>
>  // Suppose that we can replace all uses of SET_DEST (SET) with SET_SRC (SET),
> @@ -643,35 +680,13 @@ late_combine::check_uses (set_info *def, rtx set)
>    return true;
>  }
>
> -// Try to remove INSN by substituting a definition into all uses.
> -// If the optimization moves any instructions before CURSOR, add those
> -// instructions to the end of m_worklist.
> +// Try to remove DEF's instruction by substituting DEF into all uses.
> +// SET is the rtx set associated with DEF.  If the optimization moves any
> +// instructions before CURSOR, add those instructions to the end of 
> m_worklist.
>  bool
> -late_combine::combine_into_uses (insn_info *insn, insn_info *cursor)
> +late_combine::combine_into_uses (set_info *def, rtx set, insn_info *cursor)
>  {
> -  // For simplicity, don't try to handle sets of multiple hard registers.
> -  // And for correctness, don't remove any assignments to the stack or
> -  // frame pointers, since that would implicitly change the set of valid
> -  // memory locations between this assignment and the next.
> -  //
> -  // Removing assignments to the hard frame pointer would invalidate
> -  // backtraces.
> -  set_info *def = single_set_info (insn);
> -  if (!def
> -      || !def->is_reg ()
> -      || def->regno () == STACK_POINTER_REGNUM
> -      || def->regno () == FRAME_POINTER_REGNUM
> -      || def->regno () == HARD_FRAME_POINTER_REGNUM)
> -    return false;
> -
> -  rtx set = optimizable_set (insn);
> -  if (!set)
> -    return false;
> -
> -  // For simplicity, don't try to handle subreg destinations.
> -  rtx dest = SET_DEST (set);
> -  if (!REG_P (dest) || def->regno () != REGNO (dest))
> -    return false;
> +  auto *insn = def->insn ();
>
>    // Don't prolong the live ranges of allocatable hard registers, or put
>    // them into more complicated instructions.  Failing to prevent this
> @@ -698,6 +713,158 @@ late_combine::combine_into_uses (insn_info *insn, 
> insn_info *cursor)
>    return true;
>  }
>
> +// CC_SET is a single_set that sets (only) CC_DEF; OTHER_SET is likewise
> +// a single_set that sets (only) OTHER_DEF.  CC_SET is known to set a
> +// condition-code register and the instruction that contains CC_SET is
> +// known to use OTHER_DEF.  Try to do CC_SET and OTHER_SET in parallel.
> +bool
> +late_combine::parallelize_insns (set_info *cc_def, rtx cc_set,
> +                                set_info *other_def, rtx other_set)
> +{
> +  auto attempt = crtl->ssa->new_change_attempt ();
> +
> +  insn_info *cc_insn = cc_def->insn ();
> +  insn_info *other_insn = other_def->insn ();
> +  if (dump_file && (dump_flags & TDF_DETAILS))
> +    fprintf (dump_file, "trying to parallelize insn %d and insn %d\n",
> +            other_insn->uid (), cc_insn->uid ());
> +
> +  // Try to substitute OTHER_SET into CC_INSN.
> +  insn_change_watermark rtl_watermark;
> +  rtx_insn *cc_rtl = cc_insn->rtl ();
> +  insn_propagation prop (cc_rtl, SET_DEST (other_set), SET_SRC (other_set));
> +  if (!prop.apply_to_pattern (&PATTERN (cc_rtl))
> +      || prop.num_replacements == 0)
> +    {
> +      if (dump_file && (dump_flags & TDF_DETAILS))
> +       fprintf (dump_file, "-- failed to substitute all uses of r%d\n",
> +                other_def->regno ());
> +      return false;
> +    }
> +
> +  // Restrict the uses to those outside notes.
> +  use_array cc_uses = remove_note_accesses (attempt, cc_insn->uses ());
> +  use_array other_set_uses = remove_note_accesses (attempt,
> +                                                  other_insn->uses ());
> +
> +  // Remove the use of the substituted value.
> +  cc_uses = remove_uses_of_def (attempt, cc_uses, other_def);
> +
> +  // Get the list of uses for the new instruction.
> +  insn_change cc_change (cc_insn);
> +  cc_change.new_uses = merge_access_arrays (attempt, other_set_uses, 
> cc_uses);
> +  if (!cc_change.new_uses.is_valid ())
> +    {
> +      if (dump_file && (dump_flags & TDF_DETAILS))
> +       fprintf (dump_file, "-- cannot merge uses\n");
> +      return false;
> +    }
> +
> +  // The instruction initially defines just two registers.  recog can add
> +  // extra clobbers if necessary.
> +  auto_vec<access_info *, 2> new_defs;
> +  new_defs.quick_push (cc_def);
> +  new_defs.quick_push (other_def);
> +  sort_accesses (new_defs);
> +  cc_change.new_defs = def_array (access_array (new_defs));
> +
> +  // Make sure there is somewhere that the new instruction could live.
> +  auto other_change = insn_change::delete_insn (other_insn);
> +  insn_change *changes[] = { &other_change, &cc_change };
> +  cc_change.move_range = cc_insn->ebb ()->insn_range ();
> +  if (!restrict_movement (cc_change, ignore_changing_insns (changes)))
> +    {
> +      if (dump_file && (dump_flags & TDF_DETAILS))
> +       fprintf (dump_file, "-- cannot satisfy all definitions and uses\n");
> +      return false;
> +    }
> +
> +  // Tentatively install the new pattern.  By convention, the CC set
> +  // must be first.
> +  if (m_parallel)
> +    {
> +      XVECEXP (m_parallel, 0, 0) = cc_set;
> +      XVECEXP (m_parallel, 0, 1) = other_set;
> +    }
> +  else
> +    {
> +      rtvec vec = gen_rtvec (2, cc_set, other_set);
> +      m_parallel = gen_rtx_PARALLEL (VOIDmode, vec);
> +    }
> +  validate_change (cc_rtl, &PATTERN (cc_rtl), m_parallel, 1);
> +
> +  // These routines report failures themselves.
> +  if (!recog (attempt, cc_change, ignore_changing_insns (changes))
> +      || !changes_are_worthwhile (changes)
> +      || !crtl->ssa->verify_insn_changes (changes))
> +    return false;
> +
> +  remove_reg_equal_equiv_notes (cc_rtl);
> +  confirm_change_group ();
> +  crtl->ssa->change_insns (changes);
> +  m_parallel = NULL_RTX;
> +  return true;
> +}
> +
> +// CC_SET is a single_set that sets (only) CC_DEF.  See whether CC_DEF
> +// is a definition of a condition-code register and try to optimize it
> +// with related instructions if so.  Return true if something changed.
> +//
> +// This function looks for sequences of the form:
> +//
> +//    A: (set (reg R1) X1)
> +//    B: ...instructions that might change the value of X1...
> +//    C: (set (reg CC) X2) // X2 uses R1
> +//
> +// and tries to change them to:
> +//
> +//    C': [(set (reg CC) X2')
> +//         (set (reg R1) X1)]
> +//    B: ...instructions that might change the value of X1...
> +//
> +// where X2' is the result of replacing R1 with X1 in X2.
> +//
> +// Combine can handle this optimization if B doesn't exist and if A and
> +// C are in the same BB.  This pass instead handles cases where B does
> +// exist and cases where A and C are in different BBs of the same EBB.
> +bool
> +late_combine::combine_cc_setter (set_info *cc_def, rtx cc_set)
> +{
> +  // Check for a set of a CC register.  This isn't needed for correctness;
> +  // it's just a way of narrowing the search space.  It could be relaxed if
> +  // there are other situations that would benefit from the same 
> optimization.
> +  if (!HARD_REGISTER_NUM_P (cc_def->regno ())
> +      || GET_MODE_CLASS (cc_def->mode()) != MODE_CC)
> +    return false;
> +
> +  // Search the registers used by the CC setter for an easily-substitutable
> +  // def-use chain.
> +  for (use_info *other_use : cc_def->insn ()->uses ())
> +    if (auto *other_def = other_use->def ())
> +      if (other_use->regno () != cc_def->regno ()
> +         && other_def->ebb () == cc_def->ebb ()
> +         && other_def == single_set_info (other_def->insn ()))
> +       if (rtx other_set = optimizable_set (other_def))
> +         if (parallelize_insns (cc_def, cc_set, other_def, other_set))
> +           return true;
> +
> +  return false;
> +}
> +
> +// Try to optimize INSN in some way.  If the optimization moves any
> +// instructions before CURSOR, and if further optimizations might be
> +// possible on those instructions, add them to the end of m_worklist.
> +bool
> +late_combine::combine_insn (insn_info *insn, insn_info *cursor)
> +{
> +  if (set_info *def = single_set_info (insn))
> +    if (rtx set = optimizable_set (def))
> +      return (combine_into_uses (def, set, cursor)
> +             || combine_cc_setter (def, set));
> +
> +  return false;
> +}
> +
>  // Run the pass on function FN.
>  unsigned int
>  late_combine::execute (function *fn)
> @@ -715,7 +882,7 @@ late_combine::execute (function *fn)
>        if (!insn->is_artificial ())
>         {
>           insn_info *prev = insn->prev_nondebug_insn ();
> -         if (combine_into_uses (insn, prev))
> +         if (combine_insn (insn, prev))
>             {
>               // Any instructions that get added to the worklist were
>               // previously after PREV.  Thus if we were able to move
> @@ -725,7 +892,7 @@ late_combine::execute (function *fn)
>               // the worklist should be free of backwards dependencies,
>               // even if it isn't necessarily in RPO.
>               for (unsigned int i = 0; i < m_worklist.length (); ++i)
> -               combine_into_uses (m_worklist[i], prev);
> +               combine_insn (m_worklist[i], prev);
>               m_worklist.truncate (0);
>               insn = prev;
>             }
> diff --git a/gcc/rtlanal.cc b/gcc/rtlanal.cc
> index 19b66456e45..63a1d08c46c 100644
> --- a/gcc/rtlanal.cc
> +++ b/gcc/rtlanal.cc
> @@ -5740,7 +5740,8 @@ pattern_cost (rtx pat, bool speed)
>           rtx x = XVECEXP (pat, 0, i);
>           if (GET_CODE (x) == SET)
>             {
> -             if (GET_CODE (SET_SRC (x)) == COMPARE)
> +             if (GET_CODE (SET_SRC (x)) == COMPARE
> +                 || GET_MODE_CLASS (GET_MODE (SET_DEST (x))) == MODE_CC)
>                 {
>                   if (comparison)
>                     return 0;
> --
> 2.43.0
>

Re: [PATCH] Merge aarch64-cc-fusion into late-combine

Reply via email to