Re: [Patch, aarch64, middle-end] Move pair_fusion pass from aarch64 to middle-end

Ajit Agarwal Tue, 21 May 2024 03:33:19 -0700

Hello Alex:

On 21/05/24 1:16 am, Alex Coplan wrote:
> On 20/05/2024 18:44, Alex Coplan wrote:
>> Hi Ajit,
>>
>> On 20/05/2024 21:50, Ajit Agarwal wrote:
>>> Hello Alex/Richard:
>>>
>>> Move pair fusion pass from aarch64-ldp-fusion.cc to middle-end
>>> to support multiple targets.
>>>
>>> Common infrastructure of load store pair fusion is divided into
>>> target independent and target dependent code.
>>>
>>> Target independent code is structured in the following files.
>>> gcc/pair-fusion.h
>>> gcc/pair-fusion.cc
>>>
>>> Target independent code is the Generic code with pure virtual
>>> function to interface betwwen target independent and dependent
>>> code.
>>>
>>> Bootstrapped and regtested on aarch64-linux-gnu.
>>>
>>> Thanks & Regards
>>> Ajit
>>>
>>> aarch64, middle-end: Move pair_fusion pass from aarch64 to middle-end
>>>
>>> Move pair fusion pass from aarch64-ldp-fusion.cc to middle-end
>>> to support multiple targets.
>>>
>>> Common infrastructure of load store pair fusion is divided into
>>> target independent and target dependent code.
>>>
>>> Target independent code is structured in the following files.
>>> gcc/pair-fusion.h
>>> gcc/pair-fusion.cc
>>>
>>> Target independent code is the Generic code with pure virtual
>>> function to interface betwwen target independent and dependent
>>> code.
>>>
>>> 2024-05-20  Ajit Kumar Agarwal  <aagar...@linux.ibm.com>
>>>
>>> gcc/ChangeLog:
>>>
>>>     * pair-fusion.h: Generic header code for load store fusion
>>
>> Insert "pair" before fusion?


Addressed in v1 of the patch.
>>
>>>     that can be shared across different architectures.
>>>     * pair-fusion.cc: Generic source code implementation for
>>>     load store fusion that can be shared across different architectures.
>>
>> Likewise.
Addressed in v1 of the patch.
>>
>>>     * Makefile.in: Add new executable pair-fusion.o
>>
>> It's not an executable but an object file.
>>
>>>     * config/aarch64/aarch64-ldp-fusion.cc: Target specific
>>>     code for load store fusion of aarch64.
>>
>> I guess this should say something like: "Delete generic code and move it
>> to pair-fusion.cc in the middle-end."
>>
>> I've left some comments below on the header file.  The rest of the patch
>> looks pretty good to me.  I tried diffing the original contents of
>> aarch64-ldp-fusion.cc with pair-fusion.cc, and that looks as expected.
>>
> 
> <snip>
> 
>>> diff --git a/gcc/pair-fusion.h b/gcc/pair-fusion.h
>>> new file mode 100644
>>> index 00000000000..00f6d3e149a
>>> --- /dev/null
>>> +++ b/gcc/pair-fusion.h
>>> @@ -0,0 +1,340 @@
>>> +// Pair Mem fusion generic header file.
>>> +// Copyright (C) 2024 Free Software Foundation, Inc.
>>> +//
>>> +// This file is part of GCC.
>>> +//
>>> +// GCC is free software; you can redistribute it and/or modify it
>>> +// under the terms of the GNU General Public License as published by
>>> +// the Free Software Foundation; either version 3, or (at your option)
>>> +// any later version.
>>> +//
>>> +// GCC is distributed in the hope that it will be useful, but
>>> +// WITHOUT ANY WARRANTY; without even the implied warranty of
>>> +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>>> +// General Public License for more details.
>>> +//
>>> +// You should have received a copy of the GNU General Public License
>>> +// along with GCC; see the file COPYING3.  If not see
>>> +// <http://www.gnu.org/licenses/>.
>>> +
>>> +#define INCLUDE_ALGORITHM
>>> +#define INCLUDE_FUNCTIONAL
>>> +#define INCLUDE_LIST
>>> +#define INCLUDE_TYPE_TRAITS
>>> +#include "config.h"
>>> +#include "system.h"
>>> +#include "coretypes.h"
>>> +#include "backend.h"
>>> +#include "rtl.h"
>>> +#include "df.h"
>>> +#include "rtl-iter.h"
>>> +#include "rtl-ssa.h"
>>
>> I'm not sure how desirable this is, but you might be able to
>> forward-declare RTL-SSA types like this:
>>
>> class def_info;
>> class insn_info;
>> class insn_range_info;
>>
>> thus removing the need to include the header here, since the interface
>> only refers to these types by pointer or reference.
>>
>> Richard: please say if you'd prefer keeping the include.
>>

Doing forward declaration gives ambigous errors with conflicting
insn_info with rtl_ssa::insn_info and templated initialization
errors. Also with overloaded operator with insn_info is not 
defined with forward declaration.

Hence I kept the header.

Addressed in v1 of the patch.

>>> +#include "cfgcleanup.h"
>>> +#include "tree-pass.h"
>>> +#include "ordered-hash-map.h"
>>> +#include "tree-dfa.h"
>>> +#include "fold-const.h"
>>> +#include "tree-hash-traits.h"
>>> +#include "print-tree.h"
>>> +#include "insn-attr.h"
>>
>> I expect we don't need all of these includes here.  I think we should
>> have the minimum necessary set of includes here and most of the includes
>> should be in the *.cc files.
>>

Addressed in v1 of the patch.
>>> +
>>> +using namespace rtl_ssa;
>>> +
>>> +// We pack these fields (load_p, fpsimd_p, and size) into an integer
>>> +// (LFS) which we use as part of the key into the main hash tables.
>>> +//
>>> +// The idea is that we group candidates together only if they agree on
>>> +// the fields below.  Candidates that disagree on any of these
>>> +// properties shouldn't be merged together.
>>> +struct lfs_fields
>>> +{
>>> +  bool load_p;
>>> +  bool fpsimd_p;
>>> +  unsigned size;
>>> +};
>>
>> This struct shouldn't be needed in the header file (it should only be
>> needed in pair-fusion.cc).  I can see that it's needed for
>> pair_fusion_bb_info, but that shouldn't be in the header file either IMO.
>>

Addressed in v1 of the patch.
>>> +
>>> +using insn_list_t = std::list<insn_info *>;
>>
>> Likewise, if you move pair_fusion_bb_info out of this header, these
>> shouldn't be needed either.
>>
>>> +using insn_iter_t = insn_list_t::iterator;
>>
>> Pre-existing, but this looks to be completely dead/unused.  Please could
>> you push the obvious patch to delete it from aarch64-ldp-fusion.cc
>> first (assuming that passes testing)?
>>
>>> +
>>> +// Information about the accesses at a given offset from a particular
>>> +// base.  Stored in an access_group, see below.
>>> +struct access_record
>>> +{
>>> +  poly_int64 offset;
>>> +  std::list<insn_info *> cand_insns;
>>> +  std::list<access_record>::iterator place;
>>> +
>>> +  access_record (poly_int64 off) : offset (off) {}
>>> +};
>>
>> If you drop pair_fusion_bb_info from this header then this can go back
>> in pair-fusion.cc.
>>

Addressed in v1 of the patch.
>>> +
>>> +// A group of accesses where adjacent accesses could be ldp/stp
>>> +// candidates.  The splay tree supports efficient insertion,
>>> +// while the list supports efficient iteration.
>>> +struct access_group
>>> +{
>>> +  splay_tree<access_record *> tree;
>>> +  std::list<access_record> list;
>>> +
>>> +  template<typename Alloc>
>>> +  inline void track (Alloc node_alloc, poly_int64 offset, insn_info *insn);
>>> +};
>>
>> Likewise.
>>

Addressed in v1 of the patch.
>>> +
>>> +// Information about a potential base candidate, used in try_fuse_pair.
>>> +// There may be zero, one, or two viable RTL bases for a given pair.
>>> +struct base_cand
>>> +{
>>> +  // DEF is the def of the base register to be used by the pair.
>>> +  def_info *def;
>>> +
>>> +  // FROM_INSN is -1 if the base candidate is already shared by both
>>> +  // candidate insns.  Otherwise it holds the index of the insn from
>>> +  // which the base originated.
>>> +  //
>>> +  // In the case that the base is shared, either DEF is already used
>>> +  // by both candidate accesses, or both accesses see different versions
>>> +  // of the same regno, in which case DEF is the def consumed by the
>>> +  // first candidate access.
>>> +  int from_insn;
>>> +
>>> +  // To form a pair, we do so by moving the first access down and the 
>>> second
>>> +  // access up.  To determine where to form the pair, and whether or not
>>> +  // it is safe to form the pair, we track instructions which cannot be
>>> +  // re-ordered past due to either dataflow or alias hazards.
>>> +  //
>>> +  // Since we allow changing the base used by an access, the choice of
>>> +  // base can change which instructions act as re-ordering hazards for
>>> +  // this pair (due to different dataflow).  We store the initial
>>> +  // dataflow hazards for this choice of base candidate in HAZARDS.
>>> +  //
>>> +  // These hazards act as re-ordering barriers to each candidate insn
>>> +  // respectively, in program order.
>>> +  //
>>> +  // Later on, when we take alias analysis into account, we narrow
>>> +  // HAZARDS accordingly.
>>> +  insn_info *hazards[2];
>>> +
>>> +  base_cand (def_info *def, int insn)
>>> +    : def (def), from_insn (insn), hazards {nullptr, nullptr} {}
>>> +
>>> +  base_cand (def_info *def) : base_cand (def, -1) {}
>>> +
>>> +  // Test if this base candidate is viable according to HAZARDS.
>>> +  bool viable () const
>>> +  {
>>> +    return !hazards[0] || !hazards[1] || (*hazards[0] > *hazards[1]);
>>> +  }
>>> +};
>>> +
>>> +// Information about an alternate base.  For a def_info D, it may
>>> +// instead be expressed as D = BASE + OFFSET.
>>> +struct alt_base
>>> +{
>>> +  def_info *base;
>>> +  poly_int64 offset;
>>> +};
>>
>> Likewise, this (alt_base) can be dropped from the header file.
>>

Addressed in v1 of the patch.
>>> +
>>> +// Virtual base class for load/store walkers used in alias analysis.
>>> +struct alias_walker
>>> +{
>>> +  virtual bool conflict_p (int &budget) const = 0;
>>> +  virtual insn_info *insn () const = 0;
>>> +  virtual bool valid () const = 0;
>>> +  virtual void advance () = 0;
>>> +};
>>
>> This is quite minor, but I think since this class is internal to
>> pair-fusion.cc and the interface only uses pointers to this type,
>> this can just be a forward-declaration (keeping the definition
>> in pair-fusion.cc).
>>
>>> +
>>> +// When querying should_handle_writeback, this enum is used to
>>> +// qualify which opportunities we are asking about.
>>> +enum class writeback {
>>> +  // Only those writeback opportunities that arise from existing
>>> +  // auto-increment accesses.
>>> +  EXISTING,
>>> +
>>> +  // All writeback opportunities, including those that involve folding
>>> +  // base register updates into a non-writeback pair.
>>> +  ALL
>>> +};
>>> +
>>> +// This class can be overriden by targets to give a pass that fuses
>>> +// adjacent loads and stores into load/store pair instructions.
>>> +//
>>> +// The target can override the various virtual functions to customize
>>> +// the behaviour of the pass as appropriate for the target.
>>> +struct pair_fusion {
>>> +  pair_fusion ();
>>> +
>>> +  // Given:
>>> +  // - an rtx REG_OP, the non-memory operand in a load/store insn,
>>> +  // - a machine_mode MEM_MODE, the mode of the MEM in that insn, and
>>> +  // - a boolean LOAD_P (true iff the insn is a load), then:
>>> +  // return true if the access should be considered an FP/SIMD access.
>>> +  // Such accesses are segregated from GPR accesses, since we only want
>>> +  // to form pairs for accesses that use the same register file.
>>> +  virtual bool fpsimd_op_p (rtx, machine_mode, bool)
>>> +  {
>>> +    return false;
>>> +  }
>>> +
>>> +  // Return true if we should consider forming pairs from memory
>>> +  // accesses with operand mode MODE at this stage in compilation.
>>> +  virtual bool pair_operand_mode_ok_p (machine_mode mode) = 0;
>>> +
>>> +  // Return true iff REG_OP is a suitable register operand for a paired
>>> +  // memory access, where LOAD_P is true if we're asking about loads and
>>> +  // false for stores.  MODE gives the mode of the operand.
>>> +  virtual bool pair_reg_operand_ok_p (bool load_p, rtx reg_op,
>>> +                                 machine_mode mode) = 0;
>>> +
>>> +  // Return alias check limit.
>>> +  // This is needed to avoid unbounded quadratic behaviour when
>>> +  // performing alias analysis.
>>> +  virtual int pair_mem_alias_check_limit () = 0;
>>> +
>>> +  // Return true if we should try to handle writeback opportunities.
>>> +  // WHICH determines the kinds of writeback opportunities the caller
>>> +  // is asking about.
>>> +  virtual bool should_handle_writeback (enum writeback which) = 0;
>>> +
>>> +  // Given BASE_MEM, the mem from the lower candidate access for a pair,
>>> +  // and LOAD_P (true if the access is a load), check if we should proceed
>>> +  // to form the pair given the target's code generation policy on
>>> +  // paired accesses.
>>> +  virtual bool pair_mem_ok_with_policy (rtx base_mem, bool load_p) = 0;
>>> +
>>> +  // Generate the pattern for a paired access.  PATS gives the patterns
>>> +  // for the individual memory accesses (which by this point must share a
>>> +  // common base register).  If WRITEBACK is non-NULL, then this rtx
>>> +  // describes the update to the base register that should be performed by
>>> +  // the resulting insn.  LOAD_P is true iff the accesses are loads.
>>> +  virtual rtx gen_pair (rtx *pats, rtx writeback, bool load_p) = 0;
>>> +
>>> +  // Return true if INSN is a paired memory access.  If so, set LOAD_P to
>>> +  // true iff INSN is a load pair.
>>> +  virtual bool pair_mem_insn_p (rtx_insn *insn, bool &load_p) = 0;
>>> +
>>> +  // Return true if we should track loads.
>>> +  virtual bool track_loads_p ()
>>> +  {
>>> +    return true;
>>> +  }
>>> +
>>> +  // Return true if we should track stores.
>>> +  virtual bool track_stores_p ()
>>> +  {
>>> +    return true;
>>> +  }
>>> +
>>> +  // Return true if OFFSET is in range for a paired memory access.
>>> +  virtual bool pair_mem_in_range_p (HOST_WIDE_INT offset) = 0;
>>> +
>>> +  // Given a load/store pair insn in PATTERN, unpack the insn, storing
>>> +  // the register operands in REGS, and returning the mem.  LOAD_P is
>>> +  // true for loads and false for stores.
>>> +  virtual rtx destructure_pair (rtx regs[2], rtx pattern, bool load_p) = 0;
>>> +
>>> +  // Given a pair mem in MEM, register operands in REGS, and an rtx
>>> +  // representing the effect of writeback on the base register in 
>>> WB_EFFECT,
>>> +  // return an insn representing a writeback variant of this pair.
>>> +  // LOAD_P is true iff the pair is a load.
>>> +  // This is used when promoting existing non-writeback pairs to writeback
>>> +  // variants.
>>> +  virtual rtx gen_promote_writeback_pair (rtx wb_effect, rtx mem,
>>> +                                     rtx regs[2], bool load_p) = 0;
>>> +
>>> +  void process_block (bb_info *bb);
>>> +  inline insn_info *find_trailing_add (insn_info *insns[2],
>>> +                                  const insn_range_info &pair_range,
>>> +                                  int initial_writeback,
>>> +                                  rtx *writeback_effect,
>>> +                                  def_info **add_def,
>>> +                                  def_info *base_def,
>>> +                                  poly_int64 initial_offset,
>>> +                                  unsigned access_size);
>>> +  inline int get_viable_bases (insn_info *insns[2],
>>> +                          vec<base_cand> &base_cands,
>>> +                          rtx cand_mems[2],
>>> +                          unsigned access_size,
>>> +                          bool reversed);
>>> +  inline void do_alias_analysis (insn_info *alias_hazards[4],
>>> +                            alias_walker *walkers[4],
>>> +                            bool load_p);
>>> +  inline void try_promote_writeback (insn_info *insn, bool load_p);
>>> +  void run ();
>>
>> Any reason why this patch drops the inline keyword here?
> 
> Nevermind, of course this is the main entry point for the pass which now
> needs to be called from target code.  Sorry for the noise.
> 
> I guess we could mark process_block inline for consistency with the
> other such member functions, though.
> 

Addressed in v1 of the patch.
> Thanks,
> Alex
> 

Thanks & Regards
Ajit
>>
>>> +  ~pair_fusion ();
>>> +};
>>> +
>>> +// State used by the pass for a given basic block.
>>> +struct pair_fusion_bb_info
>>> +{
>>
>> As mentioned above, I think this whole class can be dropped from the
>> header.
>>
>>> +  using def_hash = nofree_ptr_hash<def_info>;
>>> +  using expr_key_t = pair_hash<tree_operand_hash, int_hash<int, -1, -2>>;
>>> +  using def_key_t = pair_hash<def_hash, int_hash<int, -1, -2>>;
>>> +
>>> +  // Map of <tree base, LFS> -> access_group.
>>> +  ordered_hash_map<expr_key_t, access_group> expr_map;
>>> +
>>> +  // Map of <RTL-SSA def_info *, LFS> -> access_group.
>>> +  ordered_hash_map<def_key_t, access_group> def_map;
>>> +
>>> +  // Given the def_info for an RTL base register, express it as an offset 
>>> from
>>> +  // some canonical base instead.
>>> +  //
>>> +  // Canonicalizing bases in this way allows us to identify adjacent 
>>> accesses
>>> +  // even if they see different base register defs.
>>> +  hash_map<def_hash, alt_base> canon_base_map;
>>> +
>>> +  static const size_t obstack_alignment = sizeof (void *);
>>> +
>>> +  pair_fusion_bb_info (bb_info *bb, pair_fusion *d)
>>> +    : m_bb (bb), m_pass (d), m_emitted_tombstone (false)
>>> +  {
>>> +    obstack_specify_allocation (&m_obstack, OBSTACK_CHUNK_SIZE,
>>> +                           obstack_alignment, obstack_chunk_alloc,
>>> +                           obstack_chunk_free);
>>> +  }
>>> +  ~pair_fusion_bb_info ()
>>> +  {
>>> +    obstack_free (&m_obstack, nullptr);
>>> +
>>> +    if (m_emitted_tombstone)
>>> +      {
>>> +   bitmap_release (&m_tombstone_bitmap);
>>> +   bitmap_obstack_release (&m_bitmap_obstack);
>>> +      }
>>> +  }
>>> +
>>> +  inline void track_access (insn_info *, bool load, rtx mem);
>>> +  inline void transform ();
>>> +  inline void cleanup_tombstones ();
>>> +
>>> +private:
>>> +  obstack m_obstack;
>>> +  bb_info *m_bb;
>>> +  pair_fusion *m_pass;
>>> +
>>> +  // State for keeping track of tombstone insns emitted for this BB.
>>> +  bitmap_obstack m_bitmap_obstack;
>>> +  bitmap_head m_tombstone_bitmap;
>>> +  bool m_emitted_tombstone;
>>> +
>>> +  inline splay_tree_node<access_record *> *node_alloc (access_record *);
>>> +
>>> +  template<typename Map>
>>> +  inline void traverse_base_map (Map &map);
>>> +  inline void transform_for_base (int load_size, access_group &group);
>>> +
>>> +  inline void merge_pairs (insn_list_t &, insn_list_t &,
>>> +                      bool load_p, unsigned access_size);
>>> +
>>> +  inline bool try_fuse_pair (bool load_p, unsigned access_size,
>>> +                        insn_info *i1, insn_info *i2);
>>> +
>>> +  inline bool fuse_pair (bool load_p, unsigned access_size,
>>> +                    int writeback,
>>> +                    insn_info *i1, insn_info *i2,
>>> +                    base_cand &base,
>>> +                    const insn_range_info &move_range);
>>> +
>>> +  inline void track_tombstone (int uid);
>>> +
>>> +  inline bool track_via_mem_expr (insn_info *, rtx mem, lfs_fields lfs);
>>> +};
>>> -- 
>>> 2.39.3
>>>
>>
>> Thanks,
>> Alex

Re: [Patch, aarch64, middle-end] Move pair_fusion pass from aarch64 to middle-end

Reply via email to