Hello Alex: On 21/05/24 1:16 am, Alex Coplan wrote: > On 20/05/2024 18:44, Alex Coplan wrote: >> Hi Ajit, >> >> On 20/05/2024 21:50, Ajit Agarwal wrote: >>> Hello Alex/Richard: >>> >>> Move pair fusion pass from aarch64-ldp-fusion.cc to middle-end >>> to support multiple targets. >>> >>> Common infrastructure of load store pair fusion is divided into >>> target independent and target dependent code. >>> >>> Target independent code is structured in the following files. >>> gcc/pair-fusion.h >>> gcc/pair-fusion.cc >>> >>> Target independent code is the Generic code with pure virtual >>> function to interface betwwen target independent and dependent >>> code. >>> >>> Bootstrapped and regtested on aarch64-linux-gnu. >>> >>> Thanks & Regards >>> Ajit >>> >>> aarch64, middle-end: Move pair_fusion pass from aarch64 to middle-end >>> >>> Move pair fusion pass from aarch64-ldp-fusion.cc to middle-end >>> to support multiple targets. >>> >>> Common infrastructure of load store pair fusion is divided into >>> target independent and target dependent code. >>> >>> Target independent code is structured in the following files. >>> gcc/pair-fusion.h >>> gcc/pair-fusion.cc >>> >>> Target independent code is the Generic code with pure virtual >>> function to interface betwwen target independent and dependent >>> code. >>> >>> 2024-05-20 Ajit Kumar Agarwal <aagar...@linux.ibm.com> >>> >>> gcc/ChangeLog: >>> >>> * pair-fusion.h: Generic header code for load store fusion >> >> Insert "pair" before fusion?
Addressed in v1 of the patch. >> >>> that can be shared across different architectures. >>> * pair-fusion.cc: Generic source code implementation for >>> load store fusion that can be shared across different architectures. >> >> Likewise. Addressed in v1 of the patch. >> >>> * Makefile.in: Add new executable pair-fusion.o >> >> It's not an executable but an object file. >> >>> * config/aarch64/aarch64-ldp-fusion.cc: Target specific >>> code for load store fusion of aarch64. >> >> I guess this should say something like: "Delete generic code and move it >> to pair-fusion.cc in the middle-end." >> >> I've left some comments below on the header file. The rest of the patch >> looks pretty good to me. I tried diffing the original contents of >> aarch64-ldp-fusion.cc with pair-fusion.cc, and that looks as expected. >> > > <snip> > >>> diff --git a/gcc/pair-fusion.h b/gcc/pair-fusion.h >>> new file mode 100644 >>> index 00000000000..00f6d3e149a >>> --- /dev/null >>> +++ b/gcc/pair-fusion.h >>> @@ -0,0 +1,340 @@ >>> +// Pair Mem fusion generic header file. >>> +// Copyright (C) 2024 Free Software Foundation, Inc. >>> +// >>> +// This file is part of GCC. >>> +// >>> +// GCC is free software; you can redistribute it and/or modify it >>> +// under the terms of the GNU General Public License as published by >>> +// the Free Software Foundation; either version 3, or (at your option) >>> +// any later version. >>> +// >>> +// GCC is distributed in the hope that it will be useful, but >>> +// WITHOUT ANY WARRANTY; without even the implied warranty of >>> +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU >>> +// General Public License for more details. >>> +// >>> +// You should have received a copy of the GNU General Public License >>> +// along with GCC; see the file COPYING3. If not see >>> +// <http://www.gnu.org/licenses/>. >>> + >>> +#define INCLUDE_ALGORITHM >>> +#define INCLUDE_FUNCTIONAL >>> +#define INCLUDE_LIST >>> +#define INCLUDE_TYPE_TRAITS >>> +#include "config.h" >>> +#include "system.h" >>> +#include "coretypes.h" >>> +#include "backend.h" >>> +#include "rtl.h" >>> +#include "df.h" >>> +#include "rtl-iter.h" >>> +#include "rtl-ssa.h" >> >> I'm not sure how desirable this is, but you might be able to >> forward-declare RTL-SSA types like this: >> >> class def_info; >> class insn_info; >> class insn_range_info; >> >> thus removing the need to include the header here, since the interface >> only refers to these types by pointer or reference. >> >> Richard: please say if you'd prefer keeping the include. >> Doing forward declaration gives ambigous errors with conflicting insn_info with rtl_ssa::insn_info and templated initialization errors. Also with overloaded operator with insn_info is not defined with forward declaration. Hence I kept the header. Addressed in v1 of the patch. >>> +#include "cfgcleanup.h" >>> +#include "tree-pass.h" >>> +#include "ordered-hash-map.h" >>> +#include "tree-dfa.h" >>> +#include "fold-const.h" >>> +#include "tree-hash-traits.h" >>> +#include "print-tree.h" >>> +#include "insn-attr.h" >> >> I expect we don't need all of these includes here. I think we should >> have the minimum necessary set of includes here and most of the includes >> should be in the *.cc files. >> Addressed in v1 of the patch. >>> + >>> +using namespace rtl_ssa; >>> + >>> +// We pack these fields (load_p, fpsimd_p, and size) into an integer >>> +// (LFS) which we use as part of the key into the main hash tables. >>> +// >>> +// The idea is that we group candidates together only if they agree on >>> +// the fields below. Candidates that disagree on any of these >>> +// properties shouldn't be merged together. >>> +struct lfs_fields >>> +{ >>> + bool load_p; >>> + bool fpsimd_p; >>> + unsigned size; >>> +}; >> >> This struct shouldn't be needed in the header file (it should only be >> needed in pair-fusion.cc). I can see that it's needed for >> pair_fusion_bb_info, but that shouldn't be in the header file either IMO. >> Addressed in v1 of the patch. >>> + >>> +using insn_list_t = std::list<insn_info *>; >> >> Likewise, if you move pair_fusion_bb_info out of this header, these >> shouldn't be needed either. >> >>> +using insn_iter_t = insn_list_t::iterator; >> >> Pre-existing, but this looks to be completely dead/unused. Please could >> you push the obvious patch to delete it from aarch64-ldp-fusion.cc >> first (assuming that passes testing)? >> >>> + >>> +// Information about the accesses at a given offset from a particular >>> +// base. Stored in an access_group, see below. >>> +struct access_record >>> +{ >>> + poly_int64 offset; >>> + std::list<insn_info *> cand_insns; >>> + std::list<access_record>::iterator place; >>> + >>> + access_record (poly_int64 off) : offset (off) {} >>> +}; >> >> If you drop pair_fusion_bb_info from this header then this can go back >> in pair-fusion.cc. >> Addressed in v1 of the patch. >>> + >>> +// A group of accesses where adjacent accesses could be ldp/stp >>> +// candidates. The splay tree supports efficient insertion, >>> +// while the list supports efficient iteration. >>> +struct access_group >>> +{ >>> + splay_tree<access_record *> tree; >>> + std::list<access_record> list; >>> + >>> + template<typename Alloc> >>> + inline void track (Alloc node_alloc, poly_int64 offset, insn_info *insn); >>> +}; >> >> Likewise. >> Addressed in v1 of the patch. >>> + >>> +// Information about a potential base candidate, used in try_fuse_pair. >>> +// There may be zero, one, or two viable RTL bases for a given pair. >>> +struct base_cand >>> +{ >>> + // DEF is the def of the base register to be used by the pair. >>> + def_info *def; >>> + >>> + // FROM_INSN is -1 if the base candidate is already shared by both >>> + // candidate insns. Otherwise it holds the index of the insn from >>> + // which the base originated. >>> + // >>> + // In the case that the base is shared, either DEF is already used >>> + // by both candidate accesses, or both accesses see different versions >>> + // of the same regno, in which case DEF is the def consumed by the >>> + // first candidate access. >>> + int from_insn; >>> + >>> + // To form a pair, we do so by moving the first access down and the >>> second >>> + // access up. To determine where to form the pair, and whether or not >>> + // it is safe to form the pair, we track instructions which cannot be >>> + // re-ordered past due to either dataflow or alias hazards. >>> + // >>> + // Since we allow changing the base used by an access, the choice of >>> + // base can change which instructions act as re-ordering hazards for >>> + // this pair (due to different dataflow). We store the initial >>> + // dataflow hazards for this choice of base candidate in HAZARDS. >>> + // >>> + // These hazards act as re-ordering barriers to each candidate insn >>> + // respectively, in program order. >>> + // >>> + // Later on, when we take alias analysis into account, we narrow >>> + // HAZARDS accordingly. >>> + insn_info *hazards[2]; >>> + >>> + base_cand (def_info *def, int insn) >>> + : def (def), from_insn (insn), hazards {nullptr, nullptr} {} >>> + >>> + base_cand (def_info *def) : base_cand (def, -1) {} >>> + >>> + // Test if this base candidate is viable according to HAZARDS. >>> + bool viable () const >>> + { >>> + return !hazards[0] || !hazards[1] || (*hazards[0] > *hazards[1]); >>> + } >>> +}; >>> + >>> +// Information about an alternate base. For a def_info D, it may >>> +// instead be expressed as D = BASE + OFFSET. >>> +struct alt_base >>> +{ >>> + def_info *base; >>> + poly_int64 offset; >>> +}; >> >> Likewise, this (alt_base) can be dropped from the header file. >> Addressed in v1 of the patch. >>> + >>> +// Virtual base class for load/store walkers used in alias analysis. >>> +struct alias_walker >>> +{ >>> + virtual bool conflict_p (int &budget) const = 0; >>> + virtual insn_info *insn () const = 0; >>> + virtual bool valid () const = 0; >>> + virtual void advance () = 0; >>> +}; >> >> This is quite minor, but I think since this class is internal to >> pair-fusion.cc and the interface only uses pointers to this type, >> this can just be a forward-declaration (keeping the definition >> in pair-fusion.cc). >> >>> + >>> +// When querying should_handle_writeback, this enum is used to >>> +// qualify which opportunities we are asking about. >>> +enum class writeback { >>> + // Only those writeback opportunities that arise from existing >>> + // auto-increment accesses. >>> + EXISTING, >>> + >>> + // All writeback opportunities, including those that involve folding >>> + // base register updates into a non-writeback pair. >>> + ALL >>> +}; >>> + >>> +// This class can be overriden by targets to give a pass that fuses >>> +// adjacent loads and stores into load/store pair instructions. >>> +// >>> +// The target can override the various virtual functions to customize >>> +// the behaviour of the pass as appropriate for the target. >>> +struct pair_fusion { >>> + pair_fusion (); >>> + >>> + // Given: >>> + // - an rtx REG_OP, the non-memory operand in a load/store insn, >>> + // - a machine_mode MEM_MODE, the mode of the MEM in that insn, and >>> + // - a boolean LOAD_P (true iff the insn is a load), then: >>> + // return true if the access should be considered an FP/SIMD access. >>> + // Such accesses are segregated from GPR accesses, since we only want >>> + // to form pairs for accesses that use the same register file. >>> + virtual bool fpsimd_op_p (rtx, machine_mode, bool) >>> + { >>> + return false; >>> + } >>> + >>> + // Return true if we should consider forming pairs from memory >>> + // accesses with operand mode MODE at this stage in compilation. >>> + virtual bool pair_operand_mode_ok_p (machine_mode mode) = 0; >>> + >>> + // Return true iff REG_OP is a suitable register operand for a paired >>> + // memory access, where LOAD_P is true if we're asking about loads and >>> + // false for stores. MODE gives the mode of the operand. >>> + virtual bool pair_reg_operand_ok_p (bool load_p, rtx reg_op, >>> + machine_mode mode) = 0; >>> + >>> + // Return alias check limit. >>> + // This is needed to avoid unbounded quadratic behaviour when >>> + // performing alias analysis. >>> + virtual int pair_mem_alias_check_limit () = 0; >>> + >>> + // Return true if we should try to handle writeback opportunities. >>> + // WHICH determines the kinds of writeback opportunities the caller >>> + // is asking about. >>> + virtual bool should_handle_writeback (enum writeback which) = 0; >>> + >>> + // Given BASE_MEM, the mem from the lower candidate access for a pair, >>> + // and LOAD_P (true if the access is a load), check if we should proceed >>> + // to form the pair given the target's code generation policy on >>> + // paired accesses. >>> + virtual bool pair_mem_ok_with_policy (rtx base_mem, bool load_p) = 0; >>> + >>> + // Generate the pattern for a paired access. PATS gives the patterns >>> + // for the individual memory accesses (which by this point must share a >>> + // common base register). If WRITEBACK is non-NULL, then this rtx >>> + // describes the update to the base register that should be performed by >>> + // the resulting insn. LOAD_P is true iff the accesses are loads. >>> + virtual rtx gen_pair (rtx *pats, rtx writeback, bool load_p) = 0; >>> + >>> + // Return true if INSN is a paired memory access. If so, set LOAD_P to >>> + // true iff INSN is a load pair. >>> + virtual bool pair_mem_insn_p (rtx_insn *insn, bool &load_p) = 0; >>> + >>> + // Return true if we should track loads. >>> + virtual bool track_loads_p () >>> + { >>> + return true; >>> + } >>> + >>> + // Return true if we should track stores. >>> + virtual bool track_stores_p () >>> + { >>> + return true; >>> + } >>> + >>> + // Return true if OFFSET is in range for a paired memory access. >>> + virtual bool pair_mem_in_range_p (HOST_WIDE_INT offset) = 0; >>> + >>> + // Given a load/store pair insn in PATTERN, unpack the insn, storing >>> + // the register operands in REGS, and returning the mem. LOAD_P is >>> + // true for loads and false for stores. >>> + virtual rtx destructure_pair (rtx regs[2], rtx pattern, bool load_p) = 0; >>> + >>> + // Given a pair mem in MEM, register operands in REGS, and an rtx >>> + // representing the effect of writeback on the base register in >>> WB_EFFECT, >>> + // return an insn representing a writeback variant of this pair. >>> + // LOAD_P is true iff the pair is a load. >>> + // This is used when promoting existing non-writeback pairs to writeback >>> + // variants. >>> + virtual rtx gen_promote_writeback_pair (rtx wb_effect, rtx mem, >>> + rtx regs[2], bool load_p) = 0; >>> + >>> + void process_block (bb_info *bb); >>> + inline insn_info *find_trailing_add (insn_info *insns[2], >>> + const insn_range_info &pair_range, >>> + int initial_writeback, >>> + rtx *writeback_effect, >>> + def_info **add_def, >>> + def_info *base_def, >>> + poly_int64 initial_offset, >>> + unsigned access_size); >>> + inline int get_viable_bases (insn_info *insns[2], >>> + vec<base_cand> &base_cands, >>> + rtx cand_mems[2], >>> + unsigned access_size, >>> + bool reversed); >>> + inline void do_alias_analysis (insn_info *alias_hazards[4], >>> + alias_walker *walkers[4], >>> + bool load_p); >>> + inline void try_promote_writeback (insn_info *insn, bool load_p); >>> + void run (); >> >> Any reason why this patch drops the inline keyword here? > > Nevermind, of course this is the main entry point for the pass which now > needs to be called from target code. Sorry for the noise. > > I guess we could mark process_block inline for consistency with the > other such member functions, though. > Addressed in v1 of the patch. > Thanks, > Alex > Thanks & Regards Ajit >> >>> + ~pair_fusion (); >>> +}; >>> + >>> +// State used by the pass for a given basic block. >>> +struct pair_fusion_bb_info >>> +{ >> >> As mentioned above, I think this whole class can be dropped from the >> header. >> >>> + using def_hash = nofree_ptr_hash<def_info>; >>> + using expr_key_t = pair_hash<tree_operand_hash, int_hash<int, -1, -2>>; >>> + using def_key_t = pair_hash<def_hash, int_hash<int, -1, -2>>; >>> + >>> + // Map of <tree base, LFS> -> access_group. >>> + ordered_hash_map<expr_key_t, access_group> expr_map; >>> + >>> + // Map of <RTL-SSA def_info *, LFS> -> access_group. >>> + ordered_hash_map<def_key_t, access_group> def_map; >>> + >>> + // Given the def_info for an RTL base register, express it as an offset >>> from >>> + // some canonical base instead. >>> + // >>> + // Canonicalizing bases in this way allows us to identify adjacent >>> accesses >>> + // even if they see different base register defs. >>> + hash_map<def_hash, alt_base> canon_base_map; >>> + >>> + static const size_t obstack_alignment = sizeof (void *); >>> + >>> + pair_fusion_bb_info (bb_info *bb, pair_fusion *d) >>> + : m_bb (bb), m_pass (d), m_emitted_tombstone (false) >>> + { >>> + obstack_specify_allocation (&m_obstack, OBSTACK_CHUNK_SIZE, >>> + obstack_alignment, obstack_chunk_alloc, >>> + obstack_chunk_free); >>> + } >>> + ~pair_fusion_bb_info () >>> + { >>> + obstack_free (&m_obstack, nullptr); >>> + >>> + if (m_emitted_tombstone) >>> + { >>> + bitmap_release (&m_tombstone_bitmap); >>> + bitmap_obstack_release (&m_bitmap_obstack); >>> + } >>> + } >>> + >>> + inline void track_access (insn_info *, bool load, rtx mem); >>> + inline void transform (); >>> + inline void cleanup_tombstones (); >>> + >>> +private: >>> + obstack m_obstack; >>> + bb_info *m_bb; >>> + pair_fusion *m_pass; >>> + >>> + // State for keeping track of tombstone insns emitted for this BB. >>> + bitmap_obstack m_bitmap_obstack; >>> + bitmap_head m_tombstone_bitmap; >>> + bool m_emitted_tombstone; >>> + >>> + inline splay_tree_node<access_record *> *node_alloc (access_record *); >>> + >>> + template<typename Map> >>> + inline void traverse_base_map (Map &map); >>> + inline void transform_for_base (int load_size, access_group &group); >>> + >>> + inline void merge_pairs (insn_list_t &, insn_list_t &, >>> + bool load_p, unsigned access_size); >>> + >>> + inline bool try_fuse_pair (bool load_p, unsigned access_size, >>> + insn_info *i1, insn_info *i2); >>> + >>> + inline bool fuse_pair (bool load_p, unsigned access_size, >>> + int writeback, >>> + insn_info *i1, insn_info *i2, >>> + base_cand &base, >>> + const insn_range_info &move_range); >>> + >>> + inline void track_tombstone (int uid); >>> + >>> + inline bool track_via_mem_expr (insn_info *, rtx mem, lfs_fields lfs); >>> +}; >>> -- >>> 2.39.3 >>> >> >> Thanks, >> Alex