Hello Alex:
On 21/05/24 6:02 pm, Alex Coplan wrote:
> On 21/05/2024 16:02, Ajit Agarwal wrote:
>> Hello Alex:
>>
>> On 21/05/24 1:16 am, Alex Coplan wrote:
>>> On 20/05/2024 18:44, Alex Coplan wrote:
>>>> Hi Ajit,
>>>>
>>>> On 20/05/2024 21:50, Ajit Agarwal wrote:
>>>>> Hello Alex/Richard:
>>>>>
>>>>> Move pair fusion pass from aarch64-ldp-fusion.cc to middle-end
>>>>> to support multiple targets.
>>>>>
>>>>> Common infrastructure of load store pair fusion is divided into
>>>>> target independent and target dependent code.
>>>>>
>>>>> Target independent code is structured in the following files.
>>>>> gcc/pair-fusion.h
>>>>> gcc/pair-fusion.cc
>>>>>
>>>>> Target independent code is the Generic code with pure virtual
>>>>> function to interface betwwen target independent and dependent
>>>>> code.
>>>>>
>>>>> Bootstrapped and regtested on aarch64-linux-gnu.
>>>>>
>>>>> Thanks & Regards
>>>>> Ajit
>>>>>
>>>>> aarch64, middle-end: Move pair_fusion pass from aarch64 to middle-end
>>>>>
>>>>> Move pair fusion pass from aarch64-ldp-fusion.cc to middle-end
>>>>> to support multiple targets.
>>>>>
>>>>> Common infrastructure of load store pair fusion is divided into
>>>>> target independent and target dependent code.
>>>>>
>>>>> Target independent code is structured in the following files.
>>>>> gcc/pair-fusion.h
>>>>> gcc/pair-fusion.cc
>>>>>
>>>>> Target independent code is the Generic code with pure virtual
>>>>> function to interface betwwen target independent and dependent
>>>>> code.
>>>>>
>>>>> 2024-05-20 Ajit Kumar Agarwal <[email protected]>
>>>>>
>>>>> gcc/ChangeLog:
>>>>>
>>>>> * pair-fusion.h: Generic header code for load store fusion
>>>>
>>>> Insert "pair" before fusion?
>>
>> Addressed in v1 of the patch.
>>>>
>>>>> that can be shared across different architectures.
>>>>> * pair-fusion.cc: Generic source code implementation for
>>>>> load store fusion that can be shared across different architectures.
>>>>
>>>> Likewise.
>> Addressed in v1 of the patch.
>>>>
>>>>> * Makefile.in: Add new executable pair-fusion.o
>>>>
>>>> It's not an executable but an object file.
>>>>
>>>>> * config/aarch64/aarch64-ldp-fusion.cc: Target specific
>>>>> code for load store fusion of aarch64.
>>>>
>>>> I guess this should say something like: "Delete generic code and move it
>>>> to pair-fusion.cc in the middle-end."
>>>>
>>>> I've left some comments below on the header file. The rest of the patch
>>>> looks pretty good to me. I tried diffing the original contents of
>>>> aarch64-ldp-fusion.cc with pair-fusion.cc, and that looks as expected.
>>>>
>>>
>>> <snip>
>>>
>>>>> diff --git a/gcc/pair-fusion.h b/gcc/pair-fusion.h
>>>>> new file mode 100644
>>>>> index 00000000000..00f6d3e149a
>>>>> --- /dev/null
>>>>> +++ b/gcc/pair-fusion.h
>>>>> @@ -0,0 +1,340 @@
>>>>> +// Pair Mem fusion generic header file.
>>>>> +// Copyright (C) 2024 Free Software Foundation, Inc.
>>>>> +//
>>>>> +// This file is part of GCC.
>>>>> +//
>>>>> +// GCC is free software; you can redistribute it and/or modify it
>>>>> +// under the terms of the GNU General Public License as published by
>>>>> +// the Free Software Foundation; either version 3, or (at your option)
>>>>> +// any later version.
>>>>> +//
>>>>> +// GCC is distributed in the hope that it will be useful, but
>>>>> +// WITHOUT ANY WARRANTY; without even the implied warranty of
>>>>> +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
>>>>> +// General Public License for more details.
>>>>> +//
>>>>> +// You should have received a copy of the GNU General Public License
>>>>> +// along with GCC; see the file COPYING3. If not see
>>>>> +// <http://www.gnu.org/licenses/>.
>>>>> +
>>>>> +#define INCLUDE_ALGORITHM
>>>>> +#define INCLUDE_FUNCTIONAL
>>>>> +#define INCLUDE_LIST
>>>>> +#define INCLUDE_TYPE_TRAITS
>>>>> +#include "config.h"
>>>>> +#include "system.h"
>>>>> +#include "coretypes.h"
>>>>> +#include "backend.h"
>>>>> +#include "rtl.h"
>>>>> +#include "df.h"
>>>>> +#include "rtl-iter.h"
>>>>> +#include "rtl-ssa.h"
>>>>
>>>> I'm not sure how desirable this is, but you might be able to
>>>> forward-declare RTL-SSA types like this:
>>>>
>>>> class def_info;
>>>> class insn_info;
>>>> class insn_range_info;
>>>>
>>>> thus removing the need to include the header here, since the interface
>>>> only refers to these types by pointer or reference.
>>>>
>>>> Richard: please say if you'd prefer keeping the include.
>>>>
>>
>> Doing forward declaration gives ambigous errors with conflicting
>> insn_info with rtl_ssa::insn_info and templated initialization
>> errors. Also with overloaded operator with insn_info is not
>> defined with forward declaration.
>
> So I tried this locally and it seems to work if you wrap the
> forward-decls in:
>
> namespace rtl_ssa {
> [...]
> };
>
> and indeed you'd need to move the definition of base_cand::viable () to
> pair-fusion.cc so that we don't dereference those pointers in the
> header (would be good to mark it inline if you do that).
>
> Btw, I noticed that the GCC coding conventions
> (https://gcc.gnu.org/codingconventions.html) say:
>
>> Header files should have neither using directives nor namespace-scope
>> using declarations.
>
> so we'll need to drop the using namespace rtl_ssa; from the header and
> add it to (at least) pair-fusion.cc.
>
> I think we'll also need to drop all of the inlcudes (and #define INCLUDE_*)
> from pair-fusion.h: Richard just reminded me that the GCC policy is that
> header files shouldn't include other files if they also contain code.
>
Addressed in v2 of the patch.
> Thanks,
> Alex
Thanks & Regards
Ajit
>
>>
>> Hence I kept the header.
>>
>> Addressed in v1 of the patch.
>>
>>>>> +#include "cfgcleanup.h"
>>>>> +#include "tree-pass.h"
>>>>> +#include "ordered-hash-map.h"
>>>>> +#include "tree-dfa.h"
>>>>> +#include "fold-const.h"
>>>>> +#include "tree-hash-traits.h"
>>>>> +#include "print-tree.h"
>>>>> +#include "insn-attr.h"
>>>>
>>>> I expect we don't need all of these includes here. I think we should
>>>> have the minimum necessary set of includes here and most of the includes
>>>> should be in the *.cc files.
>>>>
>>
>> Addressed in v1 of the patch.
>>>>> +
>>>>> +using namespace rtl_ssa;
>>>>> +
>>>>> +// We pack these fields (load_p, fpsimd_p, and size) into an integer
>>>>> +// (LFS) which we use as part of the key into the main hash tables.
>>>>> +//
>>>>> +// The idea is that we group candidates together only if they agree on
>>>>> +// the fields below. Candidates that disagree on any of these
>>>>> +// properties shouldn't be merged together.
>>>>> +struct lfs_fields
>>>>> +{
>>>>> + bool load_p;
>>>>> + bool fpsimd_p;
>>>>> + unsigned size;
>>>>> +};
>>>>
>>>> This struct shouldn't be needed in the header file (it should only be
>>>> needed in pair-fusion.cc). I can see that it's needed for
>>>> pair_fusion_bb_info, but that shouldn't be in the header file either IMO.
>>>>
>>
>> Addressed in v1 of the patch.
>>>>> +
>>>>> +using insn_list_t = std::list<insn_info *>;
>>>>
>>>> Likewise, if you move pair_fusion_bb_info out of this header, these
>>>> shouldn't be needed either.
>>>>
>>>>> +using insn_iter_t = insn_list_t::iterator;
>>>>
>>>> Pre-existing, but this looks to be completely dead/unused. Please could
>>>> you push the obvious patch to delete it from aarch64-ldp-fusion.cc
>>>> first (assuming that passes testing)?
>>>>
>>>>> +
>>>>> +// Information about the accesses at a given offset from a particular
>>>>> +// base. Stored in an access_group, see below.
>>>>> +struct access_record
>>>>> +{
>>>>> + poly_int64 offset;
>>>>> + std::list<insn_info *> cand_insns;
>>>>> + std::list<access_record>::iterator place;
>>>>> +
>>>>> + access_record (poly_int64 off) : offset (off) {}
>>>>> +};
>>>>
>>>> If you drop pair_fusion_bb_info from this header then this can go back
>>>> in pair-fusion.cc.
>>>>
>>
>> Addressed in v1 of the patch.
>>>>> +
>>>>> +// A group of accesses where adjacent accesses could be ldp/stp
>>>>> +// candidates. The splay tree supports efficient insertion,
>>>>> +// while the list supports efficient iteration.
>>>>> +struct access_group
>>>>> +{
>>>>> + splay_tree<access_record *> tree;
>>>>> + std::list<access_record> list;
>>>>> +
>>>>> + template<typename Alloc>
>>>>> + inline void track (Alloc node_alloc, poly_int64 offset, insn_info
>>>>> *insn);
>>>>> +};
>>>>
>>>> Likewise.
>>>>
>>
>> Addressed in v1 of the patch.
>>>>> +
>>>>> +// Information about a potential base candidate, used in try_fuse_pair.
>>>>> +// There may be zero, one, or two viable RTL bases for a given pair.
>>>>> +struct base_cand
>>>>> +{
>>>>> + // DEF is the def of the base register to be used by the pair.
>>>>> + def_info *def;
>>>>> +
>>>>> + // FROM_INSN is -1 if the base candidate is already shared by both
>>>>> + // candidate insns. Otherwise it holds the index of the insn from
>>>>> + // which the base originated.
>>>>> + //
>>>>> + // In the case that the base is shared, either DEF is already used
>>>>> + // by both candidate accesses, or both accesses see different versions
>>>>> + // of the same regno, in which case DEF is the def consumed by the
>>>>> + // first candidate access.
>>>>> + int from_insn;
>>>>> +
>>>>> + // To form a pair, we do so by moving the first access down and the
>>>>> second
>>>>> + // access up. To determine where to form the pair, and whether or not
>>>>> + // it is safe to form the pair, we track instructions which cannot be
>>>>> + // re-ordered past due to either dataflow or alias hazards.
>>>>> + //
>>>>> + // Since we allow changing the base used by an access, the choice of
>>>>> + // base can change which instructions act as re-ordering hazards for
>>>>> + // this pair (due to different dataflow). We store the initial
>>>>> + // dataflow hazards for this choice of base candidate in HAZARDS.
>>>>> + //
>>>>> + // These hazards act as re-ordering barriers to each candidate insn
>>>>> + // respectively, in program order.
>>>>> + //
>>>>> + // Later on, when we take alias analysis into account, we narrow
>>>>> + // HAZARDS accordingly.
>>>>> + insn_info *hazards[2];
>>>>> +
>>>>> + base_cand (def_info *def, int insn)
>>>>> + : def (def), from_insn (insn), hazards {nullptr, nullptr} {}
>>>>> +
>>>>> + base_cand (def_info *def) : base_cand (def, -1) {}
>>>>> +
>>>>> + // Test if this base candidate is viable according to HAZARDS.
>>>>> + bool viable () const
>>>>> + {
>>>>> + return !hazards[0] || !hazards[1] || (*hazards[0] > *hazards[1]);
>>>>> + }
>>>>> +};
>>>>> +
>>>>> +// Information about an alternate base. For a def_info D, it may
>>>>> +// instead be expressed as D = BASE + OFFSET.
>>>>> +struct alt_base
>>>>> +{
>>>>> + def_info *base;
>>>>> + poly_int64 offset;
>>>>> +};
>>>>
>>>> Likewise, this (alt_base) can be dropped from the header file.
>>>>
>>
>> Addressed in v1 of the patch.
>>>>> +
>>>>> +// Virtual base class for load/store walkers used in alias analysis.
>>>>> +struct alias_walker
>>>>> +{
>>>>> + virtual bool conflict_p (int &budget) const = 0;
>>>>> + virtual insn_info *insn () const = 0;
>>>>> + virtual bool valid () const = 0;
>>>>> + virtual void advance () = 0;
>>>>> +};
>>>>
>>>> This is quite minor, but I think since this class is internal to
>>>> pair-fusion.cc and the interface only uses pointers to this type,
>>>> this can just be a forward-declaration (keeping the definition
>>>> in pair-fusion.cc).
>>>>
>>>>> +
>>>>> +// When querying should_handle_writeback, this enum is used to
>>>>> +// qualify which opportunities we are asking about.
>>>>> +enum class writeback {
>>>>> + // Only those writeback opportunities that arise from existing
>>>>> + // auto-increment accesses.
>>>>> + EXISTING,
>>>>> +
>>>>> + // All writeback opportunities, including those that involve folding
>>>>> + // base register updates into a non-writeback pair.
>>>>> + ALL
>>>>> +};
>>>>> +
>>>>> +// This class can be overriden by targets to give a pass that fuses
>>>>> +// adjacent loads and stores into load/store pair instructions.
>>>>> +//
>>>>> +// The target can override the various virtual functions to customize
>>>>> +// the behaviour of the pass as appropriate for the target.
>>>>> +struct pair_fusion {
>>>>> + pair_fusion ();
>>>>> +
>>>>> + // Given:
>>>>> + // - an rtx REG_OP, the non-memory operand in a load/store insn,
>>>>> + // - a machine_mode MEM_MODE, the mode of the MEM in that insn, and
>>>>> + // - a boolean LOAD_P (true iff the insn is a load), then:
>>>>> + // return true if the access should be considered an FP/SIMD access.
>>>>> + // Such accesses are segregated from GPR accesses, since we only want
>>>>> + // to form pairs for accesses that use the same register file.
>>>>> + virtual bool fpsimd_op_p (rtx, machine_mode, bool)
>>>>> + {
>>>>> + return false;
>>>>> + }
>>>>> +
>>>>> + // Return true if we should consider forming pairs from memory
>>>>> + // accesses with operand mode MODE at this stage in compilation.
>>>>> + virtual bool pair_operand_mode_ok_p (machine_mode mode) = 0;
>>>>> +
>>>>> + // Return true iff REG_OP is a suitable register operand for a paired
>>>>> + // memory access, where LOAD_P is true if we're asking about loads and
>>>>> + // false for stores. MODE gives the mode of the operand.
>>>>> + virtual bool pair_reg_operand_ok_p (bool load_p, rtx reg_op,
>>>>> + machine_mode mode) = 0;
>>>>> +
>>>>> + // Return alias check limit.
>>>>> + // This is needed to avoid unbounded quadratic behaviour when
>>>>> + // performing alias analysis.
>>>>> + virtual int pair_mem_alias_check_limit () = 0;
>>>>> +
>>>>> + // Return true if we should try to handle writeback opportunities.
>>>>> + // WHICH determines the kinds of writeback opportunities the caller
>>>>> + // is asking about.
>>>>> + virtual bool should_handle_writeback (enum writeback which) = 0;
>>>>> +
>>>>> + // Given BASE_MEM, the mem from the lower candidate access for a pair,
>>>>> + // and LOAD_P (true if the access is a load), check if we should
>>>>> proceed
>>>>> + // to form the pair given the target's code generation policy on
>>>>> + // paired accesses.
>>>>> + virtual bool pair_mem_ok_with_policy (rtx base_mem, bool load_p) = 0;
>>>>> +
>>>>> + // Generate the pattern for a paired access. PATS gives the patterns
>>>>> + // for the individual memory accesses (which by this point must share a
>>>>> + // common base register). If WRITEBACK is non-NULL, then this rtx
>>>>> + // describes the update to the base register that should be performed
>>>>> by
>>>>> + // the resulting insn. LOAD_P is true iff the accesses are loads.
>>>>> + virtual rtx gen_pair (rtx *pats, rtx writeback, bool load_p) = 0;
>>>>> +
>>>>> + // Return true if INSN is a paired memory access. If so, set LOAD_P to
>>>>> + // true iff INSN is a load pair.
>>>>> + virtual bool pair_mem_insn_p (rtx_insn *insn, bool &load_p) = 0;
>>>>> +
>>>>> + // Return true if we should track loads.
>>>>> + virtual bool track_loads_p ()
>>>>> + {
>>>>> + return true;
>>>>> + }
>>>>> +
>>>>> + // Return true if we should track stores.
>>>>> + virtual bool track_stores_p ()
>>>>> + {
>>>>> + return true;
>>>>> + }
>>>>> +
>>>>> + // Return true if OFFSET is in range for a paired memory access.
>>>>> + virtual bool pair_mem_in_range_p (HOST_WIDE_INT offset) = 0;
>>>>> +
>>>>> + // Given a load/store pair insn in PATTERN, unpack the insn, storing
>>>>> + // the register operands in REGS, and returning the mem. LOAD_P is
>>>>> + // true for loads and false for stores.
>>>>> + virtual rtx destructure_pair (rtx regs[2], rtx pattern, bool load_p) =
>>>>> 0;
>>>>> +
>>>>> + // Given a pair mem in MEM, register operands in REGS, and an rtx
>>>>> + // representing the effect of writeback on the base register in
>>>>> WB_EFFECT,
>>>>> + // return an insn representing a writeback variant of this pair.
>>>>> + // LOAD_P is true iff the pair is a load.
>>>>> + // This is used when promoting existing non-writeback pairs to
>>>>> writeback
>>>>> + // variants.
>>>>> + virtual rtx gen_promote_writeback_pair (rtx wb_effect, rtx mem,
>>>>> + rtx regs[2], bool load_p) = 0;
>>>>> +
>>>>> + void process_block (bb_info *bb);
>>>>> + inline insn_info *find_trailing_add (insn_info *insns[2],
>>>>> + const insn_range_info &pair_range,
>>>>> + int initial_writeback,
>>>>> + rtx *writeback_effect,
>>>>> + def_info **add_def,
>>>>> + def_info *base_def,
>>>>> + poly_int64 initial_offset,
>>>>> + unsigned access_size);
>>>>> + inline int get_viable_bases (insn_info *insns[2],
>>>>> + vec<base_cand> &base_cands,
>>>>> + rtx cand_mems[2],
>>>>> + unsigned access_size,
>>>>> + bool reversed);
>>>>> + inline void do_alias_analysis (insn_info *alias_hazards[4],
>>>>> + alias_walker *walkers[4],
>>>>> + bool load_p);
>>>>> + inline void try_promote_writeback (insn_info *insn, bool load_p);
>>>>> + void run ();
>>>>
>>>> Any reason why this patch drops the inline keyword here?
>>>
>>> Nevermind, of course this is the main entry point for the pass which now
>>> needs to be called from target code. Sorry for the noise.
>>>
>>> I guess we could mark process_block inline for consistency with the
>>> other such member functions, though.
>>>
>>
>> Addressed in v1 of the patch.
>>> Thanks,
>>> Alex
>>>
>>
>> Thanks & Regards
>> Ajit
>>>>
>>>>> + ~pair_fusion ();
>>>>> +};
>>>>> +
>>>>> +// State used by the pass for a given basic block.
>>>>> +struct pair_fusion_bb_info
>>>>> +{
>>>>
>>>> As mentioned above, I think this whole class can be dropped from the
>>>> header.
>>>>
>>>>> + using def_hash = nofree_ptr_hash<def_info>;
>>>>> + using expr_key_t = pair_hash<tree_operand_hash, int_hash<int, -1, -2>>;
>>>>> + using def_key_t = pair_hash<def_hash, int_hash<int, -1, -2>>;
>>>>> +
>>>>> + // Map of <tree base, LFS> -> access_group.
>>>>> + ordered_hash_map<expr_key_t, access_group> expr_map;
>>>>> +
>>>>> + // Map of <RTL-SSA def_info *, LFS> -> access_group.
>>>>> + ordered_hash_map<def_key_t, access_group> def_map;
>>>>> +
>>>>> + // Given the def_info for an RTL base register, express it as an
>>>>> offset from
>>>>> + // some canonical base instead.
>>>>> + //
>>>>> + // Canonicalizing bases in this way allows us to identify adjacent
>>>>> accesses
>>>>> + // even if they see different base register defs.
>>>>> + hash_map<def_hash, alt_base> canon_base_map;
>>>>> +
>>>>> + static const size_t obstack_alignment = sizeof (void *);
>>>>> +
>>>>> + pair_fusion_bb_info (bb_info *bb, pair_fusion *d)
>>>>> + : m_bb (bb), m_pass (d), m_emitted_tombstone (false)
>>>>> + {
>>>>> + obstack_specify_allocation (&m_obstack, OBSTACK_CHUNK_SIZE,
>>>>> + obstack_alignment, obstack_chunk_alloc,
>>>>> + obstack_chunk_free);
>>>>> + }
>>>>> + ~pair_fusion_bb_info ()
>>>>> + {
>>>>> + obstack_free (&m_obstack, nullptr);
>>>>> +
>>>>> + if (m_emitted_tombstone)
>>>>> + {
>>>>> + bitmap_release (&m_tombstone_bitmap);
>>>>> + bitmap_obstack_release (&m_bitmap_obstack);
>>>>> + }
>>>>> + }
>>>>> +
>>>>> + inline void track_access (insn_info *, bool load, rtx mem);
>>>>> + inline void transform ();
>>>>> + inline void cleanup_tombstones ();
>>>>> +
>>>>> +private:
>>>>> + obstack m_obstack;
>>>>> + bb_info *m_bb;
>>>>> + pair_fusion *m_pass;
>>>>> +
>>>>> + // State for keeping track of tombstone insns emitted for this BB.
>>>>> + bitmap_obstack m_bitmap_obstack;
>>>>> + bitmap_head m_tombstone_bitmap;
>>>>> + bool m_emitted_tombstone;
>>>>> +
>>>>> + inline splay_tree_node<access_record *> *node_alloc (access_record *);
>>>>> +
>>>>> + template<typename Map>
>>>>> + inline void traverse_base_map (Map &map);
>>>>> + inline void transform_for_base (int load_size, access_group &group);
>>>>> +
>>>>> + inline void merge_pairs (insn_list_t &, insn_list_t &,
>>>>> + bool load_p, unsigned access_size);
>>>>> +
>>>>> + inline bool try_fuse_pair (bool load_p, unsigned access_size,
>>>>> + insn_info *i1, insn_info *i2);
>>>>> +
>>>>> + inline bool fuse_pair (bool load_p, unsigned access_size,
>>>>> + int writeback,
>>>>> + insn_info *i1, insn_info *i2,
>>>>> + base_cand &base,
>>>>> + const insn_range_info &move_range);
>>>>> +
>>>>> + inline void track_tombstone (int uid);
>>>>> +
>>>>> + inline bool track_via_mem_expr (insn_info *, rtx mem, lfs_fields lfs);
>>>>> +};
>>>>> --
>>>>> 2.39.3
>>>>>
>>>>
>>>> Thanks,
>>>> Alex
>