> On 24 Apr 2025, at 09:17, Feng Xue OS <f...@os.amperecomputing.com> wrote:
> 
>> validate_ipa_reorder_locality_lto_partition (opts, opts_set);
> 
> I know this patch has already been merged into trunk, but I think the
> following code change in opts.cc is questionable: it would completely
> override any user-specified partition model. Suppose a user wants a
> traditional all-in-one LTO compilation with "-flto-partition=none" and
> without "-fipa-reorder-for-locality".
> 
>> if (opts_set->x_flag_lto_partition != LTO_PARTITION_DEFAULT)
>>   opts_set->x_flag_lto_partition = opts->x_flag_lto_partition
>>     = LTO_PARTITION_BALANCED;
> 

Hmm, yes, I think the condition should be == instead of !=. I’ll test a patch
momentarily.
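To make the difference between the two conditions concrete, here is a minimal standalone C++ model of the override. It is only a sketch: `options_sketch`, `partition_model_sketch` and the two `finish_lto_partition_*` functions are hypothetical stand-ins, not GCC's `gcc_options` or `lto_partition_model`, and the sketch collapses GCC's separate `opts`/`opts_set` structures into a single field. It assumes the intent is that only an unspecified (default) model gets resolved to balanced.

```cpp
#include <cassert>

// Hypothetical stand-ins: GCC's real gcc_options and enum
// lto_partition_model are declared elsewhere and differ from these.
enum partition_model_sketch
{
  PARTITION_DEFAULT,  // no -flto-partition= on the command line
  PARTITION_NONE,     // -flto-partition=none
  PARTITION_BALANCED  // -flto-partition=balanced
};

struct options_sketch
{
  partition_model_sketch lto_partition = PARTITION_DEFAULT;
};

// Quoted condition: fires exactly when the user DID choose a model, so an
// explicit -flto-partition=none is silently replaced by balanced.
void
finish_lto_partition_buggy (options_sketch &opts)
{
  if (opts.lto_partition != PARTITION_DEFAULT)
    opts.lto_partition = PARTITION_BALANCED;
}

// Proposed condition: only the unspecified default is resolved to
// balanced; an explicit user choice is left untouched.
void
finish_lto_partition_fixed (options_sketch &opts)
{
  if (opts.lto_partition == PARTITION_DEFAULT)
    opts.lto_partition = PARTITION_BALANCED;
  // After resolution the placeholder default never survives.
  assert (opts.lto_partition != PARTITION_DEFAULT);
}
```

With `PARTITION_NONE` as input, the buggy version yields `PARTITION_BALANCED`, while the fixed version keeps `PARTITION_NONE`; only a default input is turned into `PARTITION_BALANCED`.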
Thanks,
Kyrill

> Regards,
> Feng
> ________________________________________
> From: Kyrylo Tkachov <ktkac...@nvidia.com>
> Sent: Saturday, November 16, 2024 1:04 AM
> To: GCC Patches
> Cc: Jan Hubicka; Martin Jambor; Richard Biener
> Subject: [PATCH] Introduce -flto-partition=locality
> 
> Hi all,
> 
> This is a patch submission following up on the RFC at:
> https://gcc.gnu.org/pipermail/gcc/2024-November/245076.html
> The patch has been rebased and retested against current trunk, some debugging
> code has been removed, comments have been improved, and some fixes have been
> added as we've done more testing.
> 
> ------------------------>8-----------------------------------------------------
> Implement partitioning and cloning in the callgraph to help locality.
> A new -flto-partition=locality flag is used to enable this.
> The optimization has two components:
> * Partitioning the callgraph so as to group callers and callees that
>   frequently call each other in the same partition.
> * Cloning functions that straddle multiple callchains, so that each clone
>   can be local to the partition of its callchain.
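The first component can be pictured as greedy clustering over call-graph edges. The C++ below is a sketch under stated assumptions, not the algorithm in ipa-locality-cloning.cc: it swaps in a simple Pettis-Hansen-style heuristic that visits edges hottest-first and merges the caller's and callee's partitions unless the merged code size would exceed a cap. `call_edge`, `partition_set`, `partition_by_locality` and `max_partition_size` are hypothetical names; the cap is only assumed to play a role similar to the lto-max-locality-partition param.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical call-graph edge with an estimated or profiled frequency.
struct call_edge
{
  int caller;
  int callee;
  long freq;
};

// Disjoint-set forest: each tree is one partition, and SIZE at a root
// holds the total code size of that partition.
struct partition_set
{
  std::vector<int> parent;
  std::vector<long> size;

  explicit partition_set (const std::vector<long> &fn_sizes)
    : parent (fn_sizes.size ()), size (fn_sizes)
  {
    std::iota (parent.begin (), parent.end (), 0);
  }

  int
  find (int x)
  {
    while (parent[x] != x)
      {
        parent[x] = parent[parent[x]];  // path halving
        x = parent[x];
      }
    return x;
  }
};

// Returns the partition representative of each function.
std::vector<int>
partition_by_locality (const std::vector<long> &fn_sizes,
                       std::vector<call_edge> edges, long max_partition_size)
{
  const int n = static_cast<int> (fn_sizes.size ());
  partition_set ps (fn_sizes);

  // Hottest edges first; stable_sort keeps equal-frequency edges in input
  // order so the result is deterministic.
  std::stable_sort (edges.begin (), edges.end (),
                    [] (const call_edge &a, const call_edge &b)
                    { return a.freq > b.freq; });

  for (const call_edge &e : edges)
    {
      assert (e.caller >= 0 && e.caller < n && e.callee >= 0 && e.callee < n);
      int a = ps.find (e.caller);
      int b = ps.find (e.callee);
      if (a == b || ps.size[a] + ps.size[b] > max_partition_size)
        continue;  // already together, or merging would exceed the cap
      ps.parent[b] = a;
      ps.size[a] += ps.size[b];
    }

  std::vector<int> part (n);
  for (int i = 0; i < n; ++i)
    part[i] = ps.find (i);
  return part;
}
```

For four functions of size 10 with hot edges 0->1 and 2->3 and a cold edge 1->2, a cap of 25 yields two partitions, {0, 1} and {2, 3}; raising the cap to 100 lets the cold edge merge everything into one.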
> 
> The majority of the logic is in the new IPA pass in ipa-locality-cloning.cc.
> It creates a partitioning plan and does the prerequisite cloning.
> The partitioning is then implemented during the existing LTO partitioning 
> pass.
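The cloning step can be sketched in the same spirit. Assuming a partition assignment already exists, the hypothetical `plan_locality_clones` below proposes a partition-local clone of a callee for each partition other than its own whose accumulated calls into it reach a frequency cutoff, provided the callee is small enough to duplicate. The two cutoffs are only assumed to resemble the lto-partition-locality-frequency-cutoff and lto-partition-locality-size-cutoff params; their real semantics are defined by the patch, not by this sketch.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <utility>
#include <vector>

// Hypothetical call-graph edge with an estimated or profiled frequency.
struct call_edge
{
  int caller;
  int callee;
  long freq;
};

// Returns (callee, partition) pairs for which a partition-local clone of
// CALLEE is proposed.  PARTITION_OF maps each function to its partition.
std::set<std::pair<int, int>>
plan_locality_clones (const std::vector<int> &partition_of,
                      const std::vector<long> &fn_sizes,
                      const std::vector<call_edge> &edges,
                      long freq_cutoff, long size_cutoff)
{
  assert (partition_of.size () == fn_sizes.size ());

  // Total call frequency into each callee, per calling partition.
  std::map<std::pair<int, int>, long> incoming;
  for (const call_edge &e : edges)
    incoming[std::make_pair (e.callee, partition_of[e.caller])] += e.freq;

  std::set<std::pair<int, int>> clones;
  for (const auto &entry : incoming)
    {
      int callee = entry.first.first;
      int part = entry.first.second;
      long freq = entry.second;
      if (part == partition_of[callee])
        continue;  // these calls are already partition-local
      if (fn_sizes[callee] > size_cutoff || freq < freq_cutoff)
        continue;  // too big to duplicate, or too cold to matter
      clones.insert (entry.first);
    }
  return clones;
}
```

For example, if helper 4 lives in partition 0 but receives calls totalling 45 from partition 1, a frequency cutoff of 20 and a size cutoff of 8 produce exactly one proposed clone, (4, 1); lowering the size cutoff below the helper's size suppresses it.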
> 
> To guide these locality heuristics we use PGO data.
> In the absence of PGO data, we use a static heuristic based on the
> accumulated estimated edge frequencies of each function's callees to guide
> the reordering.
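One plausible reading of the static heuristic, sketched in C++: score each function by the sum of the estimated frequencies of the calls it makes to its callees, then visit functions hottest-first when building partitions. `order_by_callee_frequency` is a hypothetical helper, and using the resulting order to seed partitioning is an assumption about how the score guides the reordering, not a description of the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical call-graph edge with a statically estimated frequency.
struct call_edge
{
  int caller;
  int callee;
  long freq;
};

// Score each function by the accumulated estimated frequency of the calls
// it makes to its callees, and return function ids ordered hottest-first.
std::vector<int>
order_by_callee_frequency (int num_functions,
                           const std::vector<call_edge> &edges)
{
  std::vector<long> score (num_functions, 0);
  for (const call_edge &e : edges)
    {
      assert (e.caller >= 0 && e.caller < num_functions);
      score[e.caller] += e.freq;
    }

  std::vector<int> order (num_functions);
  std::iota (order.begin (), order.end (), 0);
  // Ties keep ascending id order, so the result is deterministic.
  std::stable_sort (order.begin (), order.end (),
                    [&score] (int a, int b) { return score[a] > score[b]; });
  return order;
}
```

With edges 0->1 (10), 2->1 (30), 2->3 (5) and 3->0 (20), the scores are 10, 0, 35 and 20, giving the order 2, 3, 0, 1.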
> We are investigating some more elaborate static heuristics, in particular
> using the demangled C++ names to group template instantiations together.
> This is promising, but we are still working out some kinks in the
> implementation and want to send it out as a follow-up once we're more
> confident in it.
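Since the template-grouping heuristic is still work in progress, the following is only a naive illustration of the general technique, not the authors' implementation: derive a grouping key from each demangled name by collapsing template argument lists and dropping the parameter list, then bucket functions by key. `template_group_key` and `group_instantiations` are hypothetical, and the parser deliberately ignores hard cases such as `operator<`, `operator<<` and `(anonymous namespace)`.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Collapse template argument lists and drop the parameter list, e.g.
// "std::vector<int>::push_back(int const&)" -> "std::vector<>::push_back".
// Naive: names containing operator<, operator<< or "(anonymous namespace)"
// are not handled correctly.
std::string
template_group_key (const std::string &demangled)
{
  // Expect demangled text, not a raw Itanium-mangled "_Z..." symbol.
  assert (demangled.compare (0, 2, "_Z") != 0);
  std::string key;
  int depth = 0;  // current template-argument nesting level
  for (char c : demangled)
    {
      if (depth == 0 && c == '(')
        break;  // start of the parameter list
      if (c == '<')
        {
          if (depth++ == 0)
            key += '<';
        }
      else if (c == '>' && depth > 0)
        {
          if (--depth == 0)
            key += '>';
        }
      else if (depth == 0)
        key += c;
    }
  return key;
}

// Bucket function names so that instantiations of the same template land
// in the same group.
std::map<std::string, std::vector<std::string>>
group_instantiations (const std::vector<std::string> &names)
{
  std::map<std::string, std::vector<std::string>> groups;
  for (const std::string &name : names)
    groups[template_group_key (name)].push_back (name);
  return groups;
}
```

Under this scheme `std::vector<int>::push_back(int const&)` and `std::vector<long>::push_back(long const&)` share the key `std::vector<>::push_back` and would be placed together, while `foo(int)` gets its own group `foo`.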
> 
> A new bootstrap-lto-locality bootstrap config is added that allows us to test
> this on GCC itself with either static or PGO heuristics.
> GCC bootstraps with both (normal LTO bootstrap and profiledbootstrap).
> 
> With this optimization we are seeing good performance gains on some large
> internal workloads that stress the parts of the processor that are sensitive
> to code locality, but we'd appreciate wider performance evaluation.
> 
> Bootstrapped and tested on aarch64-none-linux-gnu.
> Ok for mainline?
> Thanks,
> Kyrill
> 
> Signed-off-by: Prachi Godbole <pgodb...@nvidia.com>
> Co-authored-by: Kyrylo Tkachov <ktkac...@nvidia.com>
> 
> config/ChangeLog:
>         * bootstrap-lto-locality.mk: New file.
> 
> gcc/ChangeLog:
>         * Makefile.in (OBJS): Add ipa-locality-cloning.o.
>         (GTFILES): Add ipa-locality-cloning.cc dependency.
>         * common.opt (lto_partition_model): Add locality value.
>         * flag-types.h (lto_partition_model): Add LTO_PARTITION_LOCALITY
>         value.
>         (enum lto_locality_cloning_model): Define.
>         * lto-cgraph.cc (lto_set_symtab_encoder_in_partition): Add dumping
>         of node and index.
>         * params.opt (lto_locality_cloning_model): New enum.
>         (lto-partition-locality-cloning): New param.
>         (lto-partition-locality-frequency-cutoff): Likewise.
>         (lto-partition-locality-size-cutoff): Likewise.
>         (lto-max-locality-partition): Likewise.
>         * passes.def: Add pass_ipa_locality_cloning.
>         * timevar.def (TV_IPA_LC): New timevar.
>         * tree-pass.h (make_pass_ipa_locality_cloning): Declare.
>         * ipa-locality-cloning.cc: New file.
>         * ipa-locality-cloning.h: New file.
> 
> gcc/lto/ChangeLog:
>         * lto-partition.cc: Include ipa-locality-cloning.h.
>         (add_node_references_to_partition): Define.
>         (create_partition): Likewise.
>         (lto_locality_map): Likewise.
>         (lto_promote_cross_file_statics): Add extra dumping.
>         * lto-partition.h (lto_locality_map): Declare.
>         * lto.cc (do_whole_program_analysis): Handle
>         LTO_PARTITION_LOCALITY.
> 
