> validate_ipa_reorder_locality_lto_partition (opts, opts_set);

I know this patch has already been merged into trunk, but I think the code
change below in opts.cc is questionable. It would completely override any
user-specified partition model, for example when the user wants a traditional
all-in-one LTO compilation with "-flto-partition=none" and does not use
"-fipa-reorder-for-locality".

> if (opts_set->x_flag_lto_partition != LTO_PARTITION_DEFAULT)
>   opts_set->x_flag_lto_partition = opts->x_flag_lto_partition
>     = LTO_PARTITION_BALANCED;
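
To make the concern concrete, here is a minimal standalone sketch. It does not
use the real gcc_options structures: PartitionModel, Options and both resolve_*
functions are hypothetical stand-ins. The first function models the quoted
condition as I read it; the second is only one possible alternative, shown for
contrast and not as the merged code or a tested patch.

```cpp
// Hypothetical stand-ins for GCC's option state; not the real gcc_options.
enum class PartitionModel { Default, None, Balanced };

struct Options {
  PartitionModel lto_partition = PartitionModel::Default;
  bool reorder_for_locality = false;  // models -fipa-reorder-for-locality
};

// Models the quoted snippet: any non-default choice is replaced by Balanced,
// whether or not reorder_for_locality is set.
PartitionModel resolve_as_quoted(const Options &o) {
  if (o.lto_partition != PartitionModel::Default)
    return PartitionModel::Balanced;
  return o.lto_partition;
}

// One possible alternative (an assumption for illustration): only fill in the
// default and leave an explicit user choice alone.
PartitionModel resolve_keep_user_choice(const Options &o) {
  if (o.lto_partition == PartitionModel::Default)
    return PartitionModel::Balanced;
  return o.lto_partition;
}
```

With lto_partition set to None and reorder_for_locality left false, the first
function yields Balanced while the second keeps None.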

Regards,
Feng
________________________________________
From: Kyrylo Tkachov <ktkac...@nvidia.com>
Sent: Saturday, November 16, 2024 1:04 AM
To: GCC Patches
Cc: Jan Hubicka; Martin Jambor; Richard Biener
Subject: [PATCH] Introduce -flto-partition=locality

Hi all,

This is a patch submission following up on the RFC at:
https://gcc.gnu.org/pipermail/gcc/2024-November/245076.html
The patch is rebased and retested against current trunk, with some debugging
code removed, comments improved and some fixes added as we've done more
testing.

------------------------>8-----------------------------------------------------
Implement partitioning and cloning in the callgraph to help locality.
A new -flto-partition=locality flag is used to enable this.
The optimization has two components:
* Partitioning the callgraph so as to group callers and callees that frequently
call each other in the same partition
* Cloning functions that straddle multiple callchains and allowing each clone
to be local to the partition of its callchain.

The majority of the logic is in the new IPA pass in ipa-locality-cloning.cc.
It creates a partitioning plan and does the prerequisite cloning.
The partitioning is then implemented during the existing LTO partitioning pass.
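
As a rough illustration of the grouping component only (not the cloning step),
the sketch below clusters functions greedily by call frequency using a
union-find structure. This is a generic technique chosen for illustration, not
the algorithm in ipa-locality-cloning.cc. CallEdge, Partitions,
group_by_call_frequency and the max_partition_size limit are hypothetical
names, and each function is counted as size 1 for simplicity.

```cpp
#include <algorithm>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical call-graph edge: caller and callee are function indices.
struct CallEdge {
  int caller;
  int callee;
  long frequency;
};

// Union-find over functions, tracking how many functions each group holds.
struct Partitions {
  std::vector<int> parent;
  std::vector<int> size;

  explicit Partitions(int n) : parent(n), size(n, 1) {
    std::iota(parent.begin(), parent.end(), 0);
  }

  int find(int x) {
    while (parent[x] != x) {
      parent[x] = parent[parent[x]];  // path halving
      x = parent[x];
    }
    return x;
  }

  // Merge the groups of a and b unless the result would exceed max_size.
  bool merge(int a, int b, int max_size) {
    a = find(a);
    b = find(b);
    if (a == b) return true;
    if (size[a] + size[b] > max_size) return false;
    if (size[a] < size[b]) std::swap(a, b);
    parent[b] = a;
    size[a] += size[b];
    return true;
  }
};

// Visit edges from hottest to coldest and try to place caller and callee in
// the same partition. Returns a partition representative for every function.
std::vector<int> group_by_call_frequency(int num_functions,
                                         std::vector<CallEdge> edges,
                                         int max_partition_size) {
  std::sort(edges.begin(), edges.end(),
            [](const CallEdge &a, const CallEdge &b) {
              return a.frequency > b.frequency;
            });
  Partitions parts(num_functions);
  for (const CallEdge &e : edges)
    parts.merge(e.caller, e.callee, max_partition_size);

  std::vector<int> result(num_functions);
  for (int f = 0; f < num_functions; ++f) result[f] = parts.find(f);
  return result;
}
```

A function whose hot callers end up in different groups is the kind of node
that the cloning component described above would duplicate; the sketch leaves
such a function in whichever group claims it first.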

To guide these locality heuristics we use PGO data.
In the absence of PGO data we use a static heuristic that uses the accumulated
estimated edge frequencies of the callees for each function to guide the
reordering.
We are investigating some more elaborate static heuristics, in particular using
the demangled C++ names to group template instantiations together.
This is promising, but we are currently working out some kinks in the
implementation and want to send it out as a follow-up once we're more confident
in it.
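
The static heuristic mentioned above can be sketched roughly as follows,
assuming it amounts to summing the estimated frequencies of each function's
outgoing call edges and ordering functions by that sum. How the patch actually
consumes these sums is not spelled out here, so the ordering step is an
assumption, and EstimatedEdge and both function names are hypothetical.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Hypothetical edge with a statically estimated frequency (no PGO data).
struct EstimatedEdge {
  int caller;
  int callee;
  double frequency;
};

// For each function, accumulate the estimated frequencies of its callee edges.
std::vector<double> accumulate_callee_frequencies(
    int num_functions, const std::vector<EstimatedEdge> &edges) {
  std::vector<double> total(num_functions, 0.0);
  for (const EstimatedEdge &e : edges) total[e.caller] += e.frequency;
  return total;
}

// Order function indices from highest to lowest accumulated frequency.
// stable_sort keeps the original index order among ties.
std::vector<int> order_by_accumulated_frequency(
    int num_functions, const std::vector<EstimatedEdge> &edges) {
  const std::vector<double> total =
      accumulate_callee_frequencies(num_functions, edges);
  std::vector<int> order(num_functions);
  std::iota(order.begin(), order.end(), 0);
  std::stable_sort(order.begin(), order.end(),
                   [&total](int a, int b) { return total[a] > total[b]; });
  return order;
}
```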

A new bootstrap-lto-locality bootstrap config is added that allows us to test
this on GCC itself with either static or PGO heuristics.
GCC bootstraps with both (normal LTO bootstrap and profiledbootstrap).

With this optimization we are seeing good performance gains on some large
internal workloads that stress the parts of the processor that are sensitive
to code locality, but we'd appreciate wider performance evaluation.

Bootstrapped and tested on aarch64-none-linux-gnu.
Ok for mainline?
Thanks,
Kyrill

Signed-off-by: Prachi Godbole <pgodb...@nvidia.com>
Co-authored-by: Kyrylo Tkachov <ktkac...@nvidia.com>

config/ChangeLog:

        * bootstrap-lto-locality.mk: New file.

gcc/ChangeLog:

        * Makefile.in (OBJS): Add ipa-locality-cloning.o.
        (GTFILES): Add ipa-locality-cloning.cc dependency.
        * common.opt (lto_partition_model): Add locality value.
        * flag-types.h (lto_partition_model): Add LTO_PARTITION_LOCALITY value.
        (enum lto_locality_cloning_model): Define.
        * lto-cgraph.cc (lto_set_symtab_encoder_in_partition): Add dumping of
        node and index.
        * params.opt (lto_locality_cloning_model): New enum.
        (lto-partition-locality-cloning): New param.
        (lto-partition-locality-frequency-cutoff): Likewise.
        (lto-partition-locality-size-cutoff): Likewise.
        (lto-max-locality-partition): Likewise.
        * passes.def: Add pass_ipa_locality_cloning.
        * timevar.def (TV_IPA_LC): New timevar.
        * tree-pass.h (make_pass_ipa_locality_cloning): Declare.
        * ipa-locality-cloning.cc: New file.
        * ipa-locality-cloning.h: New file.

gcc/lto/ChangeLog:

        * lto-partition.cc: Include ipa-locality-cloning.h.
        (add_node_references_to_partition): Define.
        (create_partition): Likewise.
        (lto_locality_map): Likewise.
        (lto_promote_cross_file_statics): Add extra dumping.
        * lto-partition.h (lto_locality_map): Declare.
        * lto.cc (do_whole_program_analysis): Handle LTO_PARTITION_LOCALITY.
