> -----Original Message-----
> From: Prathamesh Kulkarni <[email protected]>
> Sent: 13 October 2025 20:25
> To: Prathamesh Kulkarni <[email protected]>; gcc-
> [email protected]; Jan Hubicka <[email protected]>
> Subject: RE: [RFC] Enable time profile function reordering with AutoFDO
> 
> 
> 
> > -----Original Message-----
> > From: Prathamesh Kulkarni <[email protected]>
> > Sent: 06 October 2025 19:41
> > To: [email protected]; Jan Hubicka <[email protected]>
> > Subject: [RFC] Enable time profile function reordering with AutoFDO
> >
> > Hi Honza,
> > The attached patch enables time-profile-based function reordering with
> > AutoFDO (-fauto-profile -fprofile-reorder-functions) by mapping
> > timestamps obtained from perf into node->tp_first_run. It is based on
> > top of Dhruv's sourcefile tracking patch:
> > https://gcc.gnu.org/pipermail/gcc-patches/2025-September/694800.html
> >
> > The rationale for doing this is:
> > (1) GCC already implements time-profile function reordering; the patch
> > enables it with AutoFDO.
> > (2) While time-profile ordering is primarily meant for optimizing
> > startup time, we've also observed good effects on code locality for
> > large internal workloads.
> > (3) It may also be useful for function reordering when accurate profile
> > annotation is hard with AutoFDO, e.g. if branch samples are missing
> > (due to the absence of an LBR-like structure).
> >
> > On the AutoFDO tools side, I have a patch that extends gcov to emit a
> > 64-bit perf timestamp recording the first execution of each function,
> > which loosely corresponds to PGO's time_profile counter.
> > The timestamp is stored adjacent to the head field in the toplevel
> > function info.
> > I will post this patch shortly to the upstream AutoFDO tools repo.
> >
> > On GCC side, the patch makes the following changes:
> >
> > (1) Changes to auto-profile pass:
> > The patch adds a new field, timestamp, to function_instance and
> > populates it in read_function_instance.
> >
> > It maintains a new timestamp_info_map from timestamp -> <name,
> > tp_first_run>, which maps timestamps sorted in ascending order to
> > (1..N), so the lowest timestamp is mapped to 1 and so on. The rationale
> > is that timestamps are 64-bit integers, and we don't need the full
> > 64-bit range for ordering by tp_first_run.
> >
> > During annotation, the timestamp associated with each function_instance
> > is looked up in timestamp_info_map, and the corresponding mapped value
> > is assigned to node->tp_first_run.
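A minimal sketch of the timestamp-to-rank mapping and the annotation lookup described above, assuming a std::map stands in for timestamp_info_map and a toy FunctionNode struct stands in for cgraph_node. The names FunctionNode, build_rank_map and annotate_tp_first_run are hypothetical and are not taken from the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a cgraph_node: only the fields needed here.
struct FunctionNode {
  std::string name;
  std::uint64_t timestamp;  // 64-bit perf timestamp of first execution
  int tp_first_run = 0;     // 0 means "no time profile"
};

// Hypothetical stand-in for timestamp_info_map: timestamp -> <name, rank>.
// std::map keeps its keys sorted, so iterating it visits timestamps in
// ascending order and ranks 1..N can be handed out in a single pass.
using TimestampInfoMap =
    std::map<std::uint64_t, std::pair<std::string, int>>;

// Assumes distinct timestamps: emplace ignores a duplicate key.
TimestampInfoMap build_rank_map(
    const std::vector<std::pair<std::string, std::uint64_t>> &profiled) {
  TimestampInfoMap map;
  for (const auto &[name, ts] : profiled)
    map.emplace(ts, std::make_pair(name, 0));
  int rank = 1;
  for (auto &entry : map)
    entry.second.second = rank++;  // earliest timestamp -> 1
  return map;
}

// Annotation step: look up the node's timestamp and store the compact rank.
void annotate_tp_first_run(FunctionNode &node, const TimestampInfoMap &map) {
  auto it = map.find(node.timestamp);
  if (it != map.end())
    node.tp_first_run = it->second.second;
}
```

Using an ordered map makes the sort implicit; functions sharing a timestamp would need an explicit tie-breaker, which this sketch does not attempt.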
> >
> > (2) Handling clones:
> > Currently, for clones not registered in the call graph before the
> > auto-profile pass, the tp_first_run field is copied from the original
> > function when the clone is created.
> > However, that may not correspond to the actual order of functions.
> >
> > For example, if we have two profiled clones of foo:
> > foo.constprop.1, foo.constprop.2
> >
> > both will get the same tp_first_run value as foo->tp_first_run, which
> > might not correspond to the time-profile order.
> >
> > To address this, the patch introduces a new IPA pass,
> > ipa_adjust_tp_first_run, that streams <clone name, tp_first_run> pairs
> > from timestamp_info_map during LGEN; during WPA it reads them back and
> > sets each clone's tp_first_run field accordingly.
> > The pass is placed fairly late (just before locality_cloning); by that
> > point, clones have been registered in the call graph.
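The clone fix-up can be sketched as a write/read round trip, assuming a text stream in place of GCC's LTO section streaming and matching clones purely by name. Node, stream_out and stream_in_and_adjust are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a call-graph node (original function or clone).
struct Node {
  std::string name;  // e.g. "foo.constprop.1"
  int tp_first_run;  // initially inherited from the original function
};

// "LGEN" side: write one <name, tp_first_run> record per line. A string
// stream stands in for the LTO section the real pass would write.
std::string stream_out(const std::map<std::string, int> &profiled) {
  std::ostringstream out;
  for (const auto &[name, rank] : profiled)
    out << name << ' ' << rank << '\n';
  return out.str();
}

// "WPA" side: read the records back and overwrite tp_first_run for every
// node whose name matches; at this point the clones exist in the graph.
void stream_in_and_adjust(const std::string &data, std::vector<Node> &cgraph) {
  std::map<std::string, int> ranks;
  std::istringstream in(data);
  std::string name;
  int rank;
  while (in >> name >> rank)
    ranks[name] = rank;
  for (Node &node : cgraph) {
    auto it = ranks.find(node.name);
    if (it != ranks.end())
      node.tp_first_run = it->second;
  }
}
```

Before the round trip both clones carry foo's value; afterwards each clone has its own rank from the profile.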
> >
> > Dhruv's sourcefile tracking patch already handles LTO-privatized
> > functions.
> > The patch adds a (temporary) workaround for functions with
> > mismatched/empty filenames from gcov, so that they do not get dropped
> > in afdo_annotate_cfg: if get_function_instance_by_decl fails to find a
> > function_instance with lbasename (DECL_SOURCE_FILE (decl)), it iterates
> > through all filenames in afdo_string_table.
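The fallback lookup can be illustrated with a toy profile store keyed by <filename, function name>, assuming a std::map in place of the real function_instance tables and a plain list of filenames in place of afdo_string_table. ProfileStore, base_name and lookup_with_fallback are hypothetical names; base_name only approximates lbasename for '/'-separated paths.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical profile store keyed by <source filename, function name>.
using ProfileKey = std::pair<std::string, std::string>;
using ProfileStore = std::map<ProfileKey, int>;  // value: profile payload

// Rough equivalent of lbasename: strip everything up to the last '/'.
std::string base_name(const std::string &path) {
  auto pos = path.find_last_of('/');
  return pos == std::string::npos ? path : path.substr(pos + 1);
}

std::optional<int> lookup_with_fallback(
    const ProfileStore &store, const std::vector<std::string> &filenames,
    const std::string &decl_file, const std::string &fn_name) {
  // Normal path: match on the basename of the decl's source file.
  auto it = store.find({base_name(decl_file), fn_name});
  if (it != store.end())
    return it->second;
  // Workaround: gcov may have recorded a different or empty filename, so
  // try every known filename before giving up.
  for (const std::string &file : filenames) {
    it = store.find({file, fn_name});
    if (it != store.end())
      return it->second;
  }
  return std::nullopt;
}
```

Taking the first match is a simplification: if the same name appears under several files (e.g. distinct static functions), the fallback can pick the wrong one, which is one reason to keep such a workaround temporary.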
> >
> > (3) Grouping profiled functions together in as few partitions as
> > possible (preferably a single one).
> > The patch places profiled functions together, in time-profile order, in
> > as few partitions as possible to take better advantage of code locality.
> > Unlike PGO, where every instrumented function gets a time-profile
> > counter, with AutoFDO the sampled functions are only a fraction of the
> > functions actually executed.
> > Similarly, in default_function_section, the patch overrides hot/cold
> > partitioning so that the grouping of profiled functions isn't disrupted.
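As an illustration of the grouping goal (not GCC's actual LTO partitioning code), the sketch below sorts profiled functions ahead of unprofiled ones by tp_first_run and then fills partitions greedily under a hypothetical size budget, so the profiled prefix ends up in as few partitions as possible. Fn, time_profile_less and partition are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a function considered by the partitioner.
struct Fn {
  std::string name;
  int tp_first_run;  // 0 = not sampled by AutoFDO
  int order;         // original symbol order, used as a tie-breaker
  std::size_t size;  // estimated code size
};

// Profiled functions first, in time-profile order; unprofiled ones after,
// in their original order.
bool time_profile_less(const Fn &a, const Fn &b) {
  bool pa = a.tp_first_run > 0, pb = b.tp_first_run > 0;
  if (pa != pb)
    return pa;
  if (pa && a.tp_first_run != b.tp_first_run)
    return a.tp_first_run < b.tp_first_run;
  return a.order < b.order;
}

// Greedily fill partitions in sorted order; a new partition is started only
// when adding the next function would exceed the size budget.
std::vector<std::vector<std::string>> partition(std::vector<Fn> fns,
                                                std::size_t max_size) {
  std::sort(fns.begin(), fns.end(), time_profile_less);
  std::vector<std::vector<std::string>> parts;
  std::size_t used = 0;
  for (const Fn &f : fns) {
    if (parts.empty() || used + f.size > max_size) {
      parts.emplace_back();
      used = 0;
    }
    parts.back().push_back(f.name);
    used += f.size;
  }
  return parts;
}
```

With a budget of 60 and two 30-byte profiled functions, both land in the first partition ahead of the unprofiled code.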
> >
> > (4) Option to disable profile-driven optimizations.
> > The patch adds the option -fauto-profile-reorder-only, which only
> > enables time-profile reordering with AutoFDO (and disables
> > profile-driven optimizations):
> > (a) Useful as a debugging aid to attribute a regression either to
> > function reordering or to profile-driven optimizations.
> > (b) For our use case, it also seems useful as a stopgap measure to
> > avoid regressions with AutoFDO profile-driven optimizations caused by
> > issues with the profile quality obtained from merging SPE and non-SPE
> > profiles.
> > We're actively working on resolving this.
> > (c) Possibly useful for architectures that do not support branch
> > sampling.
> > The option is disabled by default.
> >
> > Ideally, I would like to make it a param (rather than a user-facing
> > option), but I haven't been able to enable/disable options in
> > opts.cc:common_handle_option based on a param value; I will
> > investigate this further.
> >
> > * Results
> >
> > On one large internal workload, the patch (along with the sourcefile
> > tracking patch) gives an uplift of 32.63% compared to LTO, and 8.07%
> > compared to LTO + AutoFDO trunk. For another workload, it gives an
> > uplift of 15.31% compared to LTO, and 7.76% compared to LTO + AutoFDO
> > trunk.
> > I will try benchmarking with SPEC2017.
> >
> > I would be grateful for suggestions on how to proceed further.
> Hi,
> ping: https://gcc.gnu.org/pipermail/gcc-patches/2025-October/696758.html
Hi,
ping * 2: https://gcc.gnu.org/pipermail/gcc-patches/2025-October/696758.html

Thanks,
Prathamesh
> 
> Thanks,
> Prathamesh
> >
> > Signed-off-by: Prathamesh Kulkarni <[email protected]>
> >
> > Thanks,
> > Prathamesh
