> -----Original Message-----
> From: Prathamesh Kulkarni <[email protected]>
> Sent: 31 October 2025 00:44
> To: Prathamesh Kulkarni <[email protected]>; gcc-[email protected]; Jan Hubicka <[email protected]>
> Subject: RE: [RFC] Enable time profile function reordering with AutoFDO
>
>
>
> > -----Original Message-----
> > From: Prathamesh Kulkarni <[email protected]>
> > Sent: 23 October 2025 10:39
> > To: [email protected]; Jan Hubicka <[email protected]>
> > Subject: RE: [RFC] Enable time profile function reordering with AutoFDO
> >
> > > -----Original Message-----
> > > From: Prathamesh Kulkarni <[email protected]>
> > > Sent: 13 October 2025 20:25
> > > To: Prathamesh Kulkarni <[email protected]>; gcc-[email protected]; Jan Hubicka <[email protected]>
> > > Subject: RE: [RFC] Enable time profile function reordering with AutoFDO
> > >
> > >
> > >
> > > > -----Original Message-----
> > > > From: Prathamesh Kulkarni <[email protected]>
> > > > Sent: 06 October 2025 19:41
> > > > To: [email protected]; Jan Hubicka <[email protected]>
> > > > Subject: [RFC] Enable time profile function reordering with AutoFDO
> > > >
> > > > Hi Honza,
> > > > The attached patch enables time profile based function reordering
> > > > with AutoFDO (-fauto-profile -fprofile-reorder-functions) by mapping
> > > > timestamps obtained from perf into node->tp_first_run. It is based on
> > > > top of Dhruv's sourcefile tracking patch:
> > > > https://gcc.gnu.org/pipermail/gcc-patches/2025-September/694800.html
> > > >
> > > > The rationale for doing this is:
> > > > (1) GCC already implements time profile function reordering; the
> > > > patch enables it with AutoFDO.
> > > > (2) While time profile ordering is primarily meant for optimizing
> > > > startup time, we've also observed good effects on code locality for
> > > > large internal workloads.
> > > > (3) It is possibly useful for function reordering when accurate
> > > > profile annotation is hard with AutoFDO, for example if branch
> > > > samples are missing (due to the absence of an LBR-like structure).
> > > >
> > > > On the AutoFDO tools side, I have a patch that extends gcov to emit
> > > > a 64-bit perf timestamp that records the first execution of a
> > > > function, which loosely corresponds to PGO's time_profile counter.
> > > > The timestamp is stored adjacent to the head field in the top-level
> > > > function info.
> > > > I will post a patch for this shortly on the AutoFDO tools upstream
> > > > repo.
> > > >
> > > > On GCC side, the patch makes the following changes:
> > > >
> > > > (1) Changes to the auto-profile pass:
> > > > The patch adds a new field, timestamp, to function_instance, and
> > > > populates it in read_function_instance.
> > > >
> > > > It maintains a new timestamp_info_map from timestamp -> <name,
> > > > tp_first_run>, which maps the timestamps, sorted in ascending order,
> > > > to 1..N, so the lowest timestamp is mapped to 1 and so on. The
> > > > rationale is that timestamps are 64-bit integers, and we don't need
> > > > the full 64-bit range for ordering by tp_first_run.
> > > >
> > > > During annotation, the timestamp associated with a function_instance
> > > > is looked up in timestamp_info_map, and the corresponding mapped value
> > > > is assigned to node->tp_first_run.
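A minimal C++ sketch of the timestamp-to-rank mapping and the annotation lookup described above. The names function_profile, build_timestamp_ranks and lookup_tp_first_run are hypothetical; the map here stores only the rank (the patch's timestamp_info_map also keeps the function name), and treating 0 as "no time profile" is an assumption of this sketch.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the per-function data read from the gcov file.
struct function_profile {
  std::string name;
  uint64_t timestamp;  // 64-bit perf timestamp of first execution; 0 = none
};

// Map each distinct timestamp to a dense rank 1..N in ascending order.
// std::map keeps its keys sorted, so one in-order walk assigns the ranks.
std::map<uint64_t, int>
build_timestamp_ranks (const std::vector<function_profile> &profiles)
{
  std::map<uint64_t, int> ranks;
  for (const auto &p : profiles)
    if (p.timestamp != 0)
      ranks.emplace (p.timestamp, 0);
  int next = 1;
  for (auto &entry : ranks)
    entry.second = next++;
  return ranks;
}

// Annotation step: return the value to store in tp_first_run for a
// function_instance with the given timestamp (0 if it has no rank).
int
lookup_tp_first_run (const std::map<uint64_t, int> &ranks,
                     uint64_t timestamp)
{
  auto it = ranks.find (timestamp);
  return it == ranks.end () ? 0 : it->second;
}
```

Since the ranks only need to preserve relative order, a 32-bit int is enough even though the raw timestamps are 64-bit; in this sketch, functions sharing a timestamp share a rank.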
> > > >
> > > > (2) Handling clones:
> > > > Currently, for clones not registered in the call graph before the
> > > > auto-profile pass, the tp_first_run field is copied from the original
> > > > function when the clone is created. However, that may not correspond
> > > > to the actual order of functions.
> > > >
> > > > For example, if we have two profiled clones of foo:
> > > > foo.constprop.1, foo.constprop.2
> > > >
> > > > both will get the same tp_first_run value as foo->tp_first_run, which
> > > > might not correspond to the time profile order.
> > > >
> > > > To address this, the patch introduces a new IPA pass,
> > > > ipa_adjust_tp_first_run, that streams <clone name, tp_first_run>
> > > > pairs from timestamp_info_map during LGEN; during WPA it reads them
> > > > back and sets each clone's tp_first_run field accordingly.
> > > > The pass is placed fairly late (just before locality_cloning), by
> > > > which point the clones have been registered in the call graph.
> > > >
> > > > Dhruv's sourcefile tracking patch already handles LTO-privatized
> > > > functions.
> > > > The patch also adds a (temporary) workaround for functions with
> > > > mismatched or empty filenames from gcov, so that they are not dropped
> > > > in afdo_annotate_cfg: if get_function_instance_by_decl fails to find
> > > > a function_instance with lbasename (DECL_SOURCE_FILE (decl)), it
> > > > iterates through all filenames in afdo_string_table.
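An illustrative sketch of the filename fallback, with the profile modeled as a map keyed by (source file, function name). All names here are hypothetical, and base_name only approximates libiberty's lbasename (it handles '/' separators only).

```cpp
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins: profile entries keyed by (source file, function).
using instance_key = std::pair<std::string, std::string>;
using instance_table = std::map<instance_key, int>;  // value: profile id

// Return the last path component (approximation of lbasename).
std::string
base_name (const std::string &path)
{
  auto pos = path.find_last_of ('/');
  return pos == std::string::npos ? path : path.substr (pos + 1);
}

// Try the (basename of the decl's source file, function) key first; if that
// fails, scan every filename known to the profile, including the empty one.
std::optional<int>
find_instance (const instance_table &table,
               const std::vector<std::string> &all_filenames,
               const std::string &decl_source_file,
               const std::string &fn_name)
{
  auto it = table.find ({base_name (decl_source_file), fn_name});
  if (it != table.end ())
    return it->second;
  for (const auto &file : all_filenames)
    {
      it = table.find ({file, fn_name});
      if (it != table.end ())
        return it->second;
    }
  return std::nullopt;
}
```

The fallback in this sketch returns the first match, so if the same function name appears under several filenames, the result depends on the order of all_filenames.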
> > > >
> > > > (3) Grouping profiled functions together in as few partitions as
> > > > possible (preferably a single one):
> > > > The patch places profiled functions, in time profile order, together
> > > > in as few partitions as possible to take better advantage of code
> > > > locality. Unlike PGO, where every instrumented function gets a time
> > > > profile counter, with AutoFDO the sampled functions are only a
> > > > fraction of the total executed ones.
> > > > Similarly, in default_function_section, the patch overrides hot/cold
> > > > partitioning so that the grouping of profiled functions isn't
> > > > disrupted.
> > > >
> > > > (4) Option to disable profile-driven opts:
> > > > The patch adds the option -fauto-profile-reorder-only, which only
> > > > enables time profile reordering with AutoFDO (and disables
> > > > profile-driven opts):
> > > > (a) Useful as a debugging aid to isolate a regression to either
> > > > function reordering or profile-driven opts.
> > > > (b) For our use case, it also seems useful as a stopgap measure to
> > > > avoid regressions with AutoFDO profile-driven opts, due to issues
> > > > with the profile quality obtained by merging SPE and non-SPE
> > > > profiles. We're actively working on resolving this.
> > > > (c) Possibly useful for architectures that do not support branch
> > > > sampling.
> > > > The option is disabled by default.
> > > >
> > > > Ideally, I would like to make it a param (and not a user-facing
> > > > option), but I am not able to control enabling/disabling options in
> > > > opts.cc:common_handle_option based on a param value; I will
> > > > investigate this further.
> > > >
> > > > * Results
> > > >
> > > > On one large internal workload, the patch (along with the sourcefile
> > > > tracking patch) gives an uplift of 32.63% compared to LTO, and 8.07%
> > > > compared to LTO + AutoFDO trunk. For another workload, it gives an
> > > > uplift of 15.31% compared to LTO, and 7.76% compared to LTO + AutoFDO
> > > > trunk.
> > > > I will try benchmarking with SPEC2017.
> > > >
> > > > I would be grateful for suggestions on how to proceed further.
> > > Hi,
> > > ping: https://gcc.gnu.org/pipermail/gcc-patches/2025-October/696758.html
> > Hi,
> > ping * 2: https://gcc.gnu.org/pipermail/gcc-patches/2025-October/696758.html
> Hi,
> ping * 3: https://gcc.gnu.org/pipermail/gcc-patches/2025-October/696758.html
Hi,
ping * 4: https://gcc.gnu.org/pipermail/gcc-patches/2025-October/696758.html
Thanks,
Prathamesh
>
> Thanks,
> Prathamesh
> >
> > Thanks,
> > Prathamesh
> > >
> > > Thanks,
> > > Prathamesh
> > > >
> > > > Signed-off-by: Prathamesh Kulkarni <[email protected]>
> > > >
> > > > Thanks,
> > > > Prathamesh