On 16/06/25 21:44, Jan Hubicka wrote:
External email: Use caution opening links or attachments
Hi,
Introduction
------------
Per PR120229 (gcc.gnu.org/PR120229), the auto-profile pass cannot distinguish
profile information for `function_instance's with the same base name, when
suffixes are removed. To fix this, source file names should be tracked in the
GCOV file information to disambiguate functions. This issue occurs when
privatized clones are created for an LTO partition, when there are
static functions that have the same name in the same partition.
Is this really LTO specific? I.e. if I have two translation units with
static function of the same name, the resulting binary will still have
two symbols of the same name. Are those resolved somehow?
Ah, you're right, I misunderstood private functions vs. privatization.
Yeah, this does also occur with a shared library that gets linked in, having the
same names for static functions.
Also I wonder how the create_gcov side of patches are handled. I.e. how
to contribute patches there?
We will be contributing patches to create_gcov as well, this is just an RFC
to get initial feedback. The plan is to land the changes for both at the
same time.
Looking on LLVM, the file format is documented in
include/llvm/ProfileData/SampleProfReader.h
curiously it does not seem to implement what you do (which makes sense
to me and I think we should implement it), but there are some other
things that are interesting. For example
// DETAILED_SUMMARY
// A list of detailed summary entry. Each entry consists of
// CUTOFF (uint32_t)
// Required percentile of total sample count expressed as a fraction
// multiplied by 1000000.
// MIN_COUNT (uint64_t)
// The minimum number of samples required to reach the target
// CUTOFF.
// NUM_COUNTS (uint64_t)
// Number of samples to get to the desrired percentile.
seems like useful info to handle autoFDO 0s more orrectly, so it would
be nice to add it to the gcov formal version 3.
(overall gcov seems poor name since it originates from Gnu COVerage,
because the usual profile feedback started as coverage tool gcov).
I can take a look at this.
Proposed solution
-----------------
1. In the string_table section of the GCOV file, each function name will have
the source file-name that it came from written after it, in sequence. The
current layout of the file is:
GCOV_TAG_AFDO_FILE_NAMES
<function 1> <function 2> ...
With this change the layout becomes:
GCOV_TAG_AFDO_FILE_NAMES
<file name 1> <file name 2> ...
<function 1> <file 1 idx> <function 2> <file 2 idx> ...
2. AUTO_PROFILE_VERSION will be increased from 2 to 3 as this is a breaking
change to the GCOV file format used by AutoFDO.
A patch is attached with this RFC for a prototype implementation. There
is an open question here: What about backwards compatibility? Should a lack of
source file-name information be handled in the code (to keep supporting version
2)?
It seems to me that there is not much benefit from supporting older
versions of GCOV formats at least at this stage when auto-FDO is not
very widespread with GCC. I think when updating GCC it is also natural
to update create_gcov. One problem is that --gcov-version needs to be
manually specifid to create_gcov invocation that is not really OK since
that would require update in all build machineries each time new GCC is
releatsed.
Ideally we should have create_gcov in-tree and shipped with GCC (like
gcov and gcov-dump), but with current tool that has many dependencies
and source tree of llvm this is not going to happen.
Perhaps we can have gcc --gcov-version which will print the version and
instruct users to do --gcov-version=`gcc --gcov-version` even tough this
sucks.
There is also the issue that PGO's GCOV version is different from AutoFDO's,
so the flag will have to be something like `--afdo-gcov-version` which seems
worse. So I see two paths here:
1. Add an option to print the GCOV version that create_gcov can use
2. Set a default version in create_gcov so that the option is not compulsory
anymore
1. is easier to implement but 2. is imo better though it will require
synchronization between the projects. But I guess that is required anyways
to make sure the file formats match between the two.
Limitations
-----------
As stated in PR120229, source files with the same name will still be broken.
It may be worth adding more of the path into the information to disambiguate
this case.
Other limitation is that paths and file names do not need to be stable
between builds and profile may get lost in this case without user
noticing or understanding what happens. Siilar problem exists with
-fprofile-use, thought it is bit more visible, since we print warning
about missing profile.
I don't see better solution that what you do tough...
Would it be a good idea to try and emit a warning if a profile is not found
for a function because the file names don't match? This can be noisy / have
false-positives though if there are multiple entries of the same function
name.
Maybe there could be a warning for entries in the GCOV profile which were never
accessed?
Honza
Bootstrapped and regtested on aarch64-linux-gnu.
Dhruv Chawla (1):
[RFC][AutoFDO] Source filename tracking in GCOV
gcc/auto-profile.cc | 101 ++++++++++++++++++++++++++++++----
gcc/testsuite/lib/profopt.exp | 2 +-
2 files changed, 91 insertions(+), 12 deletions(-)
--
2.44.0
--
Regards,
Dhruv