Hi Honza, > On 8 Jul 2025, at 2:26 am, Jan Hubicka <hubi...@ucw.cz> wrote: > > External email: Use caution opening links or attachments > > > Hi, > as discussed also on the autofdo pull request, LLVM solves the same > problem using -funique-internal-linkage-names > https://reviews.llvm.org/D73307 > > All non-public functions gets theis symbol renamed from > <function_name>.__uniq.<md5 sum of source file name in decadic>
How is __uniq.<md5 sum of source file name in decadic> added to static symbols in the profile? Thanks, Kugan > Decadic is used since demanglers special case numerical suffixes. > In addition debug info of all functions get DW_AT_linkage_name > with this linkage name as attribute. > > This makes it possible to uniquely match symbol names in the > gcov profile with symbol table at -fauto-profile readback time. > > Implementing the gory details I see that this solution is bit > of a hack. I would say > > Pros > - it is not always good idea to reinvent wheel (but differently) > - it is transparent to current implemention of create_gcov > - by being mostly compatible with LLVM we simplify tooling > of projects that uses both GCC and LLVM as compilers. > Moreover we keep possibility of both LLVM and GCC consuming > same profile data eventually which can be good for projects > that ship them in release bacpages > Cons > - it does not work with toplevel assembler (since it renames the > actual symbol). This may be solved at gas side by syntactic aliases > if supported. > - it makes symbol names longer making all other tools, not only > auto-fdo to process them in symbol tables and debug info > > Conceptually it is working around limitation of create_gcov dwarf > consumer for sure (it is also mentioned in the review at LLVM side > linked above). > > Learning about details, I think the orignal patchset to extend gcov > format by file names has few issues, too. In partiuclar > - it is important to add file names only to function symbols that are > not public, so having both public and private symbols of the same > name as necessary > - As discussed in autofdo pull request > https://github.com/google/autofdo/pull/244 > we can just combine file name and symbol name into existing name > entry. Somehting like filename/symbol name > - We can not use filename of source contianing the function but instead > name of the translation unit. > It is common for inline headers to contain static functions that > would get mixed up as well. > But those are fixable. > > gcc/ChangeLog: > > * auto-profile.cc (string_table::get_index_by_decl): > If necessary add unique suffix. > (function_instance::get_cgraph_node): If necessary strip > unique suffix. > (match_with_target): If necessary add unique suffix. > (autofdo_source_profile::offline_external_functions): > Set uses_unique_names; if unique names are on be more > strict on mathing > * common.opt: (funique-internal-linkage-names): Add > * dwarf2out.cc: Include ipa-utils.h > (add_linkage_attr): Add unique suffix if needed. > (add_linkage_name): Output linkage names of non-public > functions. > * ipa-utils.h (decl_unique_linkage_name_suffix): Declare. > (decl_unique_linkage_name): Declare. > (unique_linkage_name_suffix_start): Declare. > * ipa-visibility.cc: Include md5.h. > (decl_unique_linkage_name_suffix): New function. > (decl_unique_linkage_name): New function. > (unique_linkage_name_suffix_start): New function. > (function_and_variable_visibility): Compute suffixes > if necessary. > > diff --git a/gcc/auto-profile.cc b/gcc/auto-profile.cc > index 1700bf8f2cd..f1ef18b8a65 100644 > --- a/gcc/auto-profile.cc > +++ b/gcc/auto-profile.cc > @@ -53,6 +53,7 @@ along with GCC; see the file COPYING3. If not see > #include "auto-profile.h" > #include "tree-pretty-print.h" > #include "gimple-pretty-print.h" > +#include "ipa-utils.h" > > /* The following routines implements AutoFDO optimization. > > @@ -508,6 +509,7 @@ public: > create () > { > autofdo_source_profile *map = new autofdo_source_profile (); > + map->uses_unique_names_ = false; > > if (map->read ()) > return map; > @@ -551,6 +553,18 @@ public: > void offline_external_functions (); > > void offline_unrealized_inlines (); > + > + bool > + uses_unique_names () > + { > + return uses_unique_names_; > + } > + > + void > + set_uses_unique_names () > + { > + uses_unique_names_ = 1; > + } > private: > /* Map from function_instance name index (in string_table) to > function_instance. */ > @@ -569,6 +583,8 @@ private: > name_function_instance_map map_; > > auto_vec <function_instance *> duplicate_functions_; > + > + bool uses_unique_names_; > }; > > /* Store the strings read from the profile data file. */ > @@ -818,6 +834,18 @@ int > string_table::get_index_by_decl (tree decl) const > { > const char *name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl)); > + if (afdo_source_profile->uses_unique_names () > + && (!flag_unique_internal_linkage_names > + || !symtab->function_flags_ready) > + && !TREE_PUBLIC (decl)) > + { > + /* TODO: Building unique names for each lookup is not necessary. > + Build table translating decls and indexes just once. */ > + char *unique_name = decl_unique_linkage_name (decl); > + int ret = get_index (unique_name); > + free (unique_name); > + return ret; > + } > int ret = get_index (name); > if (ret != -1) > return ret; > @@ -880,13 +908,41 @@ function_instance::~function_instance () > cgraph_node * > function_instance::get_cgraph_node () > { > - for (symtab_node *n = cgraph_node::get_for_asmname > - (get_identifier > - (afdo_string_table->get_name (name ()))); > - n; n = n->next_sharing_asm_name) > + const char *sname = afdo_string_table->get_name (name ()); > + char *nonunique_name = NULL; > + const char *suffix; > + > + if ((!symtab->function_flags_ready > + || !flag_unique_internal_linkage_names) > + && afdo_source_profile->uses_unique_names () > + && (suffix = unique_linkage_name_suffix_start (sname))) > + { > + /* Is it local function in this unit? */ > + if (strncmp (suffix, decl_unique_linkage_name_suffix (), > + decl_unique_linkage_name_suffix_len)) > + return NULL; > + nonunique_name = XNEWVEC (char, suffix - sname + 1); > + memcpy (nonunique_name, sname, suffix - sname); > + nonunique_name[suffix - sname] = 0; > + sname = nonunique_name; > + } > + symtab_node *n = cgraph_node::get_for_asmname (get_identifier (sname)); > + if (nonunique_name) > + free (nonunique_name); > + for (;n; n = n->next_sharing_asm_name) > if (cgraph_node *cn = dyn_cast <cgraph_node *> (n)) > if (cn->definition && cn->has_gimple_body_p ()) > - return cn; > + { > + /* Do not confuse private symbol clasing with public > + symbols. */ > + if (afdo_source_profile->uses_unique_names () > + && (!symtab->function_flags_ready > + || !flag_unique_internal_linkage_names) > + && !nonunique_name > + && !TREE_PUBLIC (cn->decl)) > + return NULL; > + return cn; > + } > return NULL; > } > > @@ -1129,9 +1185,15 @@ function_instance::offline_if_in_set (name_index_set > &seen, > static int > match_with_target (gimple *stmt, function_instance *inlined_fn, cgraph_node > *n) > { > + char *name_to_free = NULL; > const char *symbol_name = IDENTIFIER_POINTER > (DECL_ASSEMBLER_NAME (n->decl)); > const char *name = afdo_string_table->get_name (inlined_fn->name ()); > + if (!TREE_PUBLIC (n->decl) > + && (!flag_unique_internal_linkage_names > + || !symtab->function_flags_ready) > + && afdo_source_profile->uses_unique_names ()) > + symbol_name = name_to_free = decl_unique_linkage_name (n->decl); > if (strcmp (name, symbol_name)) > { > int i; > @@ -1144,10 +1206,11 @@ match_with_target (gimple *stmt, function_instance > *inlined_fn, cgraph_node *n) > in_suffix = true; > } > /* Accept dwarf names and stripped suffixes. */ > - if (!strcmp (lang_hooks.dwarf_name (n->decl, 0), > - afdo_string_table->get_name (inlined_fn->name ())) > - || (!name[i] && symbol_name[i] == '.') > - || in_suffix) > + if (afdo_source_profile->uses_unique_names () > + && (!strcmp (lang_hooks.dwarf_name (n->decl, 0), > + afdo_string_table->get_name (inlined_fn->name ())) > + || (!name[i] && symbol_name[i] == '.') > + || in_suffix)) > { > int index = afdo_string_table->get_index (symbol_name); > if (index == -1) > @@ -1163,8 +1226,12 @@ match_with_target (gimple *stmt, function_instance > *inlined_fn, cgraph_node *n) > "auto-profile contains inlined " > "function with symbol name %s instead of symbol name %s", > name, symbol_name); > + if (name_to_free) > + free (name_to_free); > return 0; > } > + if (name_to_free) > + free (name_to_free); > return 1; > } > > @@ -1892,6 +1986,9 @@ autofdo_source_profile::offline_external_functions () > { > const char *n1 = afdo_string_table->get_name (i); > char *n2 = get_original_name (n1); > + if (!afdo_source_profile->uses_unique_names () > + && unique_linkage_name_suffix_start (n1)) > + afdo_source_profile->set_uses_unique_names (); > if (!strcmp (n1, n2)) > { > free (n2); > @@ -1924,13 +2021,28 @@ autofdo_source_profile::offline_external_functions () > { > const char *name > = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (node->decl)); > - const char *dwarf_name = lang_hooks.dwarf_name (node->decl, 0); > - int index = afdo_string_table->get_index (name); > + const char *dwarf_name > + = afdo_source_profile->uses_unique_names () > + ? NULL : lang_hooks.dwarf_name (node->decl, 0); > + char *unique_name = NULL; > + int index; > + > + if (!afdo_source_profile->uses_unique_names () > + || TREE_PUBLIC (node->decl) > + || (flag_unique_internal_linkage_names > + && symtab->function_flags_ready)) > + index = afdo_string_table->get_index (name); > + else > + { > + name = unique_name = decl_unique_linkage_name (node->decl); > + index = afdo_string_table->get_index (unique_name); > + } > > /* Inline function may be identified by its dwarf names; > rename them to symbol names. With LTO dwarf names are > lost in free_lange_data. */ > - if (strcmp (name, dwarf_name)) > + if (!afdo_source_profile->uses_unique_names () > + && strcmp (name, dwarf_name)) > { > int index2 = afdo_string_table->get_index (dwarf_name); > if (index2 != -1) > @@ -1960,13 +2072,21 @@ autofdo_source_profile::offline_external_functions () > { > if (dump_file) > { > - fprintf (dump_file, > - "Node %s not in auto profile (%s neither %s)\n", > - node->dump_name (), > - name, > - dwarf_name); > + if (dwarf_name) > + fprintf (dump_file, > + "Node %s not in auto profile (%s neither %s)\n", > + node->dump_name (), > + name, > + dwarf_name); > + else > + fprintf (dump_file, > + "Node %s (symbol %s) not in auto profile\n", > + node->dump_name (), > + name); > } > } > + if (unique_name) > + free (unique_name); > } > > for (auto iter : to_symbol_name) > diff --git a/gcc/common.opt b/gcc/common.opt > index 3d65656e658..6bda709f708 100644 > --- a/gcc/common.opt > +++ b/gcc/common.opt > @@ -3329,6 +3329,10 @@ Common Var(flag_unconstrained_commons) Optimization > Assume common declarations may be overridden with ones with a larger > trailing array. > > +funique-internal-linkage-names > +Common Var(flag_unique_internal_linkage_names) > +Output unique names to debug info > + > funit-at-a-time > Common Var(flag_unit_at_a_time) Init(1) > Compile whole compilation unit at a time. > diff --git a/gcc/dwarf2out.cc b/gcc/dwarf2out.cc > index d1a55dbcbcb..004829be666 100644 > --- a/gcc/dwarf2out.cc > +++ b/gcc/dwarf2out.cc > @@ -97,6 +97,7 @@ along with GCC; see the file COPYING3. If not see > #include "rtl-iter.h" > #include "stringpool.h" > #include "attribs.h" > +#include "ipa-utils.h" > #include "file-prefix-map.h" /* remap_debug_filename() */ > > static void dwarf2out_source_line (unsigned int, unsigned int, const char *, > @@ -22165,6 +22166,16 @@ static void > add_linkage_attr (dw_die_ref die, tree decl) > { > const char *name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl)); > + char *name_to_free = NULL; > + > + /* DWARF is produced early attribute may be added before > + ipa_visibility pass which adds linkage suffixes. */ > + if (flag_unique_internal_linkage_names > + && TREE_CODE (decl) == FUNCTION_DECL > + && !DECL_PRESERVE_P (decl) > + && !TREE_PUBLIC (decl) > + && !unique_linkage_name_suffix_start (name)) > + name = name_to_free = decl_unique_linkage_name (decl); > > /* Mimic what assemble_name_raw does with a leading '*'. */ > if (name[0] == '*') > @@ -22174,6 +22185,8 @@ add_linkage_attr (dw_die_ref die, tree decl) > add_AT_string (die, DW_AT_linkage_name, name); > else > add_AT_string (die, DW_AT_MIPS_linkage_name, name); > + if (name_to_free) > + free (name_to_free); > } > > /* Add source coordinate attributes for the given decl. */ > @@ -22219,7 +22232,12 @@ add_linkage_name (dw_die_ref die, tree decl) > { > if (debug_info_level > DINFO_LEVEL_NONE > && VAR_OR_FUNCTION_DECL_P (decl) > - && TREE_PUBLIC (decl) > + && (TREE_PUBLIC (decl) > + /* Add linkage name for auto-profile. create_gcov then > + will use these names that we can uniquely identify with > + symbols. */ > + || (flag_unique_internal_linkage_names > + && TREE_CODE (decl) == FUNCTION_DECL)) > && !(VAR_P (decl) && DECL_REGISTER (decl)) > && die->die_tag != DW_TAG_member) > add_linkage_name_raw (die, decl); > diff --git a/gcc/ipa-utils.h b/gcc/ipa-utils.h > index f33d21e64ce..0c0e98efbfc 100644 > --- a/gcc/ipa-utils.h > +++ b/gcc/ipa-utils.h > @@ -291,4 +291,12 @@ lto_streaming_expected_p () > return (flag_wpa || flag_incremental_link == INCREMENTAL_LINK_LTO); > } > > +/* Length of unique linkage suffixes. > + ".__uniq." is 8 characters and md5 hash as decimal number is 39. */ > + > +constexpr size_t decl_unique_linkage_name_suffix_len = 8 + 39; > +extern const char *decl_unique_linkage_name_suffix (); > +extern char *decl_unique_linkage_name (tree decl); > +extern const char *unique_linkage_name_suffix_start (const char *); > + > #endif /* GCC_IPA_UTILS_H */ > diff --git a/gcc/ipa-visibility.cc b/gcc/ipa-visibility.cc > index 8097a03e240..58389afa837 100644 > --- a/gcc/ipa-visibility.cc > +++ b/gcc/ipa-visibility.cc > @@ -74,6 +74,7 @@ along with GCC; see the file COPYING3. If not see > > #include "config.h" > #include "system.h" > +#include "md5.h" > #include "coretypes.h" > #include "tm.h" > #include "function.h" > @@ -336,6 +337,80 @@ varpool_node::externally_visible_p (void) > return false; > } > > +/* Compute unique linkage suffix of DECL to STR. */ > + > +const char * > +decl_unique_linkage_name_suffix () > +{ > + static char str[decl_unique_linkage_name_suffix_len + 1]; > + if (str[0]) > + return str; > + char *to_hash = xstrdup (aux_base_name); > + if (endswith (to_hash, ".gk")) > + to_hash[strlen (to_hash) - 3] = '\0'; > + unsigned char checksum[16]; > + struct md5_ctx ctx; > + md5_init_ctx (&ctx); > + md5_process_bytes (to_hash, strlen (to_hash), &ctx); > + md5_finish_ctx (&ctx, checksum); > + FIXED_WIDE_INT (128) tmp = 0; > + for (int i = 15; i >= 0; i--) > + tmp = (tmp << 8) + checksum[i]; > + memcpy (str, ".__uniq.", 8); > + /* Use decimal encoding of md5 sum to so demanglers treat it is > + a clone index. */ > + for (int i = 38; i >= 0; i--) > + { > + FIXED_WIDE_INT (128) digit, ten = HOST_WIDE_INT_UC (10); > + tmp = wi::divmod_trunc (tmp, ten, UNSIGNED, &digit); > + str[i + 8] = '0' + (digit.to_uhwi ()); > + } > + str[39 + 8] = 0; > + free (to_hash); > + return str; > +} > + > +/* Compute unique name of decl in the form of > + assembler_name.uniq_<md5sum> > + Caller is responsible for releasing the string. */ > + > +char * > +decl_unique_linkage_name (tree decl) > +{ > + gcc_checking_assert (!TREE_PUBLIC (decl)); > + const char *name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl)); > + int len = IDENTIFIER_LENGTH (DECL_ASSEMBLER_NAME (decl)); > + char *unique_name = XNEWVEC (char, len + > decl_unique_linkage_name_suffix_len + 1); > + memcpy (unique_name, name, len); > + memcpy (unique_name + len, decl_unique_linkage_name_suffix (), > + decl_unique_linkage_name_suffix_len + 1); > + return unique_name; > +} > + > +/* If NAME has unique linkage suffix return its start > + otherwise return NULL. */ > + > +const char * > +unique_linkage_name_suffix_start (const char *name) > +{ > + const char *c = strstr (name, ".__uniq."); > + if (!c) > + return 0; > + for (int i = 0; i < 39; i++) > + if (c[i+8] < '0' || c[i+8] > '9') > + { > + fprintf (stderr, "bad %s\n", name); > + return NULL; > + } > + if (c[decl_unique_linkage_name_suffix_len] > + && c[decl_unique_linkage_name_suffix_len] != '.') > + { > + fprintf (stderr, "bad2 %s %c\n", name, > c[decl_unique_linkage_name_suffix_len]); > + return NULL; > + } > + return c; > +} > + > /* Return true if reference to NODE can be replaced by a local alias. > Local aliases save dynamic linking overhead and enable more optimizations. > */ > @@ -752,6 +827,19 @@ function_and_variable_visibility (bool whole_program) > if (DECL_EXTERNAL (decl_node->decl)) > DECL_EXTERNAL (node->decl) = 1; > } > + if (flag_unique_internal_linkage_names > + && !flag_wpa > + && !in_lto_p > + && TREE_CODE (node->decl) == FUNCTION_DECL > + && !DECL_PRESERVE_P (node->decl) > + && !TREE_PUBLIC (node->decl) > + && !unique_linkage_name_suffix_start > + (IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (node->decl)))) > + { > + char *unique_name = decl_unique_linkage_name (node->decl); > + symtab->change_decl_assembler_name (node->decl, get_identifier > (unique_name)); > + free (unique_name); > + } > > update_visibility_by_resolution_info (node); > if (node->weakref)