On Wed, Nov 6, 2024 at 3:36 PM Michal Jires <mji...@suse.cz> wrote: > > During Incremental LTO, contents of LTO partitions diverge because of > external DIE references (DW_AT_abstract_origin). > > External references are in form 'die_symbol+offset'. > Originally there is only single die_symbol for each compilation unit and > its offsets are in 100'000s, which easily diverge. > > Die symbols have to be unique across compilation units. Originally for > this purpose the die symbol name is computed from hash of entire file. > To avoid this I added flag_lto_debuginfo_assume_unique_filepaths > which computes the die_symbol only from filepath, which seems reasonable > assumption for any project using incremental LTO. > Compilation unit's die symbol name is then prepended to each die symbol > for uniqueness. > > To remove divergence of offsets in case of C++, we have to add die > symbols to DW_TAG_subprogram (functions), DW_TAG_variable and > DW_TAG_namespace.
I wonder why you could not always do this for a subset of symbols, namely those exported from the current TU and building a symbol based on the symbols assembler name? That is, I dislike relying on a new flag_lto_debuginfo_assume_unique_filepaths flag. I'd also really like to see a way to get rid of those symbols at link time :/ Or at least make them smaller? For example by hashing the assembler name? The BFD linker has .gnu_lto_* special-casing for sections to discard, maybe we can add a special .note section, .note.gnu.discard_syms with a list of symbols to discard after link editing? > Benefits: > Before this patch Incremental LTO diverges/recompiles ~twice as much > with '-g'. With this, additional divergence with '-g' is under 10 %. > > Negatives: > When the flag is set, the added die symbols survive into final > executable. For `cc1` executable this represents almost 10 % size > increase of only added symbols. > You can strip them out, but I have not found a simple way to remove them > automatically in GCC. > However for the purposes of Incremental LTO it should suffice. There was > no measured compilation time increase because of streaming these > additional symbols/strings. I fail to see how this helps without adjusting dwarf2out_die_ref_for_decl? Richard. > gcc/ChangeLog: > > * common.opt: New flag. > * dwarf2out.cc (compute_comp_unit_symbol): > With flag, don't checksum contents but filepath. > (compute_die_symbols_from_die): New. > (compute_die_symbols): New. > (dwarf2out_early_finish): Call compute_die_symbols. > > gcc/testsuite/ChangeLog: > > * g++.dg/lto/die_symbol_conflicts_0.C: New test. > --- > gcc/common.opt | 4 + > gcc/dwarf2out.cc | 120 +++++++++++++++++- > .../g++.dg/lto/die_symbol_conflicts_0.C | 12 ++ > 3 files changed, 132 insertions(+), 4 deletions(-) > create mode 100644 gcc/testsuite/g++.dg/lto/die_symbol_conflicts_0.C > > diff --git a/gcc/common.opt b/gcc/common.opt > index 12b25ff486d..4aa80f0df8f 100644 > --- a/gcc/common.opt > +++ b/gcc/common.opt > @@ -2253,6 +2253,10 @@ flto-partition= > Common Joined RejectNegative Enum(lto_partition_model) > Var(flag_lto_partition) Init(LTO_PARTITION_BALANCED) > Specify the algorithm to partition symbols and vars at linktime. > > +flto-debuginfo-assume-unique-filepaths > +Common Var(flag_lto_debuginfo_assume_unique_filepaths) Init(0) > +Assume all linked source files have unique filepaths. > + > ; The initial value of -1 comes from Z_DEFAULT_COMPRESSION in zlib.h. > flto-compression-level= > Common Joined RejectNegative UInteger Var(flag_lto_compression_level) > Init(-1) IntegerRange(0, 19) > diff --git a/gcc/dwarf2out.cc b/gcc/dwarf2out.cc > index bf1ac45ed73..af272a3a824 100644 > --- a/gcc/dwarf2out.cc > +++ b/gcc/dwarf2out.cc > @@ -8015,9 +8015,17 @@ compute_comp_unit_symbol (dw_die_ref unit_die) > the name filename of the unit. */ > > md5_init_ctx (&ctx); > - mark = 0; > - die_checksum (unit_die, &ctx, &mark); > - unmark_all_dies (unit_die); > + if (flag_lto_debuginfo_assume_unique_filepaths) > + { > + gcc_assert (die_name); > + md5_process_bytes (die_name, strlen (die_name), &ctx); > + } > + else > + { > + mark = 0; > + die_checksum (unit_die, &ctx, &mark); > + unmark_all_dies (unit_die); > + } > md5_finish_ctx (&ctx, checksum); > > /* When we this for comp_unit_die () we have a DW_AT_name that might > @@ -33119,6 +33127,110 @@ ctf_debug_do_cu (dw_die_ref die) > FOR_EACH_CHILD (die, c, ctf_do_die (c)); > } > > +/* Recursively compute die symbols from DIE's attributes. > + Not all symbols can be computed this way. */ > +static void > +compute_die_symbols_from_die (dw_die_ref die) > +{ > + dw_attr_node *a; > + int i; > + const char* name = NULL; > + > + if (!die->die_attr) > + return; > + > + switch (die->die_tag) > + { > + /* Assumed that each die parent has at most single children namespace > + with the same name. */ > + case DW_TAG_namespace: > + case DW_TAG_module: > + > + FOR_EACH_VEC_ELT (*die->die_attr, i, a) > + { > + if (a->dw_attr == DW_AT_name) > + name = AT_string (a); > + /* Ignored DW_AT_abstract_origin, leads to duplicates. */ > + } > + break; > + > + default: break; > + } > + > + if (name) > + { > + gcc_assert (!die->die_id.die_symbol); > + gcc_assert (die->die_parent); > + > + const char* parent_symbol = die->die_parent->die_id.die_symbol; > + /* Prefix with parent symbol to guarantee uniqueness. Important for > + namespaces. Toplevel functions and variables can and do use just > + comp_unit's symbol as prefix. Die symbols of these toplevel symbols > + may overlap. Use the 'r' to differentiate. */ > + die->die_id.die_symbol = concat (parent_symbol, ".r.", name, NULL); > + } > + > + /* Splitting functions has little to no benefit. */ > + if (die->die_tag == DW_TAG_subprogram) > + return; > + > + dw_die_ref c; > + if (!die->comdat_type_p && die->die_id.die_symbol) > + FOR_EACH_CHILD (die, c, compute_die_symbols_from_die (c)); > +} > + > +/* Compute die symbols and insert them into their DIEs. > + All die symbols must be unique across compilation units. */ > +static void > +compute_die_symbols () > +{ > + dw_die_ref comp_unit = comp_unit_die (); > + compute_comp_unit_symbol (comp_unit); > + > + /* For Incremental LTO we are interested in stable 'die_symbol+offset'. > + By computing and emmiting die_symbols for more DIEs we get more local > + and stable offsets. */ > + if (flag_generate_lto && flag_lto_debuginfo_assume_unique_filepaths) > + { > + /* Information about functions is not fully contained in DIEs. Must be > + done separatelly. > + DIE might not contain DW_AT_linkage_name and DW_AT_name may be not > + unique. Even when contained, DW_AT_linkage_name does not have to be > + unique. */ > + const char* base = comp_unit->die_id.die_symbol; > + symtab_node *node; > + FOR_EACH_SYMBOL (node) > + { > + /* There may be multiple functions with the same asm_name. > + This is rare, so we don't handle them. */ > + if (node->next_sharing_asm_name || node->previous_sharing_asm_name) > + continue; > + > + const char* asm_name = node->asm_name (); > + /* Mimic what assemble_name_raw does with a leading '*'. */ > + if (asm_name[0] == '*') > + asm_name = &asm_name[1]; > + > + dw_die_ref decl_die = lookup_decl_die (node->decl); > + if (decl_die) > + decl_die->die_id.die_symbol = concat (base, ".s.", asm_name, > NULL); > + > + tree origin = DECL_ABSTRACT_ORIGIN (node->decl); > + dw_die_ref origin_die = origin ? lookup_decl_die (origin) : NULL; > + if (origin_die) > + { > + asm_name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (origin)); > + origin_die->die_id.die_symbol = concat (base, ".s.", asm_name, > + NULL); > + } > + } > + > + /* Other needed die symbols can be computed from DIE informatation > + alone. */ > + compute_die_symbols_from_die (comp_unit); > + } > +} > + > /* Perform any cleanups needed after the early debug generation pass > has run. */ > > @@ -33338,7 +33450,7 @@ dwarf2out_early_finish (const char *filename) > } > > /* Stick a unique symbol to the main debuginfo section. */ > - compute_comp_unit_symbol (comp_unit_die ()); > + compute_die_symbols (); > > /* Output the main compilation unit. We always need it if only for > the CU symbol. */ > diff --git a/gcc/testsuite/g++.dg/lto/die_symbol_conflicts_0.C > b/gcc/testsuite/g++.dg/lto/die_symbol_conflicts_0.C > new file mode 100644 > index 00000000000..4979b4350f4 > --- /dev/null > +++ b/gcc/testsuite/g++.dg/lto/die_symbol_conflicts_0.C > @@ -0,0 +1,12 @@ > +/* { dg-lto-do assemble } */ > +/* { dg-lto-options { { -O2 -g -flto -flto-debuginfo-assume-unique-filepaths > } } } */ > + > +/* DIE symbols computed from names might conflict. */ > + > +namespace foo { > + void foo () asm ("foo"); > +} > + > +int main () { > + foo::foo (); > +} > -- > 2.47.0 >