On Wed, Nov 6, 2024 at 3:36 PM Michal Jires <[email protected]> wrote:
>
> During Incremental LTO, contents of LTO partitions diverge because of
> external DIE references (DW_AT_abstract_origin).
>
> External references are in form 'die_symbol+offset'.
> Originally there is only single die_symbol for each compilation unit and
> its offsets are in 100'000s, which easily diverge.
>
> Die symbols have to be unique across compilation units. Originally for
> this purpose the die symbol name is computed from hash of entire file.
> To avoid this I added flag_lto_debuginfo_assume_unique_filepaths
> which computes the die_symbol only from filepath, which seems reasonable
> assumption for any project using incremental LTO.
> Compilation unit's die symbol name is then prepended to each die symbol
> for uniqueness.
>
> To remove divergence of offsets in case of C++, we have to add die
> symbols to DW_TAG_subprogram (functions), DW_TAG_variable and
> DW_TAG_namespace.
I wonder why you could not always do this for a subset of symbols,
namely those exported from the current TU and building a symbol
based on the symbols assembler name?
That is, I dislike relying on a new flag_lto_debuginfo_assume_unique_filepaths
flag.
I'd also really like to see a way to get rid of those symbols at link time :/
Or at least make them smaller? For example by hashing the assembler
name? The BFD linker has .gnu_lto_* special-casing for sections to discard,
maybe we can add a special .note section, .note.gnu.discard_syms with
a list of symbols to discard after link editing?
> Benefits:
> Before this patch Incremental LTO diverges/recompiles ~twice as much
> with '-g'. With this, additional divergence with '-g' is under 10 %.
>
> Negatives:
> When the flag is set, the added die symbols survive into final
> executable. For `cc1` executable this represents almost 10 % size
> increase of only added symbols.
> You can strip them out, but I have not found a simple way to remove them
> automatically in GCC.
> However for the purposes of Incremental LTO it should suffice. There was
> no measured compilation time increase because of streaming these
> additional symbols/strings.
I fail to see how this helps without adjusting dwarf2out_die_ref_for_decl?
Richard.
> gcc/ChangeLog:
>
> * common.opt: New flag.
> * dwarf2out.cc (compute_comp_unit_symbol):
> With flag, don't checksum contents but filepath.
> (compute_die_symbols_from_die): New.
> (compute_die_symbols): New.
> (dwarf2out_early_finish): Call compute_die_symbols.
>
> gcc/testsuite/ChangeLog:
>
> * g++.dg/lto/die_symbol_conflicts_0.C: New test.
> ---
> gcc/common.opt | 4 +
> gcc/dwarf2out.cc | 120 +++++++++++++++++-
> .../g++.dg/lto/die_symbol_conflicts_0.C | 12 ++
> 3 files changed, 132 insertions(+), 4 deletions(-)
> create mode 100644 gcc/testsuite/g++.dg/lto/die_symbol_conflicts_0.C
>
> diff --git a/gcc/common.opt b/gcc/common.opt
> index 12b25ff486d..4aa80f0df8f 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -2253,6 +2253,10 @@ flto-partition=
> Common Joined RejectNegative Enum(lto_partition_model)
> Var(flag_lto_partition) Init(LTO_PARTITION_BALANCED)
> Specify the algorithm to partition symbols and vars at linktime.
>
> +flto-debuginfo-assume-unique-filepaths
> +Common Var(flag_lto_debuginfo_assume_unique_filepaths) Init(0)
> +Assume all linked source files have unique filepaths.
> +
> ; The initial value of -1 comes from Z_DEFAULT_COMPRESSION in zlib.h.
> flto-compression-level=
> Common Joined RejectNegative UInteger Var(flag_lto_compression_level)
> Init(-1) IntegerRange(0, 19)
> diff --git a/gcc/dwarf2out.cc b/gcc/dwarf2out.cc
> index bf1ac45ed73..af272a3a824 100644
> --- a/gcc/dwarf2out.cc
> +++ b/gcc/dwarf2out.cc
> @@ -8015,9 +8015,17 @@ compute_comp_unit_symbol (dw_die_ref unit_die)
> the name filename of the unit. */
>
> md5_init_ctx (&ctx);
> - mark = 0;
> - die_checksum (unit_die, &ctx, &mark);
> - unmark_all_dies (unit_die);
> + if (flag_lto_debuginfo_assume_unique_filepaths)
> + {
> + gcc_assert (die_name);
> + md5_process_bytes (die_name, strlen (die_name), &ctx);
> + }
> + else
> + {
> + mark = 0;
> + die_checksum (unit_die, &ctx, &mark);
> + unmark_all_dies (unit_die);
> + }
> md5_finish_ctx (&ctx, checksum);
>
> /* When we this for comp_unit_die () we have a DW_AT_name that might
> @@ -33119,6 +33127,110 @@ ctf_debug_do_cu (dw_die_ref die)
> FOR_EACH_CHILD (die, c, ctf_do_die (c));
> }
>
> +/* Recursively compute die symbols from DIE's attributes.
> + Not all symbols can be computed this way. */
> +static void
> +compute_die_symbols_from_die (dw_die_ref die)
> +{
> + dw_attr_node *a;
> + int i;
> + const char* name = NULL;
> +
> + if (!die->die_attr)
> + return;
> +
> + switch (die->die_tag)
> + {
> + /* Assumed that each die parent has at most single children namespace
> + with the same name. */
> + case DW_TAG_namespace:
> + case DW_TAG_module:
> +
> + FOR_EACH_VEC_ELT (*die->die_attr, i, a)
> + {
> + if (a->dw_attr == DW_AT_name)
> + name = AT_string (a);
> + /* Ignored DW_AT_abstract_origin, leads to duplicates. */
> + }
> + break;
> +
> + default: break;
> + }
> +
> + if (name)
> + {
> + gcc_assert (!die->die_id.die_symbol);
> + gcc_assert (die->die_parent);
> +
> + const char* parent_symbol = die->die_parent->die_id.die_symbol;
> + /* Prefix with parent symbol to guarantee uniqueness. Important for
> + namespaces. Toplevel functions and variables can and do use just
> + comp_unit's symbol as prefix. Die symbols of these toplevel symbols
> + may overlap. Use the 'r' to differentiate. */
> + die->die_id.die_symbol = concat (parent_symbol, ".r.", name, NULL);
> + }
> +
> + /* Splitting functions has little to no benefit. */
> + if (die->die_tag == DW_TAG_subprogram)
> + return;
> +
> + dw_die_ref c;
> + if (!die->comdat_type_p && die->die_id.die_symbol)
> + FOR_EACH_CHILD (die, c, compute_die_symbols_from_die (c));
> +}
> +
> +/* Compute die symbols and insert them into their DIEs.
> + All die symbols must be unique across compilation units. */
> +static void
> +compute_die_symbols ()
> +{
> + dw_die_ref comp_unit = comp_unit_die ();
> + compute_comp_unit_symbol (comp_unit);
> +
> + /* For Incremental LTO we are interested in stable 'die_symbol+offset'.
> + By computing and emmiting die_symbols for more DIEs we get more local
> + and stable offsets. */
> + if (flag_generate_lto && flag_lto_debuginfo_assume_unique_filepaths)
> + {
> + /* Information about functions is not fully contained in DIEs. Must be
> + done separatelly.
> + DIE might not contain DW_AT_linkage_name and DW_AT_name may be not
> + unique. Even when contained, DW_AT_linkage_name does not have to be
> + unique. */
> + const char* base = comp_unit->die_id.die_symbol;
> + symtab_node *node;
> + FOR_EACH_SYMBOL (node)
> + {
> + /* There may be multiple functions with the same asm_name.
> + This is rare, so we don't handle them. */
> + if (node->next_sharing_asm_name || node->previous_sharing_asm_name)
> + continue;
> +
> + const char* asm_name = node->asm_name ();
> + /* Mimic what assemble_name_raw does with a leading '*'. */
> + if (asm_name[0] == '*')
> + asm_name = &asm_name[1];
> +
> + dw_die_ref decl_die = lookup_decl_die (node->decl);
> + if (decl_die)
> + decl_die->die_id.die_symbol = concat (base, ".s.", asm_name,
> NULL);
> +
> + tree origin = DECL_ABSTRACT_ORIGIN (node->decl);
> + dw_die_ref origin_die = origin ? lookup_decl_die (origin) : NULL;
> + if (origin_die)
> + {
> + asm_name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (origin));
> + origin_die->die_id.die_symbol = concat (base, ".s.", asm_name,
> + NULL);
> + }
> + }
> +
> + /* Other needed die symbols can be computed from DIE informatation
> + alone. */
> + compute_die_symbols_from_die (comp_unit);
> + }
> +}
> +
> /* Perform any cleanups needed after the early debug generation pass
> has run. */
>
> @@ -33338,7 +33450,7 @@ dwarf2out_early_finish (const char *filename)
> }
>
> /* Stick a unique symbol to the main debuginfo section. */
> - compute_comp_unit_symbol (comp_unit_die ());
> + compute_die_symbols ();
>
> /* Output the main compilation unit. We always need it if only for
> the CU symbol. */
> diff --git a/gcc/testsuite/g++.dg/lto/die_symbol_conflicts_0.C
> b/gcc/testsuite/g++.dg/lto/die_symbol_conflicts_0.C
> new file mode 100644
> index 00000000000..4979b4350f4
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/lto/die_symbol_conflicts_0.C
> @@ -0,0 +1,12 @@
> +/* { dg-lto-do assemble } */
> +/* { dg-lto-options { { -O2 -g -flto -flto-debuginfo-assume-unique-filepaths
> } } } */
> +
> +/* DIE symbols computed from names might conflict. */
> +
> +namespace foo {
> + void foo () asm ("foo");
> +}
> +
> +int main () {
> + foo::foo ();
> +}
> --
> 2.47.0
>