On Wed, Nov 6, 2024 at 3:36 PM Michal Jires <mji...@suse.cz> wrote:
>
> During Incremental LTO, contents of LTO partitions diverge because of
> external DIE references (DW_AT_abstract_origin).
>
> External references are in form 'die_symbol+offset'.
> Originally there is only single die_symbol for each compilation unit and
> its offsets are in 100'000s, which easily diverge.
>
> Die symbols have to be unique across compilation units. Originally for
> this purpose the die symbol name is computed from hash of entire file.
> To avoid this I added flag_lto_debuginfo_assume_unique_filepaths
> which computes the die_symbol only from filepath, which seems reasonable
> assumption for any project using incremental LTO.
> Compilation unit's die symbol name is then prepended to each die symbol
> for uniqueness.
>
> To remove divergence of offsets in case of C++, we have to add die
> symbols to DW_TAG_subprogram (functions), DW_TAG_variable and
> DW_TAG_namespace.

I wonder why you could not always do this for a subset of symbols,
namely those exported from the current TU and building a symbol
based on the symbols assembler name?

That is, I dislike relying on a new flag_lto_debuginfo_assume_unique_filepaths
flag.

I'd also really like to see a way to get rid of those symbols at link time :/
Or at least make them smaller?  For example by hashing the assembler
name?  The BFD linker has .gnu_lto_* special-casing for sections to discard,
maybe we can add a special .note section, .note.gnu.discard_syms with
a list of symbols to discard after link editing?

> Benefits:
> Before this patch Incremental LTO diverges/recompiles ~twice as much
> with '-g'. With this, additional divergence with '-g' is under 10 %.
>
> Negatives:
> When the flag is set, the added die symbols survive into final
> executable. For `cc1` executable this represents almost 10 % size
> increase of only added symbols.
> You can strip them out, but I have not found a simple way to remove them
> automatically in GCC.
> However for the purposes of Incremental LTO it should suffice. There was
> no measured compilation time increase because of streaming these
> additional symbols/strings.

I fail to see how this helps without adjusting dwarf2out_die_ref_for_decl?

Richard.

> gcc/ChangeLog:
>
>         * common.opt: New flag.
>         * dwarf2out.cc (compute_comp_unit_symbol):
>           With flag, don't checksum contents but filepath.
>         (compute_die_symbols_from_die): New.
>         (compute_die_symbols): New.
>         (dwarf2out_early_finish): Call compute_die_symbols.
>
> gcc/testsuite/ChangeLog:
>
>         * g++.dg/lto/die_symbol_conflicts_0.C: New test.
> ---
>  gcc/common.opt                                |   4 +
>  gcc/dwarf2out.cc                              | 120 +++++++++++++++++-
>  .../g++.dg/lto/die_symbol_conflicts_0.C       |  12 ++
>  3 files changed, 132 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/lto/die_symbol_conflicts_0.C
>
> diff --git a/gcc/common.opt b/gcc/common.opt
> index 12b25ff486d..4aa80f0df8f 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -2253,6 +2253,10 @@ flto-partition=
>  Common Joined RejectNegative Enum(lto_partition_model) 
> Var(flag_lto_partition) Init(LTO_PARTITION_BALANCED)
>  Specify the algorithm to partition symbols and vars at linktime.
>
> +flto-debuginfo-assume-unique-filepaths
> +Common Var(flag_lto_debuginfo_assume_unique_filepaths) Init(0)
> +Assume all linked source files have unique filepaths.
> +
>  ; The initial value of -1 comes from Z_DEFAULT_COMPRESSION in zlib.h.
>  flto-compression-level=
>  Common Joined RejectNegative UInteger Var(flag_lto_compression_level) 
> Init(-1) IntegerRange(0, 19)
> diff --git a/gcc/dwarf2out.cc b/gcc/dwarf2out.cc
> index bf1ac45ed73..af272a3a824 100644
> --- a/gcc/dwarf2out.cc
> +++ b/gcc/dwarf2out.cc
> @@ -8015,9 +8015,17 @@ compute_comp_unit_symbol (dw_die_ref unit_die)
>       the name filename of the unit.  */
>
>    md5_init_ctx (&ctx);
> -  mark = 0;
> -  die_checksum (unit_die, &ctx, &mark);
> -  unmark_all_dies (unit_die);
> +  if (flag_lto_debuginfo_assume_unique_filepaths)
> +    {
> +      gcc_assert (die_name);
> +      md5_process_bytes (die_name, strlen (die_name), &ctx);
> +    }
> +  else
> +    {
> +      mark = 0;
> +      die_checksum (unit_die, &ctx, &mark);
> +      unmark_all_dies (unit_die);
> +    }
>    md5_finish_ctx (&ctx, checksum);
>
>    /* When we this for comp_unit_die () we have a DW_AT_name that might
> @@ -33119,6 +33127,110 @@ ctf_debug_do_cu (dw_die_ref die)
>    FOR_EACH_CHILD (die, c, ctf_do_die (c));
>  }
>
> +/* Recursively compute die symbols from DIE's attributes.
> +   Not all symbols can be computed this way.  */
> +static void
> +compute_die_symbols_from_die (dw_die_ref die)
> +{
> +  dw_attr_node *a;
> +  int i;
> +  const char* name = NULL;
> +
> +  if (!die->die_attr)
> +    return;
> +
> +  switch (die->die_tag)
> +    {
> +      /* Assumed that each die parent has at most single children namespace
> +        with the same name.  */
> +      case DW_TAG_namespace:
> +      case DW_TAG_module:
> +
> +       FOR_EACH_VEC_ELT (*die->die_attr, i, a)
> +         {
> +           if (a->dw_attr == DW_AT_name)
> +             name = AT_string (a);
> +           /* Ignored DW_AT_abstract_origin, leads to duplicates.  */
> +         }
> +       break;
> +
> +      default: break;
> +    }
> +
> +  if (name)
> +    {
> +      gcc_assert (!die->die_id.die_symbol);
> +      gcc_assert (die->die_parent);
> +
> +      const char* parent_symbol = die->die_parent->die_id.die_symbol;
> +      /* Prefix with parent symbol to guarantee uniqueness.  Important for
> +        namespaces.  Toplevel functions and variables can and do use just
> +        comp_unit's symbol as prefix.  Die symbols of these toplevel symbols
> +        may overlap.  Use the 'r' to differentiate.  */
> +      die->die_id.die_symbol = concat (parent_symbol, ".r.", name, NULL);
> +    }
> +
> +  /* Splitting functions has little to no benefit.  */
> +  if (die->die_tag == DW_TAG_subprogram)
> +    return;
> +
> +  dw_die_ref c;
> +  if (!die->comdat_type_p && die->die_id.die_symbol)
> +    FOR_EACH_CHILD (die, c, compute_die_symbols_from_die (c));
> +}
> +
> +/* Compute die symbols and insert them into their DIEs.
> +   All die symbols must be unique across compilation units.  */
> +static void
> +compute_die_symbols ()
> +{
> +  dw_die_ref comp_unit = comp_unit_die ();
> +  compute_comp_unit_symbol (comp_unit);
> +
> +  /* For Incremental LTO we are interested in stable 'die_symbol+offset'.
> +     By computing and emmiting die_symbols for more DIEs we get more local
> +     and stable offsets.  */
> +  if (flag_generate_lto && flag_lto_debuginfo_assume_unique_filepaths)
> +    {
> +      /* Information about functions is not fully contained in DIEs.  Must be
> +        done separatelly.
> +        DIE might not contain DW_AT_linkage_name and DW_AT_name may be not
> +        unique.  Even when contained, DW_AT_linkage_name does not have to be
> +        unique.  */
> +      const char* base = comp_unit->die_id.die_symbol;
> +      symtab_node *node;
> +      FOR_EACH_SYMBOL (node)
> +       {
> +         /* There may be multiple functions with the same asm_name.
> +            This is rare, so we don't handle them.  */
> +         if (node->next_sharing_asm_name || node->previous_sharing_asm_name)
> +             continue;
> +
> +         const char* asm_name = node->asm_name ();
> +         /* Mimic what assemble_name_raw does with a leading '*'.  */
> +         if (asm_name[0] == '*')
> +           asm_name = &asm_name[1];
> +
> +         dw_die_ref decl_die = lookup_decl_die (node->decl);
> +         if (decl_die)
> +           decl_die->die_id.die_symbol = concat (base, ".s.", asm_name, 
> NULL);
> +
> +         tree origin = DECL_ABSTRACT_ORIGIN (node->decl);
> +         dw_die_ref origin_die = origin ? lookup_decl_die (origin) : NULL;
> +         if (origin_die)
> +           {
> +             asm_name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (origin));
> +             origin_die->die_id.die_symbol = concat (base, ".s.", asm_name,
> +                                                     NULL);
> +           }
> +       }
> +
> +       /* Other needed die symbols can be computed from DIE informatation
> +          alone.  */
> +       compute_die_symbols_from_die (comp_unit);
> +    }
> +}
> +
>  /* Perform any cleanups needed after the early debug generation pass
>     has run.  */
>
> @@ -33338,7 +33450,7 @@ dwarf2out_early_finish (const char *filename)
>      }
>
>    /* Stick a unique symbol to the main debuginfo section.  */
> -  compute_comp_unit_symbol (comp_unit_die ());
> +  compute_die_symbols ();
>
>    /* Output the main compilation unit.  We always need it if only for
>       the CU symbol.  */
> diff --git a/gcc/testsuite/g++.dg/lto/die_symbol_conflicts_0.C 
> b/gcc/testsuite/g++.dg/lto/die_symbol_conflicts_0.C
> new file mode 100644
> index 00000000000..4979b4350f4
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/lto/die_symbol_conflicts_0.C
> @@ -0,0 +1,12 @@
> +/* { dg-lto-do assemble } */
> +/* { dg-lto-options { { -O2 -g -flto -flto-debuginfo-assume-unique-filepaths 
> } } } */
> +
> +/* DIE symbols computed from names might conflict.  */
> +
> +namespace foo {
> +  void foo () asm ("foo");
> +}
> +
> +int main () {
> +    foo::foo ();
> +}
> --
> 2.47.0
>

Reply via email to