https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87362
--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
So I tried debugging using LTO bootstrapped cc1.  Profiling gdb for a simple

  gdb ./cc1
  (gdb) b do_rpo_vn
  (gdb) q

yields

Samples: 2K of event 'instructions', Event count (approx.): 45695722362
Overhead  Command  Shared Object  Symbol
   8.32%  gdb      gdb            [.] read_attribute_value
   5.78%  gdb      gdb            [.] dwarf2_attr
   5.10%  gdb      gdb            [.] load_partial_dies
   4.23%  gdb      gdb            [.] cp_find_first_component_aux
   4.10%  gdb      gdb            [.] partial_die_info::read
   3.55%  gdb      gdb            [.] htab_find_slot_with_hash
   3.11%  gdb      gdb            [.] get_objfile_arch
   2.98%  gdb      gdb            [.] peek_die_abbrev
   2.88%  gdb      gdb            [.] cp_canonicalize_string

or with a callgraph

Samples: 1K of event 'instructions', Event count (approx.): 37206022209
  Children      Self  Command  Shared Object  Symbol
+   91.92%     0.00%  gdb      gdb            [.] gdb_main
+   91.47%     0.00%  gdb      gdb            [.] main
+   91.42%     0.00%  gdb      libc-2.22.so   [.] __libc_start_main
+   91.35%     0.00%  gdb      gdb            [.] catch_command_errors
+   91.30%     0.00%  gdb      gdb            [.] _start
+   85.40%     0.00%  gdb      gdb            [.] symbol_file_add_main_ad
+   85.40%     0.00%  gdb      gdb            [.] symbol_file_add_main
+   55.17%     0.00%  gdb      gdb            [.] psym_lookup_symbol
+   55.13%     0.00%  gdb      gdb            [.] psymtab_to_symtab
+   55.13%     0.00%  gdb      gdb            [.] dwarf2_read_symtab
+   55.13%     0.00%  gdb      gdb            [.] dw2_do_instantiate_symt
+   55.06%     0.00%  gdb      gdb            [.] lookup_symbol_in_objfil
+   55.02%     0.00%  gdb      gdb            [.] lookup_global_symbol
+   55.02%     0.00%  gdb      gdb            [.] default_iterate_over_ob
+   55.02%     0.00%  gdb      gdb            [.] lookup_symbol_global_it
+   55.00%     0.00%  gdb      gdb            [.] lookup_symbol_aux
+   54.99%     0.00%  gdb      gdb            [.] basic_lookup_symbol_non
+   54.94%     0.00%  gdb      gdb            [.] lookup_symbol_in_langua
+   54.83%     0.00%  gdb      gdb            [.] lookup_symbol
+   54.77%     0.00%  gdb      gdb            [.] set_initial_language
+   43.75%     0.49%  gdb      gdb            [.] process_die

but that doesn't look too useful.  Note that startup / breakpointing
isn't as fast as non-LTOed cc1 but it's still usable.

I notice that while .debug_ranges is quite large the .debug_aranges
section is small.

I wonder through what hoops gdb needs to go to get at the entry address
for main() - I can imagine that because the late LTO debug only contains
the ranges attribute but not DW_AT_name, gdb has to follow all LTO debug
DIE abstract origins.  Since those abstract origins are in
DW_TAG_imported_unit imported CUs it may (hopefully lazily!) need to
parse those when an abstract origin refers to a DIE within them.  At
least I don't see something like a "symbol table" referring to the late
LTO DIEs in DWARF.

Maybe if we're lucky and main() is the very first DIE we run into,
startup would be faster.  Of course looking at the startup / breakpoint
differences between LTO and non-LTO might lead to a better understanding
of things here.  For example it might be possible to optimize the poking
at DW_AT_name via an abstract origin _without_ needing to pull in all of
the imported unit if the lookup comes from this kind of searching.

When using callgrind it seems that the whole complication comes in via
symbol_file_add_main -> ... -> read_symbols -> ... -> read_psyms ->
dwarf2_build_psymtabs as expected.  So somehow avoiding pulling in all
the early LTO CUs would be the thing to do(?) - maybe we can add
DW_AT_linkage_name to the late generated DIEs to help gdb (we seem to
not do that).  In fact we seem to add them to the early DIEs (probably
needed for TYPE_DECLs).

I'm trying a hack like

Index: gcc/dwarf2out.c
===================================================================
--- gcc/dwarf2out.c     (revision 264418)
+++ gcc/dwarf2out.c     (working copy)
@@ -6018,6 +6018,9 @@ dwarf2out_register_external_die (tree de
       break;
     case FUNCTION_DECL:
       die = new_die (DW_TAG_subprogram, parent, decl);
+      /* This helps debuggers to build a symbol table.  */
+      if (! flag_wpa && flag_incremental_link != INCREMENTAL_LINK_LTO)
+        add_linkage_name (die, decl);
       break;
     case VAR_DECL:
       die = new_die (DW_TAG_variable, parent, decl);
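
To illustrate the suspicion above (a consumer having to chase abstract
origins into imported units just to learn a name), here is a toy model in
C.  It is only a sketch and does not reflect gdb's or GCC's actual data
structures: the toy_unit and toy_die types and the toy_* functions are
hypothetical names made up for this example, and "parsing" a unit is
reduced to setting a flag.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a compilation unit that is costly to parse.  */
struct toy_unit {
  int parsed;                        /* nonzero once the unit was "read" */
};

/* Hypothetical stand-in for a DIE with the attributes discussed above.  */
struct toy_die {
  const char *name;                  /* models DW_AT_name, may be NULL */
  const char *linkage_name;          /* models DW_AT_linkage_name, may be NULL */
  struct toy_die *abstract_origin;   /* models DW_AT_abstract_origin */
  struct toy_unit *unit;             /* unit that owns this DIE */
};

/* Pretend to parse UNIT.  Returns 1 if work had to be done, 0 otherwise.  */
static int
toy_parse_unit (struct toy_unit *unit)
{
  if (unit->parsed)
    return 0;
  unit->parsed = 1;
  return 1;
}

/* Find a symbol name for DIE.  A linkage name on the DIE itself answers
   the question directly.  Otherwise walk the abstract origin chain and
   parse each owning unit on the way.  *UNITS_PARSED is incremented for
   every unit that had to be parsed.  */
const char *
toy_die_symbol_name (struct toy_die *die, int *units_parsed)
{
  assert (die != NULL && units_parsed != NULL);

  if (die->linkage_name != NULL)
    return die->linkage_name;

  while (die != NULL)
    {
      if (die->name != NULL)
        return die->name;
      die = die->abstract_origin;
      if (die != NULL)
        *units_parsed += toy_parse_unit (die->unit);
    }
  return NULL;
}
```

In this model a late DIE without any name forces the early unit holding
its abstract origin to be parsed, while a late DIE that carries a linkage
name is resolved without touching the early unit.  Whether gdb behaves
this way in practice is exactly the open question in this comment.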