Hi,
to add some extra data:

> At the summit, I discovered two things about the internal representation
> of debugging information:
>
> 1) According to Honza, the instances of the BLOCK tree type take 30% of
> the space in a compilation.

This large portion appears on C++ testcases that do a lot of inlining
(tramp3d, boost, Gerald's testcase). The reason is obvious: we have
several function calls for every instruction output. Since early
inlining is good at killing the instructions early, we end up with many
blocks (several blocks are constructed for every function inlined) and
they are never pruned.

I have a patch that removes empty blocks and uses the now unused
ignore_block debug hook (this usage probably went away with my rewrite
of block handling in cfglayout many years ago and no one noticed a
problem, so either ignore_block is currently overly conservative, or we
are missing some later sanity check that all needed blocks are output).
It also knows how to be more aggressive without -g. Without -g most of
the blocks go away; with -g it is about 30-50%, if I remember correctly.
(I made it a few months back when I first noticed the problem.)

I am attaching my work-in-progress patch (it ICEs in libjava
compilation, where it didn't last month, so I need to fix it before
sending it for review) just in case someone is interested.

> 2) The BLOCKS structure is linked in a way so that the blocks for one
> function link to the blocks of other functions.
>
> These two facts conspire to create a big problem for GCC/LTO, especially
> when we progress to trying to compile very large programs. Unlike many
> other essential parts of gcc, the current representation of debugging
> information is not one that can be divided into moderate sized pieces
> that can be processed independently.
>
> Honza's last lto patch "solves" the problem in the very short term by
> simply creating a single BLOCK for each function. This provides enough
> information to allow our testing to continue, but means that no useful
> debugging information will be generated. This is not acceptable in
> even the medium term, but it allows the problem to be deferred while
> Mark and I get the basics of reading and writing gimple working.
>
> I find it somewhat surprising that we need so many blocks. My
> experience is that in real programs few blocks actually have any local
> declarations and it appears that we do not bother to get rid of the
> blocks that have no local decls. However the biggest problem for lto
> is that when a procedure is inlined, the set of blocks for the inlined
> function is copied and the new copies have a cross link to the original
> version. It would help a lot if that pointer could be replaced with

The particular problem here is that the abstract origin pointers point
to the blocks within the functions they were constructed from. These
are used by dwarf2out to output an abstract copy of the function and
then use it as the destination of origin pointers from every copy of
the function. (I am not sure what the block origins are needed for by
GDB; an explanation would be welcome.)

The functions pointed to might otherwise be invisible to the middle end
(either never finalized by the frontend, such as in the case of
templates where the ABSTRACT_ORIGINs point to the C++ representation of
the uninstantiated template, or just declared dead by cgraph and removed
before lowering) and thus also never saved to LTO files in the way we
intend them to work.

In debug output every function can also appear twice: once as an
abstract function that is output early via the debug hook in its
original form, with all the blocks that the abstract origins point to
(this seems partly broken, I will try to send a fix), and a second time
as a real function, such as when produced offline, that can have an
already modified block structure (by inlining or by my little block
removal pass). A somewhat sloppy process then matches the origin
pointers (the ABSTRACT_ORIGIN pointers point to the now modified BLOCK
structure, but the dwarf2out data is already out in unmodified form).

For LTO we probably need to find a way to pickle such abstract
functions in addition to the full representation of functions that were
partly processed by optimizations.

> something like a pointer to a function decl and the dfs number of the
> block in the original function. I do not know the semantics of what is
> needed by the debuggers, but some representation where each function can
> be managed as a separate unit is going to be required to process large
> programs.

I would like to ask here for some help too. It seems to me that this is
just the tip of the iceberg: most of the other debug info goes out on
the side directly from the frontend, and we need a way to read it back
into the LTO frontend somehow and get it through with correct
modifications when optimization happens...

I am trying to make sense of the current way debug info is handled, but
it is a bit challenging. We are inconsistent in many ways. For example,
we output debug info on optimized-out locals in some cases but not in
others; we take some care to output info on optimized-out static
variables, but not static locals; we take care to output debug info on
optimized-out inline functions, but not on functions that are not
inline, etc. I believe a lot of this inconsistency was actually brought
in by my cgraph work, so I would like to handle this somehow :(

It seems to me that cgraph/varpool is generally not a good place to deal
with debug info for these reasons. We probably need a real symbol table
that replaces the current wrapup_global_declarations process...

Does someone have an idea of the overall plan for how debug info should
work and what should be there?

Honza

>
> Suggestions are welcome, but volunteers willing to attack this problem
> are truly needed. I do not think that anyone would take lto seriously
> if we cannot support debugging; only toy compilers do not have debugging.
>
> Kenny
>
>

Index: gimple-low.c
===================================================================
*** gimple-low.c (revision 124614)
--- gimple-low.c (working copy)
*************** lower_stmt (tree_stmt_iterator *tsi, str
*** 210,216 ****
  {
    tree stmt = tsi_stmt (*tsi);
  
!   if (EXPR_HAS_LOCATION (stmt) && data)
      TREE_BLOCK (stmt) = data->block;
  
    switch (TREE_CODE (stmt))
--- 210,218 ----
  {
    tree stmt = tsi_stmt (*tsi);
  
!   if (EXPR_HAS_LOCATION (stmt) && data
!       && (debug_info_level == DINFO_LEVEL_NORMAL
!           || debug_info_level == DINFO_LEVEL_VERBOSE))
      TREE_BLOCK (stmt) = data->block;
  
    switch (TREE_CODE (stmt))
Index: tree-ssa-live.c
===================================================================
*** tree-ssa-live.c (revision 124614)
--- tree-ssa-live.c (working copy)
*************** Boston, MA 02110-1301, USA.  */
*** 30,35 ****
--- 30,37 ----
  #include "tree-dump.h"
  #include "tree-ssa-live.h"
  #include "toplev.h"
+ #include "debug.h"
+ #include "flags.h"
  
  #ifdef ENABLE_CHECKING
  static void verify_live_on_entry (tree_live_info_p);
*************** mark_all_vars_used_1 (tree *tp, int *wal
*** 405,413 ****
--- 407,421 ----
                        void *data ATTRIBUTE_UNUSED)
  {
    tree t = *tp;
+   char const c = TREE_CODE_CLASS (TREE_CODE (t));
+   tree b;
  
    if (TREE_CODE (t) == SSA_NAME)
      t = SSA_NAME_VAR (t);
+   if ((IS_EXPR_CODE_CLASS (c)
+        || IS_GIMPLE_STMT_CODE_CLASS (c))
+       && (b = TREE_BLOCK (t)) != NULL)
+     TREE_USED (b) = true;
  
    /* Ignore TREE_ORIGINAL for TARGET_MEM_REFS, as well as other
       fields that do not contain vars.  */
*************** mark_all_vars_used_1 (tree *tp, int *wal
*** 431,436 ****
--- 439,545 ----
    return NULL;
  }
  
+ /* Mark the scope block SCOPE and its subblocks unused when they can be
+    possibly eliminated if dead.  */
+ 
+ static void
+ mark_scope_block_unused (tree scope)
+ {
+   tree t;
+   TREE_USED (scope) = false;
+   if (!(*debug_hooks->ignore_block) (scope))
+     TREE_USED (scope) = true;
+   for (t = BLOCK_SUBBLOCKS (scope); t ; t = BLOCK_CHAIN (t))
+     mark_scope_block_unused (t);
+ }
+ 
+ /* Look if the block is dead (by possibly eliminating its dead subblocks)
+    and return true if so.
+    Block is declared dead if:
+      1) No statements are associated with it.
+      2) Declares no live variables
+      3) All subblocks are dead
+         or there is precisely one subblock and the block
+         has same abstract origin as outer block and declares
+         no variables, so it is pure wrapper.
+    When we are not outputting full debug info, we also eliminate dead variables
+    out of scope blocks to let them be recycled by GGC and to save copying work
+    done by the inliner.
+ */
+ 
+ static bool
+ remove_unused_scope_block_p (tree scope)
+ {
+   tree *t, *next;
+   bool unused = !TREE_USED (scope);
+   var_ann_t ann;
+   int nsubblocks = 0;
+ 
+   for (t = &BLOCK_VARS (scope); *t; t = next)
+     {
+       next = &TREE_CHAIN (*t);
+ 
+       /* Debug info of nested function refers to the block of the
+          function.  */
+       if (TREE_CODE (*t) == FUNCTION_DECL)
+         unused = false;
+ 
+       /* When we are outputting debug info, we usually want to output
+          info about optimized-out variables in the scope blocks.
+          Exception are the scope blocks not containing any instructions
+          at all so user can't get into the scopes at first place.  */
+       else if ((ann = var_ann (*t)) != NULL
+                && ann->used)
+         unused = false;
+ 
+       /* When we are not doing full debug info, we however can keep around
+          only the used variables for cfgexpand's memory packing saving quite
+          a lot of memory.  */
+       else if (debug_info_level != DINFO_LEVEL_NORMAL
+                && debug_info_level != DINFO_LEVEL_VERBOSE)
+         {
+           *t = TREE_CHAIN (*t);
+           next = t;
+         }
+     }
+ 
+   for (t = &BLOCK_SUBBLOCKS (scope); *t ;)
+     if (remove_unused_scope_block_p (*t))
+       {
+         if (BLOCK_SUBBLOCKS (*t))
+           {
+             tree next = BLOCK_CHAIN (*t);
+             *t = BLOCK_SUBBLOCKS (*t);
+             BLOCK_CHAIN (*t) = next;
+             t = &BLOCK_CHAIN (*t);
+           }
+         else
+           *t = BLOCK_CHAIN (*t);
+       }
+     else
+       {
+         t = &BLOCK_CHAIN (*t);
+         nsubblocks ++;
+       }
+   /* Outer scope is always used.  */
+   if (!BLOCK_SUPERCONTEXT (scope)
+       || TREE_CODE (BLOCK_SUPERCONTEXT (scope)) == FUNCTION_DECL)
+     unused = false;
+   /* If there is more than one live subblock, it is used.  */
+   else if (nsubblocks > 1)
+     unused = false;
+   /* When there is only one subblock, see if it is just wrapper we can
+      ignore.  Wrappers are not declaring any variables and not changing
+      abstract origin.  */
+   else if (nsubblocks == 1
+            && (BLOCK_VARS (scope)
+                || ((debug_info_level == DINFO_LEVEL_NORMAL
+                     || debug_info_level == DINFO_LEVEL_VERBOSE)
+                    && ((BLOCK_ABSTRACT_ORIGIN (scope)
+                         != BLOCK_ABSTRACT_ORIGIN (BLOCK_SUPERCONTEXT (scope)))))))
+     unused = false;
+   return unused;
+ }
  /* Mark all VAR_DECLS under *EXPR_P as used, so that they won't be
     eliminated during the tree->rtl conversion process.  */
  
*************** remove_unused_locals (void)
*** 452,457 ****
--- 561,567 ----
    referenced_var_iterator rvi;
    var_ann_t ann;
  
+   mark_scope_block_unused (DECL_INITIAL (current_function_decl));
    /* Assume all locals are unused.  */
    FOR_EACH_REFERENCED_VAR (t, rvi)
      var_ann (t)->used = false;
*************** remove_unused_locals (void)
*** 498,504 ****
            *cell = TREE_CHAIN (*cell);
            continue;
          }
- 
        cell = &TREE_CHAIN (*cell);
      }
  
--- 608,613 ----
*************** remove_unused_locals (void)
*** 516,521 ****
--- 625,631 ----
          && !ann->symbol_mem_tag
          && !TREE_ADDRESSABLE (t))
        remove_referenced_var (t);
+   remove_unused_scope_block_p (DECL_INITIAL (current_function_decl));
  }
  
Index: tree-inline.c
===================================================================
*** tree-inline.c (revision 124614)
--- tree-inline.c (working copy)
*************** expand_call_inline (basic_block bb, tree
*** 2498,2507 ****
       actual inline expansion of the body, and a label for the return
       statements within the function to jump to.  The type of the
       statement expression is the return type of the function call.  */
!   id->block = make_node (BLOCK);
!   BLOCK_ABSTRACT_ORIGIN (id->block) = fn;
!   BLOCK_SOURCE_LOCATION (id->block) = input_location;
!   add_lexical_block (TREE_BLOCK (stmt), id->block);
  
    /* Local declarations will be replaced by their equivalents in this
       map.  */
--- 2498,2513 ----
       actual inline expansion of the body, and a label for the return
       statements within the function to jump to.  The type of the
       statement expression is the return type of the function call.  */
!   if (debug_info_level == DINFO_LEVEL_NORMAL
!       || debug_info_level == DINFO_LEVEL_VERBOSE)
!     {
!       id->block = make_node (BLOCK);
!       BLOCK_ABSTRACT_ORIGIN (id->block) = fn;
!       BLOCK_SOURCE_LOCATION (id->block) = input_location;
!       add_lexical_block (TREE_BLOCK (stmt), id->block);
!     }
!   else
!     id->block = DECL_INITIAL (current_function_decl);
  
    /* Local declarations will be replaced by their equivalents in this
       map.  */
Index: tree-cfg.c
===================================================================
*** tree-cfg.c (revision 124614)
--- tree-cfg.c (working copy)
*************** move_sese_region_to_fn (struct function
*** 4925,4930 ****
--- 4925,4948 ----
    return bb;
  }
  
+ /* Dump scope blocks.  */
+ 
+ static void
+ dump_scope_block (FILE *file, int indent, tree scope, int flags)
+ {
+   tree var, t;
+ 
+   fprintf (file, "\n%*sScope block #%i\n",indent, "" , BLOCK_NUMBER (scope));
+   for (var = BLOCK_VARS (scope); var; var = TREE_CHAIN (var))
+     {
+       fprintf (file, "%*s",indent, "");
+       print_generic_decl (file, var, flags);
+       fprintf (file, "\n");
+     }
+   for (t = BLOCK_SUBBLOCKS (scope); t ; t = BLOCK_CHAIN (t))
+     dump_scope_block (file, indent + 2, t, flags);
+ }
+ 
  /* Dump FUNCTION_DECL FN to file FILE using FLAGS (see TDF_* in
     tree.h)  */
  
*************** dump_function_to_file (tree fn, FILE *fi
*** 4980,4985 ****
--- 4998,5005 ----
            any_var = true;
          }
+ 
+       dump_scope_block (file, 0, DECL_INITIAL (fn), flags);
      }
  
    if (cfun && cfun->decl == fn && cfun->cfg && basic_block_info)
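
For reference, Kenny's suggestion in the quoted text (replacing the cross-function block pointer with a function decl plus the DFS number of the block in the original function) could look roughly like the toy sketch below. This is only an illustration under assumptions: struct toy_block, struct toy_block_ref and all toy_* functions are hypothetical names, not GCC's tree types or any existing GCC API, and fn_id merely stands in for a reference to a FUNCTION_DECL. The numbering is only meaningful if it is taken on the same form of the block tree that the reference is later resolved against, so it would have to be computed before (or redone after) any block removal.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a BLOCK node.  The two links mimic BLOCK_SUBBLOCKS
   (first child) and BLOCK_CHAIN (next sibling).  Hypothetical type.  */
struct toy_block
{
  struct toy_block *subblocks;
  struct toy_block *chain;
  int dfs_number;
};

/* Hypothetical per-function reference: instead of pointing directly at
   a block of another function, name the function and the preorder
   index of the block inside it.  */
struct toy_block_ref
{
  int fn_id;       /* stand-in for a reference to the FUNCTION_DECL */
  int dfs_number;  /* preorder index of the block in that function */
};

/* Number SCOPE, its siblings and all their descendants in preorder,
   starting at NEXT.  Return the first unused number.  */
static int
toy_number_blocks (struct toy_block *scope, int next)
{
  for (; scope; scope = scope->chain)
    {
      scope->dfs_number = next++;
      next = toy_number_blocks (scope->subblocks, next);
    }
  return next;
}

/* Find the block numbered DFS_NUMBER in the tree rooted at SCOPE, or
   return NULL if there is none.  */
static struct toy_block *
toy_lookup_block (struct toy_block *scope, int dfs_number)
{
  for (; scope; scope = scope->chain)
    {
      struct toy_block *found;

      if (scope->dfs_number == dfs_number)
        return scope;
      found = toy_lookup_block (scope->subblocks, dfs_number);
      if (found)
        return found;
    }
  return NULL;
}

/* Resolve REF against FN_ROOTS, an array of outermost blocks indexed
   by function id.  The caller must pass a valid function id.  */
static struct toy_block *
toy_resolve (struct toy_block *const *fn_roots, struct toy_block_ref ref)
{
  assert (fn_roots != NULL);
  return toy_lookup_block (fn_roots[ref.fn_id], ref.dfs_number);
}
```

With references of this shape, the blocks of each function can be written out and read back on their own, and a reference is only resolved when the named function is actually loaded.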