Both jump threading and loop induction variable optimizations were dropping useful debug information, and it took improvements in both for debug info about relevant variables in the enclosed testcase to survive all the way to the end.
The first problem was that jump threading could bypass blocks containing debug stmts, losing the required bindings. The solution was to propagate bypassed debug insns into the target of the bypass. Even if the new confluence ends up introducing new PHI nodes, the propagated debug binds will resolve to them, just as intended. This is implemented in the 4th patch below: vta-jump-thread-prop-debug-pr54693.patch The second part of the problem was that, when performing loop ivopts, we'd often drop PHI nodes and other SSA names for unused ivs. Although we had code to propagate SSA DEFs into debug uses upon their removal, this couldn't take care of PHI nodes (no debug phis or conditional debug binds), and it was precisely at the removal of a PHI node that we dropped debug info for the loop in the provided testcase. Once Jakub figured out how to compute an unused iv out of available ivs (thanks!), it was easy to introduce debug temps with the expressions to compute them, so that debug binds would remain correct as long as the unused iv can still be computed out of the others. (If it can't, we'll still try propagation, but may end up losing at the PHI nodes). I had thought that replacing only the PHI nodes would suffice, but it turned out that replacing all unused iv defs with their equivalent used-IV expressions got us better coverage, so this is what the 5th patch below does: vta-ivopts-replace-dropped-phis-pr54693.patch Regression testing revealed -fcompare-debug regressions exposed by these patches. x86-specific code introduces pre-reload scheduling dependencies between calls and likely-spilled parameter registers, but it does so in a way that's AFAICT buggy, and fragile to the presence of debug insns at the top of a block: we won't introduce a dependency for the first insn of the block, even if we'd rather have such a dependency. This fails to achieve the intended effect, and it also causes codegen differences when the first insn in the block happens to be a debug insn, for then we will add the intended dependency. The first patch below, vta-stabilize-i386-call-args-sched-pr54693.patch, skips leading debug insns, so as to remove the difference, and moves the end of the backward scan to the insn before the first actual insn, so that we don't refrain from considering it for dependencies. This in turn required an additional test to make sure we wouldn't go past the nondebug head if first_arg happened to be the head. The introduction of debug insns at new spots also exposed a bug in loop unrolling: we'd unroll a loop a different number of times depending on whether or not its latch is an empty block. The propagation or introduction of debug insns in previously-empty latch blocks caused loops to be unrolled a different number of times depending on the presence of the debug insns, which triggered -fcompare-debug errors. The fix was to count only nondebug insns towards the decision on whether the latch block was empty. This is implemented in the second patch below: vta-stabilize-loop-unroll-empty-latch-check-pr54693.patch Guality testsuite regressions given the patches above revealed that the fast DCE global dead debug substitution introduced for PR54551 was not correct: it was possible for us to visit, for the last time, a block with a REG used in debug stmts after its death before we visited the block with the debug use. As a result, we'd fail to emit a debug temp at the not-revisited block, and the debug temp bindings introduced at other blocks might be insufficient to get a value to the debug use point, or even get an incorrect value there. I've fixed this problem by using the DU chains at the time we realize a dead debug use uses a value from a different block, adding at that moment debug temps at all defs of the REG and replacing all debug uses with the debug temp, and arranging that we don't do that again for the same REG. This ensures that, regardless of the order in which we visit blocks, we'll get correct debug temp bindings. This fix is implemented in the 3rd patch below: vta-dce-globalize-debug-review-pr54551-pr54693.patch While looking into some crashes due to a bug in an earlier version of the patch described above, I suspected the problem had to do with our DF rescanning newly-emitted debug temps right away. I know SSA updating messes with SSA def/use chains, and I suspected DF rescanning might invalidate def/use chains as well. It turned out that the problem was unrelated, but I kind of liked moving the initialization of the to_rescan bitmap out of the loop over uses, and deferring the rescanning of the new debug temp with it. However, since that's not a required part of the proposed fixes, I split it out into a separate patch, the 6th and last below: vta-valtrack-defer-debug-temp-rescan-pr54693.patch The patches were regstrapped together, on i686- and x86_64-linux-gnu, and the only regression was guality/pr43051 on i686: a debug temp would now be reset by the scheduler as it moved a def of a REG before a debug temp that used the old value of the REG. I suppose we could improve sched so as to try and emit a debug temp before the moved-ahead insn, and then replace the REG with the debug temp in the debug use, instead of resetting it, but since this is not exactly trivial, it's not clear how much benefit it would bring and at what cost, and this patchset had already been cooking for a while, I figured I'd stop at this point and post the patchset with this caveat. Ok to install?
Stabilize i386 sched calls and args WRT debug insns From: Alexandre Oliva <lxol...@fsfla.org> for gcc/ChangeLog PR debug/54693 * config/i386/i386.c (add_parameter_dependencies): Stop backward scan at the insn before the incoming head. (ix86_dependencies_evaluation_hook): Skip debug insns. Stop if first_arg is head. --- gcc/config/i386/i386.c | 6 ++++++ 1 files changed, 6 insertions(+), 0 deletions(-) diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index c98c6b7..ce728b7 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -24459,6 +24459,8 @@ add_parameter_dependencies (rtx call, rtx head) rtx first_arg = NULL; bool is_spilled = false; + head = PREV_INSN (head); + /* Find nearest to call argument passing instruction. */ while (true) { @@ -24556,6 +24558,8 @@ ix86_dependencies_evaluation_hook (rtx head, rtx tail) rtx first_arg = NULL; if (reload_completed) return; + while (head != tail && DEBUG_INSN_P (head)) + head = NEXT_INSN (head); for (insn = tail; insn != head; insn = PREV_INSN (insn)) if (INSN_P (insn) && CALL_P (insn)) { @@ -24576,6 +24580,8 @@ ix86_dependencies_evaluation_hook (rtx head, rtx tail) add_dependee_for_func_arg (first_arg, BLOCK_FOR_INSN (dee)); } insn = first_arg; + if (insn == head) + break; } } else if (first_arg)
Stabilize loop-unroll empty latch check WRT debug insns From: Alexandre Oliva <lxol...@fsfla.org> for gcc/ChangeLog PR debug/54693 * loop-unroll.c (loop_exit_at_end_p): Skip debug insns. --- gcc/loop-unroll.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/gcc/loop-unroll.c b/gcc/loop-unroll.c index 92e3c1a..a539b42 100644 --- a/gcc/loop-unroll.c +++ b/gcc/loop-unroll.c @@ -215,7 +215,7 @@ loop_exit_at_end_p (struct loop *loop) /* Check that the latch is empty. */ FOR_BB_INSNS (loop->latch, insn) { - if (INSN_P (insn)) + if (NONDEBUG_INSN_P (insn)) return false; }
Promote dead debug uses with immediate global replacement From: Alexandre Oliva <lxol...@fsfla.org> for gcc/ChangeLog PR debug/54551 PR debug/54693 * valtrack.c (dead_debug_global_find): Accept NULL dtemp. (dead_debug_global_insert): Return new entry. (dead_debug_global_replace_temp): Return early if REG is no longer in place, or if dtemp was already substituted. (dead_debug_promote_uses): Insert for all defs and replace all debug uses at once. (dead_debug_local_finish): Release used after promotion. (dead_debug_insert_temp): Stop if dtemp is NULL. --- gcc/valtrack.c | 68 +++++++++++++++++++++++++++++++++++++++++++------------- 1 files changed, 52 insertions(+), 16 deletions(-) diff --git a/gcc/valtrack.c b/gcc/valtrack.c index 52f5ed6..f6c0db4 100644 --- a/gcc/valtrack.c +++ b/gcc/valtrack.c @@ -225,14 +225,13 @@ dead_debug_global_find (struct dead_debug_global *global, rtx reg) dead_debug_global_entry *entry = global->htab.find (&temp_entry); gcc_checking_assert (entry && entry->reg == temp_entry.reg); - gcc_checking_assert (entry->dtemp); return entry; } /* Insert an entry mapping REG to DTEMP in GLOBAL->htab. */ -static void +static dead_debug_global_entry * dead_debug_global_insert (struct dead_debug_global *global, rtx reg, rtx dtemp) { dead_debug_global_entry temp_entry; @@ -246,6 +245,7 @@ dead_debug_global_insert (struct dead_debug_global *global, rtx reg, rtx dtemp) gcc_checking_assert (!*slot); *slot = XNEW (dead_debug_global_entry); **slot = temp_entry; + return *slot; } /* If UREGNO, referenced by USE, is a pseudo marked as used in GLOBAL, @@ -263,16 +263,19 @@ dead_debug_global_replace_temp (struct dead_debug_global *global, { if (!global || uregno < FIRST_PSEUDO_REGISTER || !global->used + || !REG_P (*DF_REF_REAL_LOC (use)) + || REGNO (*DF_REF_REAL_LOC (use)) != uregno || !bitmap_bit_p (global->used, uregno)) return false; - gcc_checking_assert (REGNO (*DF_REF_REAL_LOC (use)) == uregno); - dead_debug_global_entry *entry = dead_debug_global_find (global, *DF_REF_REAL_LOC (use)); gcc_checking_assert (GET_CODE (entry->reg) == REG && REGNO (entry->reg) == uregno); + if (!entry->dtemp) + return true; + *DF_REF_REAL_LOC (use) = entry->dtemp; if (!pto_rescan) df_insn_rescan (DF_REF_INSN (use)); @@ -364,6 +367,8 @@ dead_debug_promote_uses (struct dead_debug_local *debug) head; head = *headp) { rtx reg = *DF_REF_REAL_LOC (head->use); + df_ref ref; + dead_debug_global_entry *entry; if (GET_CODE (reg) != REG || REGNO (reg) < FIRST_PSEUDO_REGISTER) @@ -376,17 +381,46 @@ dead_debug_promote_uses (struct dead_debug_local *debug) debug->global->used = BITMAP_ALLOC (NULL); if (bitmap_set_bit (debug->global->used, REGNO (reg))) - dead_debug_global_insert (debug->global, reg, - make_debug_expr_from_rtl (reg)); + entry = dead_debug_global_insert (debug->global, reg, + make_debug_expr_from_rtl (reg)); - if (!dead_debug_global_replace_temp (debug->global, head->use, - REGNO (reg), &debug->to_rescan)) - { - headp = &head->next; - continue; - } - + gcc_checking_assert (entry->dtemp); + + /* Tentatively remove the USE from the list. */ *headp = head->next; + + if (!debug->to_rescan) + debug->to_rescan = BITMAP_ALLOC (NULL); + + for (ref = DF_REG_USE_CHAIN (REGNO (reg)); ref; + ref = DF_REF_NEXT_REG (ref)) + if (DEBUG_INSN_P (DF_REF_INSN (ref))) + { + if (!dead_debug_global_replace_temp (debug->global, ref, + REGNO (reg), + &debug->to_rescan)) + { + rtx insn = DF_REF_INSN (ref); + INSN_VAR_LOCATION_LOC (insn) = gen_rtx_UNKNOWN_VAR_LOC (); + bitmap_set_bit (debug->to_rescan, INSN_UID (insn)); + } + } + + for (ref = DF_REG_DEF_CHAIN (REGNO (reg)); ref; + ref = DF_REF_NEXT_REG (ref)) + if (!dead_debug_insert_temp (debug, REGNO (reg), DF_REF_INSN (ref), + DEBUG_TEMP_BEFORE_WITH_VALUE)) + { + rtx bind; + bind = gen_rtx_VAR_LOCATION (GET_MODE (reg), + DEBUG_EXPR_TREE_DECL (entry->dtemp), + gen_rtx_UNKNOWN_VAR_LOC (), + VAR_INIT_STATUS_INITIALIZED); + rtx insn = emit_debug_insn_before (bind, DF_REF_INSN (ref)); + bitmap_set_bit (debug->to_rescan, INSN_UID (insn)); + } + + entry->dtemp = NULL; XDELETE (head); } } @@ -398,12 +432,12 @@ dead_debug_promote_uses (struct dead_debug_local *debug) void dead_debug_local_finish (struct dead_debug_local *debug, bitmap used) { - if (debug->used != used) - BITMAP_FREE (debug->used); - if (debug->global) dead_debug_promote_uses (debug); + if (debug->used != used) + BITMAP_FREE (debug->used); + dead_debug_reset_uses (debug, debug->head); if (debug->to_rescan) @@ -535,6 +569,8 @@ dead_debug_insert_temp (struct dead_debug_local *debug, unsigned int uregno, = dead_debug_global_find (debug->global, reg); gcc_checking_assert (entry->reg == reg); dval = entry->dtemp; + if (!dval) + return 0; } gcc_checking_assert (uses || global);
Propagate debug stmts for jump threading From: Alexandre Oliva <lxol...@fsfla.org> for gcc/ChangeLog PR debug/54693 * tree-ssa-threadedge.c (thread_around_empty_block): Copy debug temps from predecessor before threading. for gcc/testsuite/ChangeLog PR debug/54693 * gcc.dg/guality/pr54693.c: New. --- gcc/testsuite/gcc.dg/guality/pr54693.c | 30 ++++++++++++++++++++++++++++++ gcc/tree-ssa-threadedge.c | 18 ++++++++++++++++++ 2 files changed, 48 insertions(+), 0 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/guality/pr54693.c diff --git a/gcc/testsuite/gcc.dg/guality/pr54693.c b/gcc/testsuite/gcc.dg/guality/pr54693.c new file mode 100644 index 0000000..4551a78 --- /dev/null +++ b/gcc/testsuite/gcc.dg/guality/pr54693.c @@ -0,0 +1,30 @@ +/* PR debug/54693 */ +/* { dg-do run } */ +/* { dg-options "-g" } */ + +#include <stdlib.h> + +__attribute__((noinline, noclone)) void +foo (char *str, char c) +{ + asm volatile ("" : : "r" (str), "r" (c) : "memory"); + *str = c; +} + +int +main () +{ + int i; + char c; + char arr[11]; + + for (i = 0; i < 10; i++) + { + c = 0x30 + i; + foo (&arr[i], c); /* { dg-final { gdb-test 24 "i" "c - 48" } } */ + } + + __builtin_printf ("arr = %s\n", arr); + return 0; +} + diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c index 6249e2f..531d930 100644 --- a/gcc/tree-ssa-threadedge.c +++ b/gcc/tree-ssa-threadedge.c @@ -637,6 +637,24 @@ thread_around_empty_block (edge taken_edge, if (!single_pred_p (bb)) return NULL; + /* Before threading, copy DEBUG stmts from the predecessor, so that + we don't lose the bindings as we redirect the edges. */ + if (MAY_HAVE_DEBUG_STMTS) + { + gsi = gsi_after_labels (bb); + for (gimple_stmt_iterator si = gsi_last_bb (taken_edge->src); + !gsi_end_p (si); gsi_prev (&si)) + { + stmt = gsi_stmt (si); + if (!is_gimple_debug (stmt)) + continue; + + stmt = gimple_copy (stmt); + /* ??? Should we drop the location of the copy? */ + gsi_insert_before (&gsi, stmt, GSI_NEW_STMT); + } + } + /* This block must have more than one successor. */ if (single_succ_p (bb)) return NULL;
Replace PHI nodes of dropped ivs with debug temps From: Alexandre Oliva <lxol...@fsfla.org>, Jakub Jelinek <ja...@redhat.com> for gcc/ChangeLog PR debug/54693 * tree-ssa-loop-ivopts.c (remove_unused_ivs): Emit debug temps for dropped IV sets. --- gcc/tree-ssa-loop-ivopts.c | 102 ++++++++++++++++++++++++++++++++++++++++++++ 1 files changed, 101 insertions(+), 1 deletions(-) diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c index 74097f8..55f8a74 100644 --- a/gcc/tree-ssa-loop-ivopts.c +++ b/gcc/tree-ssa-loop-ivopts.c @@ -6422,7 +6422,107 @@ remove_unused_ivs (struct ivopts_data *data) && !info->inv_id && !info->iv->have_use_for && !info->preserve_biv) - bitmap_set_bit (toremove, SSA_NAME_VERSION (info->iv->ssa_name)); + { + bitmap_set_bit (toremove, SSA_NAME_VERSION (info->iv->ssa_name)); + + tree def = info->iv->ssa_name; + + if (MAY_HAVE_DEBUG_STMTS && SSA_NAME_DEF_STMT (def)) + { + imm_use_iterator imm_iter; + use_operand_p use_p; + gimple stmt; + int count = 0; + + FOR_EACH_IMM_USE_STMT (stmt, imm_iter, def) + { + if (!gimple_debug_bind_p (stmt)) + continue; + + FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter) + count++; + + if (count > 1) + BREAK_FROM_IMM_USE_STMT (imm_iter); + } + + if (!count) + continue; + + struct iv_use dummy_use; + struct iv_cand *best_cand = NULL, *cand; + unsigned i, best_pref = 0, cand_pref; + + memset (&dummy_use, 0, sizeof (dummy_use)); + dummy_use.iv = info->iv; + for (i = 0; i < n_iv_uses (data) && i < 64; i++) + { + cand = iv_use (data, i)->selected; + if (cand == best_cand) + continue; + cand_pref = operand_equal_p (cand->iv->step, + info->iv->step, 0) + ? 4 : 0; + cand_pref + += TYPE_MODE (TREE_TYPE (cand->iv->base)) + == TYPE_MODE (TREE_TYPE (info->iv->base)) + ? 2 : 0; + cand_pref + += TREE_CODE (cand->iv->base) == INTEGER_CST + ? 1 : 0; + if (best_cand == NULL || best_pref < cand_pref) + { + best_cand = cand; + best_pref = cand_pref; + } + } + + if (!best_cand) + continue; + + tree comp = get_computation_at (data->current_loop, + &dummy_use, best_cand, + SSA_NAME_DEF_STMT (def)); + if (!comp) + continue; + + if (count > 1) + { + tree vexpr = make_node (DEBUG_EXPR_DECL); + DECL_ARTIFICIAL (vexpr) = 1; + TREE_TYPE (vexpr) = TREE_TYPE (comp); + if (SSA_NAME_VAR (def)) + DECL_MODE (vexpr) = DECL_MODE (SSA_NAME_VAR (def)); + else + DECL_MODE (vexpr) = TYPE_MODE (TREE_TYPE (vexpr)); + gimple def_temp = gimple_build_debug_bind (vexpr, comp, NULL); + gimple_stmt_iterator gsi; + + if (gimple_code (SSA_NAME_DEF_STMT (def)) == GIMPLE_PHI) + gsi = gsi_after_labels (gimple_bb + (SSA_NAME_DEF_STMT (def))); + else + gsi = gsi_for_stmt (SSA_NAME_DEF_STMT (def)); + + gsi_insert_before (&gsi, def_temp, GSI_SAME_STMT); + comp = vexpr; + } + + FOR_EACH_IMM_USE_STMT (stmt, imm_iter, def) + { + if (!gimple_debug_bind_p (stmt)) + continue; + + FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter) + SET_USE (use_p, comp); + + if (!comp) + BREAK_FROM_IMM_USE_STMT (imm_iter); + + update_stmt (stmt); + } + } + } } release_defs_bitset (toremove);
Defer rescan of debug insns emitted for debug temps for dead debug uses From: Alexandre Oliva <lxol...@fsfla.org> for gcc/ChangeLog PR debug/54693 * gcc/valtrack.c (dead_debug_insert_temp): Defer rescan of newly-emitted debug insn. --- gcc/valtrack.c | 6 +++--- 1 files changed, 3 insertions(+), 3 deletions(-) diff --git a/gcc/valtrack.c b/gcc/valtrack.c index f6c0db4..8cc3269 100644 --- a/gcc/valtrack.c +++ b/gcc/valtrack.c @@ -688,7 +688,9 @@ dead_debug_insert_temp (struct dead_debug_local *debug, unsigned int uregno, bind = emit_debug_insn_after (bind, insn); else bind = emit_debug_insn_before (bind, insn); - df_insn_rescan (bind); + if (debug->to_rescan == NULL) + debug->to_rescan = BITMAP_ALLOC (NULL); + bitmap_set_bit (debug->to_rescan, INSN_UID (bind)); /* Adjust all uses. */ while ((cur = uses)) @@ -699,8 +701,6 @@ dead_debug_insert_temp (struct dead_debug_local *debug, unsigned int uregno, *DF_REF_REAL_LOC (cur->use) = gen_lowpart_SUBREG (GET_MODE (*DF_REF_REAL_LOC (cur->use)), dval); /* ??? Should we simplify subreg of subreg? */ - if (debug->to_rescan == NULL) - debug->to_rescan = BITMAP_ALLOC (NULL); bitmap_set_bit (debug->to_rescan, INSN_UID (DF_REF_INSN (cur->use))); uses = cur->next; XDELETE (cur);
-- Alexandre Oliva, freedom fighter http://FSFLA.org/~lxoliva/ You must be the change you wish to see in the world. -- Gandhi Be Free! -- http://FSFLA.org/ FSF Latin America board member Free Software Evangelist Red Hat Brazil Compiler Engineer