Both jump threading and loop induction variable optimizations were
dropping useful debug information, and it took improvements in both for
debug info about relevant variables in the enclosed testcase to survive
all the way to the end.

The first problem was that jump threading could bypass blocks containing
debug stmts, losing the required bindings.  The solution was to
propagate bypassed debug insns into the target of the bypass.  Even if
the new confluence ends up introducing new PHI nodes, the propagated
debug binds will resolve to them, just as intended.  This is implemented
in the 4th patch below: vta-jump-thread-prop-debug-pr54693.patch

The second part of the problem was that, when performing loop ivopts,
we'd often drop PHI nodes and other SSA names for unused ivs.  Although
we had code to propagate SSA DEFs into debug uses upon their removal,
this couldn't take care of PHI nodes (no debug phis or conditional debug
binds), and it was precisely at the removal of a PHI node that we
dropped debug info for the loop in the provided testcase.  Once Jakub
figured out how to compute an unused iv out of available ivs (thanks!),
it was easy to introduce debug temps with the expressions to compute
them, so that debug binds would remain correct as long as the unused iv
can still be computed out of the others.  (If it can't, we'll still try
propagation, but may end up losing at the PHI nodes).  I had thought
that replacing only the PHI nodes would suffice, but it turned out that
replacing all unused iv defs with their equivalent used-IV expressions
got us better coverage, so this is what the 5th patch below does:
vta-ivopts-replace-dropped-phis-pr54693.patch

Regression testing revealed -fcompare-debug regressions exposed by these
patches.

x86-specific code introduces pre-reload scheduling dependencies between
calls and likely-spilled parameter registers, but it does so in a way
that's AFAICT buggy, and fragile to the presence of debug insns at the
top of a block: we won't introduce a dependency for the first insn of
the block, even if we'd rather have such a dependency.  This fails to
achieve the intended effect, and it also causes codegen differences when
the first insn in the block happens to be a debug insn, for then we will
add the intended dependency.  The first patch below,
vta-stabilize-i386-call-args-sched-pr54693.patch, skips leading debug
insns, so as to remove the difference, and moves the end of the backward
scan to the insn before the first actual insn, so that we don't refrain
from considering it for dependencies.  This in turn required an
additional test to make sure we wouldn't go past the nondebug head if
first_arg happened to be the head.

The introduction of debug insns at new spots also exposed a bug in loop
unrolling: we'd unroll a loop a different number of times depending on
whether or not its latch is an empty block.  The propagation or
introduction of debug insns in previously-empty latch blocks caused
loops to be unrolled a different number of times depending on the
presence of the debug insns, which triggered -fcompare-debug errors.
The fix was to count only nondebug insns towards the decision on whether
the latch block was empty.  This is implemented in the second patch
below: vta-stabilize-loop-unroll-empty-latch-check-pr54693.patch

Guality testsuite regressions given the patches above revealed that the
fast DCE global dead debug substitution introduced for PR54551 was not
correct: it was possible for us to visit, for the last time, a block
with a REG used in debug stmts after its death before we visited the
block with the debug use.  As a result, we'd fail to emit a debug temp
at the not-revisited block, and the debug temp bindings introduced at
other blocks might be insufficient to get a value to the debug use
point, or even get an incorrect value there.  I've fixed this problem by
using the DU chains at the time we realize a dead debug use uses a value
from a different block, adding at that moment debug temps at all defs of
the REG and replacing all debug uses with the debug temp, and arranging
that we don't do that again for the same REG.  This ensures that,
regardless of the order in which we visit blocks, we'll get correct
debug temp bindings.  This fix is implemented in the 3rd patch below:
vta-dce-globalize-debug-review-pr54551-pr54693.patch

While looking into some crashes due to a bug in an earlier version of
the patch described above, I suspected the problem had to do with our DF
rescanning newly-emitted debug temps right away.  I know SSA updating
messes with SSA def/use chains, and I suspected DF rescanning might
invalidate def/use chains as well.  It turned out that the problem was
unrelated, but I kind of liked moving the initialization of the
to_rescan bitmap out of the loop over uses, and deferring the rescanning
of the new debug temp with it.  However, since that's not a required
part of the proposed fixes, I split it out into a separate patch, the
6th and last below: vta-valtrack-defer-debug-temp-rescan-pr54693.patch

The patches were regstrapped together, on i686- and x86_64-linux-gnu,
and the only regression was guality/pr43051 on i686: a debug temp would
now be reset by the scheduler as it moved a def of a REG before a debug
temp that used the old value of the REG.  I suppose we could improve
sched so as to try and emit a debug temp before the moved-ahead insn,
and then replace the REG with the debug temp in the debug use, instead
of resetting it, but since this is not exactly trivial, it's not clear
how much benefit it would bring and at what cost, and this patchset had
already been cooking for a while, I figured I'd stop at this point and
post the patchset with this caveat.

Ok to install?


Stabilize i386 sched calls and args WRT debug insns

From: Alexandre Oliva <lxol...@fsfla.org>

for  gcc/ChangeLog

	PR debug/54693
	* config/i386/i386.c (add_parameter_dependencies): Stop
	backward scan at the insn before the incoming head.
	(ix86_dependencies_evaluation_hook): Skip debug insns.  Stop
	if first_arg is head.
---

 gcc/config/i386/i386.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)


diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index c98c6b7..ce728b7 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -24459,6 +24459,8 @@ add_parameter_dependencies (rtx call, rtx head)
   rtx first_arg = NULL;
   bool is_spilled = false;
 
+  head = PREV_INSN (head);
+
   /* Find nearest to call argument passing instruction.  */
   while (true)
     {
@@ -24556,6 +24558,8 @@ ix86_dependencies_evaluation_hook (rtx head, rtx tail)
   rtx first_arg = NULL;
   if (reload_completed)
     return;
+  while (head != tail && DEBUG_INSN_P (head))
+    head = NEXT_INSN (head);
   for (insn = tail; insn != head; insn = PREV_INSN (insn))
     if (INSN_P (insn) && CALL_P (insn))
       {
@@ -24576,6 +24580,8 @@ ix86_dependencies_evaluation_hook (rtx head, rtx tail)
 		  add_dependee_for_func_arg (first_arg, BLOCK_FOR_INSN (dee));
 	      }
 	    insn = first_arg;
+	    if (insn == head)
+	      break;
 	  }
       }
     else if (first_arg)
Stabilize loop-unroll empty latch check WRT debug insns

From: Alexandre Oliva <lxol...@fsfla.org>

for  gcc/ChangeLog

	PR debug/54693
	* loop-unroll.c (loop_exit_at_end_p): Skip debug insns.
---

 gcc/loop-unroll.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)


diff --git a/gcc/loop-unroll.c b/gcc/loop-unroll.c
index 92e3c1a..a539b42 100644
--- a/gcc/loop-unroll.c
+++ b/gcc/loop-unroll.c
@@ -215,7 +215,7 @@ loop_exit_at_end_p (struct loop *loop)
   /* Check that the latch is empty.  */
   FOR_BB_INSNS (loop->latch, insn)
     {
-      if (INSN_P (insn))
+      if (NONDEBUG_INSN_P (insn))
 	return false;
     }
 
Promote dead debug uses with immediate global replacement

From: Alexandre Oliva <lxol...@fsfla.org>

for  gcc/ChangeLog

	PR debug/54551
	PR debug/54693
	* valtrack.c (dead_debug_global_find): Accept NULL dtemp.
	(dead_debug_global_insert): Return new entry.
	(dead_debug_global_replace_temp): Return early if REG is no
	longer in place, or if dtemp was already substituted.
	(dead_debug_promote_uses): Insert for all defs and replace all
	debug uses at once.
	(dead_debug_local_finish): Release used after promotion.
	(dead_debug_insert_temp): Stop if dtemp is NULL.
---

 gcc/valtrack.c |   68 +++++++++++++++++++++++++++++++++++++++++++-------------
 1 files changed, 52 insertions(+), 16 deletions(-)


diff --git a/gcc/valtrack.c b/gcc/valtrack.c
index 52f5ed6..f6c0db4 100644
--- a/gcc/valtrack.c
+++ b/gcc/valtrack.c
@@ -225,14 +225,13 @@ dead_debug_global_find (struct dead_debug_global *global, rtx reg)
 
   dead_debug_global_entry *entry = global->htab.find (&temp_entry);
   gcc_checking_assert (entry && entry->reg == temp_entry.reg);
-  gcc_checking_assert (entry->dtemp);
 
   return entry;
 }
 
 /* Insert an entry mapping REG to DTEMP in GLOBAL->htab.  */
 
-static void
+static dead_debug_global_entry *
 dead_debug_global_insert (struct dead_debug_global *global, rtx reg, rtx dtemp)
 {
   dead_debug_global_entry temp_entry;
@@ -246,6 +245,7 @@ dead_debug_global_insert (struct dead_debug_global *global, rtx reg, rtx dtemp)
   gcc_checking_assert (!*slot);
   *slot = XNEW (dead_debug_global_entry);
   **slot = temp_entry;
+  return *slot;
 }
 
 /* If UREGNO, referenced by USE, is a pseudo marked as used in GLOBAL,
@@ -263,16 +263,19 @@ dead_debug_global_replace_temp (struct dead_debug_global *global,
 {
   if (!global || uregno < FIRST_PSEUDO_REGISTER
       || !global->used
+      || !REG_P (*DF_REF_REAL_LOC (use))
+      || REGNO (*DF_REF_REAL_LOC (use)) != uregno
       || !bitmap_bit_p (global->used, uregno))
     return false;
 
-  gcc_checking_assert (REGNO (*DF_REF_REAL_LOC (use)) == uregno);
-
   dead_debug_global_entry *entry
     = dead_debug_global_find (global, *DF_REF_REAL_LOC (use));
   gcc_checking_assert (GET_CODE (entry->reg) == REG
 		       && REGNO (entry->reg) == uregno);
 
+  if (!entry->dtemp)
+    return true;
+
   *DF_REF_REAL_LOC (use) = entry->dtemp;
   if (!pto_rescan)
     df_insn_rescan (DF_REF_INSN (use));
@@ -364,6 +367,8 @@ dead_debug_promote_uses (struct dead_debug_local *debug)
        head; head = *headp)
     {
       rtx reg = *DF_REF_REAL_LOC (head->use);
+      df_ref ref;
+      dead_debug_global_entry *entry;
 
       if (GET_CODE (reg) != REG
 	  || REGNO (reg) < FIRST_PSEUDO_REGISTER)
@@ -376,17 +381,46 @@ dead_debug_promote_uses (struct dead_debug_local *debug)
 	debug->global->used = BITMAP_ALLOC (NULL);
 
       if (bitmap_set_bit (debug->global->used, REGNO (reg)))
-	dead_debug_global_insert (debug->global, reg,
-				  make_debug_expr_from_rtl (reg));
+	entry = dead_debug_global_insert (debug->global, reg,
+					  make_debug_expr_from_rtl (reg));
 
-      if (!dead_debug_global_replace_temp (debug->global, head->use,
-					   REGNO (reg), &debug->to_rescan))
-	{
-	  headp = &head->next;
-	  continue;
-	}
-      
+      gcc_checking_assert (entry->dtemp);
+
+      /* Tentatively remove the USE from the list.  */
       *headp = head->next;
+
+      if (!debug->to_rescan)
+	debug->to_rescan = BITMAP_ALLOC (NULL);
+
+      for (ref = DF_REG_USE_CHAIN (REGNO (reg)); ref;
+	   ref = DF_REF_NEXT_REG (ref))
+	if (DEBUG_INSN_P (DF_REF_INSN (ref)))
+	  {
+	    if (!dead_debug_global_replace_temp (debug->global, ref,
+						 REGNO (reg),
+						 &debug->to_rescan))
+	      {
+		rtx insn = DF_REF_INSN (ref);
+		INSN_VAR_LOCATION_LOC (insn) = gen_rtx_UNKNOWN_VAR_LOC ();
+		bitmap_set_bit (debug->to_rescan, INSN_UID (insn));
+	      }
+	  }
+
+      for (ref = DF_REG_DEF_CHAIN (REGNO (reg)); ref;
+	   ref = DF_REF_NEXT_REG (ref))
+	if (!dead_debug_insert_temp (debug, REGNO (reg), DF_REF_INSN (ref),
+				     DEBUG_TEMP_BEFORE_WITH_VALUE))
+	  {
+	    rtx bind;
+	    bind = gen_rtx_VAR_LOCATION (GET_MODE (reg),
+					 DEBUG_EXPR_TREE_DECL (entry->dtemp),
+					 gen_rtx_UNKNOWN_VAR_LOC (),
+					 VAR_INIT_STATUS_INITIALIZED);
+	    rtx insn = emit_debug_insn_before (bind, DF_REF_INSN (ref));
+	    bitmap_set_bit (debug->to_rescan, INSN_UID (insn));
+	  }
+
+      entry->dtemp = NULL;
       XDELETE (head);
     }
 }
@@ -398,12 +432,12 @@ dead_debug_promote_uses (struct dead_debug_local *debug)
 void
 dead_debug_local_finish (struct dead_debug_local *debug, bitmap used)
 {
-  if (debug->used != used)
-    BITMAP_FREE (debug->used);
-
   if (debug->global)
     dead_debug_promote_uses (debug);
 
+  if (debug->used != used)
+    BITMAP_FREE (debug->used);
+
   dead_debug_reset_uses (debug, debug->head);
 
   if (debug->to_rescan)
@@ -535,6 +569,8 @@ dead_debug_insert_temp (struct dead_debug_local *debug, unsigned int uregno,
 	= dead_debug_global_find (debug->global, reg);
       gcc_checking_assert (entry->reg == reg);
       dval = entry->dtemp;
+      if (!dval)
+	return 0;
     }
 
   gcc_checking_assert (uses || global);
Propagate debug stmts for jump threading

From: Alexandre Oliva <lxol...@fsfla.org>

for  gcc/ChangeLog

	PR debug/54693
	* tree-ssa-threadedge.c (thread_around_empty_block): Copy
	debug temps from predecessor before threading.

for  gcc/testsuite/ChangeLog

	PR debug/54693
	* gcc.dg/guality/pr54693.c: New.
---

 gcc/testsuite/gcc.dg/guality/pr54693.c |   30 ++++++++++++++++++++++++++++++
 gcc/tree-ssa-threadedge.c              |   18 ++++++++++++++++++
 2 files changed, 48 insertions(+), 0 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/guality/pr54693.c


diff --git a/gcc/testsuite/gcc.dg/guality/pr54693.c b/gcc/testsuite/gcc.dg/guality/pr54693.c
new file mode 100644
index 0000000..4551a78
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/guality/pr54693.c
@@ -0,0 +1,30 @@
+/* PR debug/54693 */
+/* { dg-do run } */
+/* { dg-options "-g" } */
+
+#include <stdlib.h>
+
+__attribute__((noinline, noclone)) void
+foo (char *str, char c)
+{
+  asm volatile ("" : : "r" (str), "r" (c) : "memory");
+  *str = c;
+}
+
+int
+main ()
+{
+  int i;
+  char c;
+  char arr[11];
+
+  for (i = 0; i < 10; i++)
+    {
+      c = 0x30 + i;
+      foo (&arr[i], c); /* { dg-final { gdb-test 24 "i" "c - 48" } } */
+    }
+
+  __builtin_printf ("arr = %s\n", arr);
+  return 0;
+}
+
diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c
index 6249e2f..531d930 100644
--- a/gcc/tree-ssa-threadedge.c
+++ b/gcc/tree-ssa-threadedge.c
@@ -637,6 +637,24 @@ thread_around_empty_block (edge taken_edge,
   if (!single_pred_p (bb))
     return NULL;
 
+  /* Before threading, copy DEBUG stmts from the predecessor, so that
+     we don't lose the bindings as we redirect the edges.  */
+  if (MAY_HAVE_DEBUG_STMTS)
+    {
+      gsi = gsi_after_labels (bb);
+      for (gimple_stmt_iterator si = gsi_last_bb (taken_edge->src);
+	   !gsi_end_p (si); gsi_prev (&si))
+	{
+	  stmt = gsi_stmt (si);
+	  if (!is_gimple_debug (stmt))
+	    continue;
+
+	  stmt = gimple_copy (stmt);
+	  /* ??? Should we drop the location of the copy?  */
+	  gsi_insert_before (&gsi, stmt, GSI_NEW_STMT);
+	}
+    }
+
   /* This block must have more than one successor.  */
   if (single_succ_p (bb))
     return NULL;
Replace PHI nodes of dropped ivs with debug temps

From: Alexandre Oliva <lxol...@fsfla.org>, Jakub Jelinek <ja...@redhat.com>

for  gcc/ChangeLog

	PR debug/54693
	* tree-ssa-loop-ivopts.c (remove_unused_ivs): Emit debug temps
	for dropped IV sets.
---

 gcc/tree-ssa-loop-ivopts.c |  102 ++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 101 insertions(+), 1 deletions(-)


diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index 74097f8..55f8a74 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -6422,7 +6422,107 @@ remove_unused_ivs (struct ivopts_data *data)
 	  && !info->inv_id
 	  && !info->iv->have_use_for
 	  && !info->preserve_biv)
-	bitmap_set_bit (toremove, SSA_NAME_VERSION (info->iv->ssa_name));
+	{
+	  bitmap_set_bit (toremove, SSA_NAME_VERSION (info->iv->ssa_name));
+	  
+	  tree def = info->iv->ssa_name;
+
+	  if (MAY_HAVE_DEBUG_STMTS && SSA_NAME_DEF_STMT (def))
+	    {
+	      imm_use_iterator imm_iter;
+	      use_operand_p use_p;
+	      gimple stmt;
+	      int count = 0;
+
+	      FOR_EACH_IMM_USE_STMT (stmt, imm_iter, def)
+		{
+		  if (!gimple_debug_bind_p (stmt))
+		    continue;
+
+		  FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter)
+		    count++;
+
+		  if (count > 1)
+		    BREAK_FROM_IMM_USE_STMT (imm_iter);
+		}
+
+	      if (!count)
+		continue;
+
+	      struct iv_use dummy_use;
+	      struct iv_cand *best_cand = NULL, *cand;
+	      unsigned i, best_pref = 0, cand_pref;
+
+	      memset (&dummy_use, 0, sizeof (dummy_use));
+	      dummy_use.iv = info->iv;
+	      for (i = 0; i < n_iv_uses (data) && i < 64; i++)
+		{
+		  cand = iv_use (data, i)->selected;
+		  if (cand == best_cand)
+		    continue;
+		  cand_pref = operand_equal_p (cand->iv->step,
+					       info->iv->step, 0)
+		    ? 4 : 0;
+		  cand_pref
+		    += TYPE_MODE (TREE_TYPE (cand->iv->base))
+		    == TYPE_MODE (TREE_TYPE (info->iv->base))
+		    ? 2 : 0;
+		  cand_pref
+		    += TREE_CODE (cand->iv->base) == INTEGER_CST
+		    ? 1 : 0;
+		  if (best_cand == NULL || best_pref < cand_pref)
+		    {
+		      best_cand = cand;
+		      best_pref = cand_pref;
+		    }
+		}
+
+	      if (!best_cand)
+		continue;
+
+	      tree comp = get_computation_at (data->current_loop,
+					      &dummy_use, best_cand,
+					      SSA_NAME_DEF_STMT (def));
+	      if (!comp)
+		continue;
+
+	      if (count > 1)
+		{
+		  tree vexpr = make_node (DEBUG_EXPR_DECL);
+		  DECL_ARTIFICIAL (vexpr) = 1;
+		  TREE_TYPE (vexpr) = TREE_TYPE (comp);
+		  if (SSA_NAME_VAR (def))
+		    DECL_MODE (vexpr) = DECL_MODE (SSA_NAME_VAR (def));
+		  else
+		    DECL_MODE (vexpr) = TYPE_MODE (TREE_TYPE (vexpr));
+		  gimple def_temp = gimple_build_debug_bind (vexpr, comp, NULL);
+		  gimple_stmt_iterator gsi;
+
+		  if (gimple_code (SSA_NAME_DEF_STMT (def)) == GIMPLE_PHI)
+		    gsi = gsi_after_labels (gimple_bb
+					    (SSA_NAME_DEF_STMT (def)));
+		  else
+		    gsi = gsi_for_stmt (SSA_NAME_DEF_STMT (def));
+
+		  gsi_insert_before (&gsi, def_temp, GSI_SAME_STMT);
+		  comp = vexpr;
+		}
+
+	      FOR_EACH_IMM_USE_STMT (stmt, imm_iter, def)
+		{
+		  if (!gimple_debug_bind_p (stmt))
+		    continue;
+
+		  FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter)
+		    SET_USE (use_p, comp);
+
+		  if (!comp)
+		    BREAK_FROM_IMM_USE_STMT (imm_iter);
+
+		  update_stmt (stmt);
+		}
+	    }
+	}
     }
 
   release_defs_bitset (toremove);
Defer rescan of debug insns emitted for debug temps for dead debug uses

From: Alexandre Oliva <lxol...@fsfla.org>

for  gcc/ChangeLog

	PR debug/54693
	* gcc/valtrack.c (dead_debug_insert_temp): Defer rescan of
	newly-emitted debug insn.
---

 gcc/valtrack.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)


diff --git a/gcc/valtrack.c b/gcc/valtrack.c
index f6c0db4..8cc3269 100644
--- a/gcc/valtrack.c
+++ b/gcc/valtrack.c
@@ -688,7 +688,9 @@ dead_debug_insert_temp (struct dead_debug_local *debug, unsigned int uregno,
     bind = emit_debug_insn_after (bind, insn);
   else
     bind = emit_debug_insn_before (bind, insn);
-  df_insn_rescan (bind);
+  if (debug->to_rescan == NULL)
+    debug->to_rescan = BITMAP_ALLOC (NULL);
+  bitmap_set_bit (debug->to_rescan, INSN_UID (bind));
 
   /* Adjust all uses.  */
   while ((cur = uses))
@@ -699,8 +701,6 @@ dead_debug_insert_temp (struct dead_debug_local *debug, unsigned int uregno,
 	*DF_REF_REAL_LOC (cur->use)
 	  = gen_lowpart_SUBREG (GET_MODE (*DF_REF_REAL_LOC (cur->use)), dval);
       /* ??? Should we simplify subreg of subreg?  */
-      if (debug->to_rescan == NULL)
-	debug->to_rescan = BITMAP_ALLOC (NULL);
       bitmap_set_bit (debug->to_rescan, INSN_UID (DF_REF_INSN (cur->use)));
       uses = cur->next;
       XDELETE (cur);

-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist      Red Hat Brazil Compiler Engineer

Reply via email to