https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79663

--- Comment #1 from amker at gcc dot gnu.org ---
Root cause understood.  After the patch, combine_chains looks like:

  /* Process in reverse order so dominance point is ready when it comes
     to the root ref.  */
  for (i = ch1->refs.length (); i > 0; i--)
    {
      r1 = ch1->refs[i - 1];
      r2 = ch2->refs[i - 1];
      nw = XCNEW (struct dref_d);
      nw->distance = r1->distance;
      nw->stmt = stmt_combining_refs (r1, r2, i == 1 ? insert : NULL);

      /* Record dominance point where root combined stmt should be inserted
         for chains with 0 length.  Though all root refs dominate following
         refs, it's possible the combined stmt doesn't.  See PR70754.  */
      if (ch1->length == 0
          && (insert == NULL || stmt_dominates_stmt_p (nw->stmt, insert)))
        insert = nw->stmt;

      tmp_refs.safe_push (nw);
    }

  //...

  root_stmt = get_chain_root (new_chain)->stmt;
  for (i = 1; new_chain->refs.iterate (i, &nw); i++)
    {
      if (nw->distance == new_chain->length
          && !stmt_dominates_stmt_p (nw->stmt, root_stmt))
        {
          new_chain->has_max_use_after = true;
          break;
        }
    }

Note that the original PR70754 only happens for ZERO length chains, so I
skipped non-ZERO chains in the first for-loop.  The problem with non-ZERO
chains is that the insertion point for all refs of the chain can be the same
statement, so originally the combined statements of the chain were inserted as:

   combined_stmt_for_ref0
   combined_stmt_for_ref1
   ...
   combined_stmt_for_refn
   insert_point

But now the first for-loop visits all refs in reverse order, so the inserted
statements look like:

   combined_stmt_for_refn
   ...
   combined_stmt_for_ref1
   combined_stmt_for_ref0
   insert_point

Thus in the second for-loop, the condition "!stmt_dominates_stmt_p" evaluates
to true, and new_chain->has_max_use_after is set to true in this case.
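
The two insertion orders described above can be reproduced with a small toy
model.  This is only a sketch, not GCC code: a basic block is modelled as a
list of statement names, every combined statement is assumed to be inserted
immediately before one shared "insert_point", and the names Block,
insert_before, build and root_is_first are hypothetical helpers invented for
the illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Toy model (assumption): a basic block is an ordered list of statement names.
using Block = std::vector<std::string>;

// Insert `stmt` immediately before `point`, as the toy analogue of
// placing a new statement before a fixed insertion point.
static void insert_before(Block &bb, const std::string &stmt,
                          const std::string &point) {
  bb.insert(std::find(bb.begin(), bb.end(), point), stmt);
}

// Insert the combined statements for refs 0..nrefs-1 before the shared
// insertion point, visiting the refs in forward or in reverse order.
static Block build(unsigned nrefs, bool reverse) {
  Block bb = {"insert_point"};
  for (unsigned k = 0; k < nrefs; k++) {
    unsigned ref = reverse ? nrefs - 1 - k : k;
    insert_before(bb, "ref" + std::to_string(ref), "insert_point");
  }
  return bb;
}

// In straight-line code the root's statement comes before every other
// combined statement exactly when it is the first entry of the block.
static bool root_is_first(const Block &bb) {
  return !bb.empty() && bb.front() == "ref0";
}
```

In this model, build (3, false) yields ref0, ref1, ref2, insert_point, while
build (3, true) yields ref2, ref1, ref0, insert_point, so the root's combined
statement moves from the first position to the last one.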

In determine_unroll_factor, we have:

      /* The best unroll factor for this chain is equal to the number of
         temporary variables that we create for it.  */
      af = chain->length;
      if (chain->has_max_use_after)
        af++;

This causes unnecessary unrolling and the regression.
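
To make the effect of the flag concrete, here is a minimal sketch of the
per-chain computation quoted above.  The struct toy_chain and the function
chain_unroll_factor are hypothetical stand-ins that carry only the two fields
used in the excerpt; how determine_unroll_factor combines the per-chain values
into the final factor is not modelled here.

```cpp
#include <cassert>

// Hypothetical stand-in for the two chain fields used in the excerpt.
struct toy_chain {
  unsigned length;
  bool has_max_use_after;
};

// Per-chain factor, mirroring the quoted lines: the chain length, plus
// one if has_max_use_after is set.
static unsigned chain_unroll_factor(const toy_chain &c) {
  unsigned af = c.length;
  if (c.has_max_use_after)
    af++;
  return af;
}
```

For a chain of length 2 this gives 2 when the flag is clear and 3 when it is
set, so a spuriously set flag raises the per-chain factor by one.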

There are two possible fixes.
1) Separate the fix of PR70754 for ZERO/non-ZERO chains, which more or less
duplicates the first loop for the two cases.
2) Simply remove the (ch1->length == 0) condition in the first loop, so that
root_stmt is inserted at the dominating point.  I am testing this one.
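
The difference between keeping and removing the (ch1->length == 0) condition
can be sketched with a toy model of the first loop.  This is an illustration
under loud assumptions, not the GCC implementation: statements are names in a
list, non-root combined statements are assumed to land immediately before one
shared "insert_point", the root is assumed to land immediately before `insert`
when that is set, and Block, insert_before, comes_before and combine are
hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Toy model (assumption): a basic block is an ordered list of statement names.
using Block = std::vector<std::string>;

static void insert_before(Block &bb, const std::string &stmt,
                          const std::string &point) {
  bb.insert(std::find(bb.begin(), bb.end(), point), stmt);
}

// Straight-line stand-in for a dominance query: a comes before b.
static bool comes_before(const Block &bb, const std::string &a,
                         const std::string &b) {
  return std::find(bb.begin(), bb.end(), a) <
         std::find(bb.begin(), bb.end(), b);
}

// Toy version of the first loop.  `zero_length_only` keeps the
// (length == 0) condition; passing false models removing it.
static Block combine(unsigned nrefs, unsigned length, bool zero_length_only) {
  const std::string shared_point = "insert_point";
  Block bb = {shared_point};
  std::string insert;  // empty string plays the role of NULL
  for (unsigned i = nrefs; i > 0; i--) {
    std::string stmt = "ref" + std::to_string(i - 1);
    // Only the root (i == 1) is placed relative to `insert`.
    bool use_insert = (i == 1 && !insert.empty());
    insert_before(bb, stmt, use_insert ? insert : shared_point);
    // Track the earliest combined statement seen so far.
    bool track = !zero_length_only || length == 0;
    if (track && (insert.empty() || comes_before(bb, stmt, insert)))
      insert = stmt;
  }
  return bb;
}
```

With three refs and length 2, combine (3, 2, true) leaves ref0 last among the
combined statements, while combine (3, 2, false) places ref0 first, ahead of
ref2 and ref1.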
