On 03/10/2017 04:24 PM, Bernd Schmidt wrote:
In this PR, we have a few insns involved in two instruction combinations:

insn 16: set r100
insn 27: some calculation
insn 28: some calculation
insn 32: using r100
insn 33: using r100
insn 35: some calculation

Then we combine insns 27, 28 and 33, producing two output insns, As a
result, insn 28 (i2) now obtains a use of r100. But insn 32, which is
not involved in this combination, has the LOG_LINKS entry for that
register, and we don't know that we need to update it. As a result, the
second combination, involving regs 16, 32 and 35 (based on the remaining
LOG_LINK for r100), produces incorrect code, as we don't realize there's
a use of r100 before insn 32.

The following fixes it. Bootstrapped and tested on x86_64-linux, ok (on
all branches)?


Bernd


79910.diff


        PR rtl-optimization/79910
        * combine.c (record_used_regs): New static function.
        (try_combine): Handle situations where there is an additional
        instruction between I2 and I3 which needs to have a LOG_LINK
        updated.

        PR rtl-optimization/79910
        * gcc.dg/torture/pr79910.c: New test.
What a nasty little problem.

I don't like that we have to build these bitmaps due to the compile-time cost. Would it make sense to only build them when i0/i1 exist?

We don't do 4->3 combinations, just 4->2 and 3->2, so it's only the i2pattern where we might need to conjure up a LOG_LINK, right?


We don't do 4->3 combinations, right? So we only have to care about when there's going to be an newi2pat, right (3->2 or 4->2). We don't ever create a newi1pat (for a 4->3 combination), right? So we only have to worry about testing when there's a newi2pat, right?


Do you need to handle 4->3, in which case the new bits could be on newi1pat?

Index: gcc/combine.c
===================================================================
--- gcc/combine.c       (revision 245685)
+++ gcc/combine.c       (working copy)
@@ -2559,6 +2560,57 @@ can_split_parallel_of_n_reg_sets (rtx_in
   return true;
 }


@@ -4051,6 +4124,33 @@ try_combine (rtx_insn *i3, rtx_insn *i2,
       return 0;
     }

+  auto_bitmap new_regs_in_i2;
+  if (newi2pat)
+    {
+      /* We need to discover situations where we introduce a use of a
+        register into I2, where none of the existing LOG_LINKS contain
+        a reference to it.  This can happen if previously I3 referenced
+        the reg, and there is an additional use between I2 and I3.  We
+        must remove the LOG_LINKS entry from that additional use and
+        distribute it along with our own ones.  */
+       note_uses (&newi2pat, record_used_regs, (bitmap)new_regs_in_i2);
+       bitmap_and_compl_into (new_regs_in_i2, i2_regset);
+       bitmap_and_compl_into (new_regs_in_i2, links_regset);
+
+       /* Here, we first look for situations where a hard register use
+          moved, and just give up.  This should happen approximately
+          never, and it's not worth it to deal with possibilities like
+          multi-word registers.  Later, when fixing up LOG_LINKS, we
+          deal with the case where a pseudo use moved.  */
+       if (prev_nonnote_insn (i3) != i2)
+         for (unsigned r = 0; r < FIRST_PSEUDO_REGISTER; r++)
+           if (bitmap_bit_p (new_regs_in_i2, r))
if (prev_nonnote_insn (i3) != i2
    && bitmap_first_set_bit (new_regs_in_i2) < FIRST_PSEUDO_REGISTER)

?



Overall it seems reasonable. If possible, let's avoid the calls to create the bitmaps in case where we know we'll never be creating a new use in i2.

I'm not wed to using bitmap_first_set_bit. It's clearer and may be faster since you're not doing a ton of calls into bitmap_bit_p. The bitmap_first_set_bit implementation under the hood finds the word within the first bitmap element that is nonzero, then uses ctzl on that to get its bit.

Jeff

Reply via email to