This fixes the slowness seen in LCM compute_available accounted
to RTL cprop.  Currently the dataflow problem uses a "random"
basic-block order to seed the initial worklist (it wants to
visit predecessors before successors) - the following patch
makes it use inverted postorder (similar to tree PRE antic
computation).

This reduces the compile-time for the testcase in PR59802 at -O3
from

 CPROP                   :  54.53 (55%) usr   0.04 ( 6%) sys  54.57 (55%) 
wall    4444 kB ( 2%) ggc
 PRE                     :   4.47 ( 5%) usr   0.03 ( 5%) sys   4.48 ( 5%) 
wall    1833 kB ( 1%) ggc
 TOTAL                 :  98.51             0.63            99.13             
220195 kB

to

 CPROP                   :   2.13 ( 5%) usr   0.06 (10%) sys   2.20 ( 5%) 
wall    4444 kB ( 2%) ggc
 PRE                     :   0.52 ( 1%) usr   0.02 ( 3%) sys   0.54 ( 1%) 
wall    1833 kB ( 1%) ggc
 TOTAL                 :  42.22             0.60            42.81             
220195 kB

which is nice.  I checked for other compile-time hog PRs with
CPROP but didn't find one, have yet to check for PRE (three-letter,
likely too much noise).

Bootstrap and regtest running on x86_64-unknown-linux-gnu, ok for trunk
(and branch?)

Thanks,
Richard.

2014-01-14  Richard Biener  <rguent...@suse.de>

        PR rtl-optimization/59802
        * lcm.c (compute_available): Use inverted postorder to seed
        the initial worklist.

Index: gcc/lcm.c
===================================================================
*** gcc/lcm.c   (revision 206599)
--- gcc/lcm.c   (working copy)
*************** compute_available (sbitmap *avloc, sbitm
*** 496,507 ****
    bitmap_vector_ones (avout, last_basic_block_for_fn (cfun));
  
    /* Put every block on the worklist; this is necessary because of the
!      optimistic initialization of AVOUT above.  */
!   FOR_EACH_BB_FN (bb, cfun)
      {
        *qin++ = bb;
        bb->aux = bb;
      }
  
    qin = worklist;
    qend = &worklist[n_basic_blocks_for_fn (cfun) - NUM_FIXED_BLOCKS];
--- 496,515 ----
    bitmap_vector_ones (avout, last_basic_block_for_fn (cfun));
  
    /* Put every block on the worklist; this is necessary because of the
!      optimistic initialization of AVOUT above.  Use inverted postorder
!      to make the dataflow problem require less iterations.  */
!   int *postorder = XNEWVEC (int, n_basic_blocks_for_fn (cfun));
!   int postorder_num = inverted_post_order_compute (postorder);
!   for (int i = 0; i < postorder_num; ++i)
      {
+       bb = BASIC_BLOCK_FOR_FN (cfun, postorder[i]);
+       if (bb == EXIT_BLOCK_PTR_FOR_FN (cfun)
+         || bb == ENTRY_BLOCK_PTR_FOR_FN (cfun))
+       continue;
        *qin++ = bb;
        bb->aux = bb;
      }
+   free (postorder);
  
    qin = worklist;
    qend = &worklist[n_basic_blocks_for_fn (cfun) - NUM_FIXED_BLOCKS];

Reply via email to