This fixes the slowness seen in LCM compute_available, attributed to RTL cprop. Currently the dataflow problem uses a "random" basic-block order to seed the initial worklist (it wants to visit predecessors before successors) - the following patch makes it use inverted postorder instead (similar to the antic computation in tree PRE).
This reduces the compile-time for the testcase in PR59802 at -O3 from

 CPROP   : 54.53 (55%) usr   0.04 ( 6%) sys  54.57 (55%) wall    4444 kB ( 2%) ggc
 PRE     :  4.47 ( 5%) usr   0.03 ( 5%) sys   4.48 ( 5%) wall    1833 kB ( 1%) ggc
 TOTAL   : 98.51             0.63            99.13             220195 kB

to

 CPROP   :  2.13 ( 5%) usr   0.06 (10%) sys   2.20 ( 5%) wall    4444 kB ( 2%) ggc
 PRE     :  0.52 ( 1%) usr   0.02 ( 3%) sys   0.54 ( 1%) wall    1833 kB ( 1%) ggc
 TOTAL   : 42.22             0.60            42.81             220195 kB

which is nice.

I checked for other compile-time hog PRs with CPROP but didn't find one; I have yet to check for PRE (three letters, likely too much noise).

Bootstrap and regtest running on x86_64-unknown-linux-gnu, ok for trunk (and branch?)

Thanks,
Richard.

2014-01-14  Richard Biener  <rguent...@suse.de>

	PR rtl-optimization/59802
	* lcm.c (compute_available): Use inverted postorder to seed
	the initial worklist.

Index: gcc/lcm.c
===================================================================
*** gcc/lcm.c	(revision 206599)
--- gcc/lcm.c	(working copy)
*************** compute_available (sbitmap *avloc, sbitm
*** 496,507 ****
    bitmap_vector_ones (avout, last_basic_block_for_fn (cfun));
  
    /* Put every block on the worklist; this is necessary because of the
!      optimistic initialization of AVOUT above.  */
!   FOR_EACH_BB_FN (bb, cfun)
      {
        *qin++ = bb;
        bb->aux = bb;
      }
  
    qin = worklist;
    qend = &worklist[n_basic_blocks_for_fn (cfun) - NUM_FIXED_BLOCKS];
--- 496,515 ----
    bitmap_vector_ones (avout, last_basic_block_for_fn (cfun));
  
    /* Put every block on the worklist; this is necessary because of the
!      optimistic initialization of AVOUT above.  Use inverted postorder
!      to make the dataflow problem require less iterations.  */
!   int *postorder = XNEWVEC (int, n_basic_blocks_for_fn (cfun));
!   int postorder_num = inverted_post_order_compute (postorder);
!   for (int i = 0; i < postorder_num; ++i)
      {
+       bb = BASIC_BLOCK_FOR_FN (cfun, postorder[i]);
+       if (bb == EXIT_BLOCK_PTR_FOR_FN (cfun)
+ 	  || bb == ENTRY_BLOCK_PTR_FOR_FN (cfun))
+ 	continue;
        *qin++ = bb;
        bb->aux = bb;
      }
+   free (postorder);
  
    qin = worklist;
    qend = &worklist[n_basic_blocks_for_fn (cfun) - NUM_FIXED_BLOCKS];