https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114480
--- Comment #31 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Alexander Monakov from comment #21)
> It is possible to reduce gcc_qsort workload by improving the presorted-ness
> of the array, but of course avoiding quadratic behavior would be much better.
>
> With the following change, we go from
>
>    261,250,628,954      cycles:u
>    533,040,964,437      instructions:u   # 2.04 insn per cycle
>    114,415,857,214      branches:u
>        395,327,966      branch-misses:u  # 0.35% of all branches
>
> to
>
>    256,620,517,403      cycles:u
>    526,337,243,809      instructions:u   # 2.05 insn per cycle
>    113,447,583,099      branches:u
>        383,121,251      branch-misses:u  # 0.34% of all branches
>
> diff --git a/gcc/tree-into-ssa.cc b/gcc/tree-into-ssa.cc
> index d12a4a97f6..621793f7f4 100644
> --- a/gcc/tree-into-ssa.cc
> +++ b/gcc/tree-into-ssa.cc
> @@ -805,21 +805,22 @@ prune_unused_phi_nodes (bitmap phis, bitmap kills,
> bitmap uses)
>       locate the nearest dominating def in logarithmic time by binary
> search.*/
>    bitmap_ior (to_remove, kills, phis);
>    n_defs = bitmap_count_bits (to_remove);
> +  adef = 2 * n_defs + 1;
>    defs = XNEWVEC (struct dom_dfsnum, 2 * n_defs + 1);
>    defs[0].bb_index = 1;
>    defs[0].dfs_num = 0;
> -  adef = 1;
> +  struct dom_dfsnum *head = defs + 1, *tail = defs + adef;
>    EXECUTE_IF_SET_IN_BITMAP (to_remove, 0, i, bi)
>      {
>        def_bb = BASIC_BLOCK_FOR_FN (cfun, i);
> -      defs[adef].bb_index = i;
> -      defs[adef].dfs_num = bb_dom_dfs_in (CDI_DOMINATORS, def_bb);
> -      defs[adef + 1].bb_index = i;
> -      defs[adef + 1].dfs_num = bb_dom_dfs_out (CDI_DOMINATORS, def_bb);
> -      adef += 2;
> +      head->bb_index = i;
> +      head->dfs_num = bb_dom_dfs_in (CDI_DOMINATORS, def_bb);
> +      head++, tail--;
> +      tail->bb_index = i;
> +      tail->dfs_num = bb_dom_dfs_out (CDI_DOMINATORS, def_bb);
>      }
> +  gcc_assert (head == tail);
>    BITMAP_FREE (to_remove);
> -  gcc_assert (adef == 2 * n_defs + 1);
>    qsort (defs, adef, sizeof (struct dom_dfsnum), cmp_dfsnum);
>    gcc_assert (defs[0].bb_index == 1);

Very nice - now that we're back in stage 1 I think this is a good
improvement and should be always a win? Can you test/submit it? I'll
pre-approve it here.
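
For readers following the patch, here is a minimal standalone sketch of why filling the array from both ends can leave it closer to sorted before the qsort call. It is not GCC code: the struct is a simplified stand-in for struct dom_dfsnum, the helper names (fill_interleaved, fill_two_ended, count_descents) are hypothetical, the defs[0] sentinel from the real function is omitted, and the DFS in/out numbers are assumed to be passed in as plain arrays.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for GCC's struct dom_dfsnum (assumption: only the
   two fields touched by the patch matter for this illustration).  */
struct dom_dfsnum
{
  unsigned bb_index;
  unsigned dfs_num;
};

/* Interleaved fill, as before the patch: in0, out0, in1, out1, ...  */
static void
fill_interleaved (struct dom_dfsnum *defs, size_t n,
                  const unsigned *dfs_in, const unsigned *dfs_out)
{
  size_t k = 0;
  for (size_t i = 0; i < n; i++)
    {
      defs[k].bb_index = (unsigned) i;
      defs[k++].dfs_num = dfs_in[i];
      defs[k].bb_index = (unsigned) i;
      defs[k++].dfs_num = dfs_out[i];
    }
}

/* Two-ended fill, as in the patch: "in" numbers grow from the front,
   "out" numbers are written backwards from the end.  */
static void
fill_two_ended (struct dom_dfsnum *defs, size_t n,
                const unsigned *dfs_in, const unsigned *dfs_out)
{
  struct dom_dfsnum *head = defs, *tail = defs + 2 * n;
  for (size_t i = 0; i < n; i++)
    {
      head->bb_index = (unsigned) i;
      head->dfs_num = dfs_in[i];
      head++, tail--;
      tail->bb_index = (unsigned) i;
      tail->dfs_num = dfs_out[i];
    }
  /* Both cursors must meet in the middle, mirroring the gcc_assert.  */
  assert (head == tail);
}

/* Count adjacent pairs that are out of order; 0 means already sorted.  */
static size_t
count_descents (const struct dom_dfsnum *defs, size_t len)
{
  size_t descents = 0;
  for (size_t i = 1; i < len; i++)
    if (defs[i - 1].dfs_num > defs[i].dfs_num)
      descents++;
  return descents;
}
```

As a hypothetical input, take a three-block dominator chain with dfs_in = {1, 2, 3} and dfs_out = {6, 5, 4}. The interleaved fill yields 1, 6, 2, 5, 3, 4, while the two-ended fill yields 1, 2, 3, 4, 5, 6, which is already sorted. How much this helps in practice depends on the shape of the dominator tree and on the order in which the bitmap is walked.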