------- Comment #39 from rguenth at gcc dot gnu dot org 2008-01-10 15:01
-------
Hmm, looks like I was completely wrong about the cause of the slowdown. We
actually run cfg_cleanup after cunroll and merge blocks like
<BB1>
...
<BB2>
# SFT.1_2 = PHI <SFT.1_1 (BB1)>
...
# SFT.1000_2 = PHI <SFT.1000_1 (BB1)>
# SFT.1_3 = VDEF <SFT.1_2>
...
# SFT.1000_3 = VDEF <SFT.1000_2>
*mem = x;
and in merging the blocks we do (tree_merge_blocks):
  /* Remove all single-valued PHI nodes from block B of the form
     V_i = PHI <V_j> by propagating V_j to all the uses of V_i.  */
  for (phi = phi_nodes (b); phi; phi = phi_nodes (b))
    {
      ...
      replace_uses_by (def, use);
      remove_phi_node (phi, NULL, true);
BUT! For _each_ PHI node whose uses we replace, replace_uses_by updates the
target stmt, and folds it! So the store above, with its 1000 virtual operands,
gets updated and folded once per PHI node. We can do better with VOPs.
Preparing a patch.
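To illustrate why the per-PHI update hurts, here is a toy model in C. It is only a sketch and not GCC code: every toy_* name is made up for this example, and the "update" is assumed to cost one visit per operand of the statement. It compares updating the statement after each single replacement with doing all replacements first and updating once.

```c
#include <assert.h>
#include <stddef.h>

/* Toy statement (hypothetical, not a GCC type): an array of operand ids
   plus a counter of operand visits spent in modelled statement updates.  */
struct toy_stmt {
    int *ops;
    size_t n_ops;
    unsigned long visits;
};

/* Stand-in for a per-statement update: it rescans every operand.  */
static void toy_update_stmt(struct toy_stmt *s)
{
    for (size_t i = 0; i < s->n_ops; i++)
        s->visits++;
}

/* Replace the operand at IDX with TO; no update is triggered here.  */
static void toy_replace_operand(struct toy_stmt *s, size_t idx, int to)
{
    assert(idx < s->n_ops);
    s->ops[idx] = to;
}

/* Per-PHI strategy: update the statement after every replacement.
   Operand i stands for the result of PHI i and is replaced by id i.
   Returns the total number of operand visits.  */
unsigned long toy_merge_per_phi(int *ops, size_t n)
{
    struct toy_stmt s = { ops, n, 0 };
    for (size_t i = 0; i < n; i++) {
        toy_replace_operand(&s, i, (int) i);
        toy_update_stmt(&s);            /* n visits, n times over */
    }
    return s.visits;
}

/* Batched strategy: do all replacements, then update the statement once.  */
unsigned long toy_merge_batched(int *ops, size_t n)
{
    struct toy_stmt s = { ops, n, 0 };
    for (size_t i = 0; i < n; i++)
        toy_replace_operand(&s, i, (int) i);
    if (n > 0)
        toy_update_stmt(&s);            /* n visits, once */
    return s.visits;
}
```

With n operands the per-PHI variant costs n * n visits in this model and the batched one costs n, while both leave the operands in the same final state. Whether the real fix batches the update this way or avoids it by other means is not stated here.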
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34683