Note that it still seems to me that the crossed_loop_header handling is overly conservative. We have:
@ -2771,6 +2771,7 @@ jt_path_registry::cancel_invalid_paths (vec<jump_thread_edge *> &path) bool seen_latch = false; int loops_crossed = 0; bool crossed_latch = false; + bool crossed_loop_header = false; // Use ->dest here instead of ->src to ignore the first block. The // first block is allowed to be in a different loop, since it'll be // redirected. See similar comment in profitable_path_p: "we don't @@ -2804,6 +2805,14 @@ jt_path_registry::cancel_invalid_paths (vec<jump_thread_edge *> &path) ++loops_crossed; } + // ?? Avoid threading through loop headers that remain in the + // loop, as such threadings tend to create sub-loops which + // _might_ be OK ??. + if (e->dest->loop_father->header == e->dest + && !flow_loop_nested_p (exit->dest->loop_father, + e->dest->loop_father)) + crossed_loop_header = true; + if (flag_checking && !m_backedge_threads) gcc_assert ((path[i]->e->flags & EDGE_DFS_BACK) == 0); } @@ -2829,6 +2838,21 @@ jt_path_registry::cancel_invalid_paths (vec<jump_thread_edge *> &path) cancel_thread (&path, "Path crosses loops"); return true; } + // The path should either start and end in the same loop or exit the + // loop it starts in but never enter a loop. This also catches + // creating irreducible loops, not only rotation. + if (entry->src->loop_father != exit->dest->loop_father + && !flow_loop_nested_p (exit->src->loop_father, + entry->dest->loop_father)) + { + cancel_thread (&path, "Path rotates loop"); + return true; + } + if (crossed_loop_header) + { + cancel_thread (&path, "Path crosses loop header but does not exit it"); + return true; + } return false; } If I read it correctly, for a path that enters the loop and later leaves it (where threading is desirable since we skip the whole loop) the logic above will still return true (after finishing the whole walk which seems like a waste). This may trigger more often at -Os since we limit loop header copying. And indeed, fixing profile updating would be nice. Why the updating code is not reused across different threaders? (I wrote several thread updating functions for varioius threaders introduced & remoed in the past and I wonder why we need to keep reinventing it)