http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59597
--- Comment #3 from Jeffrey A. Law <law at redhat dot com> ---
This test is certainly exhibiting one of the problematic behaviours I was
concerned about and had noted during the development of the FSM optimization
code.

Specifically, we might have a block where we can trivially thread all the
incoming edges to specific outgoing edges.  A great example might look like:

;;   basic block 13, loop depth 2
;;    pred:       11
;;                12
  # crc_205 = PHI <crc_194(11), crc_204(12)>
  # carry_206 = PHI <0(11), 1(12)>
  crc_207 = crc_205 >> 1;
  if (carry_206 != 0)
    goto <bb 14>;
  else
    goto <bb 15>;
;;    succ:       14
;;                15

Of course, the threading code detects this and registers the appropriate jump
threads.

Where things go bad is that BB13 may also be on other jump threading paths.
For example, this:

  Registering jump thread: (9, 11) incoming edge;  (11, 13) joiner;  (13, 15) normal;

This will result in BB11 and BB13 being cloned.  This doesn't really buy us
anything.  It's not clear whether that's the cause of the slowdown, but it's
clearly wasteful.

Somewhere around here I've got some code to detect this kind of situation and
do something more sensible.  I've got to find it, update it and see if it
improves things.
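
To make the situation concrete, here is a toy model of it in Python. It is
only a sketch, not GCC's threader and not the detection code mentioned above:
the function names and the edge-list representation of a jump thread path are
all made up for illustration. It assumes the block ends in a test of the form
"phi != 0" and that a PHI argument is either a known constant or unknown
(None).

```python
from typing import Dict, List, Optional, Tuple

Edge = Tuple[int, int]  # (source block, destination block)


def trivial_threads(block: int,
                    phi_args: Dict[int, Optional[int]],
                    true_succ: int,
                    false_succ: int) -> Dict[Edge, Edge]:
    """Map each incoming edge with a constant PHI argument to the outgoing
    edge that the test `phi != 0` must take. Hypothetical helper."""
    threads: Dict[Edge, Edge] = {}
    for pred, value in phi_args.items():
        if value is None:  # argument not a known constant: cannot thread
            continue
        succ = true_succ if value != 0 else false_succ
        threads[(pred, block)] = (block, succ)
    return threads


def fully_threaded(phi_args: Dict[int, Optional[int]]) -> bool:
    """True when every incoming edge decides the outgoing edge."""
    return all(value is not None for value in phi_args.values())


def paths_entering_late(paths: List[List[Edge]], block: int) -> List[List[Edge]]:
    """Return the paths that reach `block` on any edge after their first,
    i.e. paths that would also need the blocks before it to be cloned."""
    return [p for p in paths if any(dst == block for _, dst in p[1:])]


# carry_206 = PHI <0(11), 1(12)> in BB13; true edge to BB14, false to BB15.
carry_args = {11: 0, 12: 1}
threads = trivial_threads(13, carry_args, true_succ=14, false_succ=15)

# The two trivial threads plus the longer path from the log above.
registered = [
    [(11, 13), (13, 15)],
    [(12, 13), (13, 14)],
    [(9, 11), (11, 13), (13, 15)],
]

# Only inspect longer paths when the block is already fully threaded.
flagged = paths_entering_late(registered, 13) if fully_threaded(carry_args) else []
```

With these inputs only the (9, 11) / (11, 13) / (13, 15) path is flagged; the
two trivial threads start on an edge into BB13 and so are left alone. What to
do with a flagged path (drop it, shorten it, or something else) is left open
here, as it is in the comment above.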