https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90037
--- Comment #7 from Jeffrey A. Law <law at redhat dot com> --- Making some progress here. Working with a reduced testcase we have the following key blocks as we enter DOM: ;; basic block 3, loop depth 0 ;; pred: 2 __builtin_strdup (spec_22(D)); _56 = separator_21(D) - spec_22(D); ulen_57 = (size_t) _56; _58 = ulen_57 + 1; u_60 = __builtin_malloc (_58); goto <bb 14>; [100.00%] ;; succ: 14 ;; basic block 4, loop depth 0 ;; pred: 15 iftmp.0_27 = separator_21(D) + 1; goto <bb 14>; [100.00%] ;; succ: 14 ;; basic block 5, loop depth 0 ;; pred: 14 # error_msg_40 = PHI <0B(14)> # iftmp.0_50 = PHI <iftmp.0_54(14)> endpwent (); _51 = iftmp.0_50 != 0B; _52 = error_msg_40 == 0B; _53 = _51 & _52; if (_53 != 0) goto <bb 7>; [49.98%] else goto <bb 10>; [50.02%] ;; succ: 7 ;; 10 ;; basic block 6, loop depth 0 ;; pred: 13 # error_msg_11 = PHI <"invalid spec"(13)> # iftmp.0_67 = PHI <0B(13)> endpwent (); _7 = iftmp.0_67 != 0B; _8 = error_msg_11 == 0B; _9 = _7 & _8; if (_9 != 0) goto <bb 7>; [49.98%] else goto <bb 9>; [50.02%] ;; succ: 7 ;; 9 ;; basic block 7, loop depth 0 ;; pred: 6 ;; 12 ;; 5 # iftmp.0_68 = PHI <iftmp.0_67(6), iftmp.0_69(12), iftmp.0_50(5)> _10 = *iftmp.0_68; if (_10 != 43) goto <bb 8>; [48.88%] else goto <bb 10>; [51.12%] ;; succ: 8 ;; 10 [ ... ] ;; basic block 12, loop depth 0 ;; pred: 14 ;; 13 # iftmp.0_69 = PHI <iftmp.0_54(14), 0B(13)> _44 = iftmp.0_69 != 0B; _45 = 1; _46 = _44 & _45; if (_46 != 0) goto <bb 7>; [100.00%] else goto <bb 10>; [0.00%] ;; succ: 7 ;; 10 [ ... ] ;; basic block 14, loop depth 0 ;; pred: 3 ;; 4 # iftmp.0_54 = PHI <0B(3), iftmp.0_27(4)> # u_74 = PHI <u_60(3), u_66(4)> if (u_74 != 0B) goto <bb 5>; [76.89%] else goto <bb 12>; [23.11%] ;; succ: 5 ;; 12 We find some jump threading opportunities. What's most interesting to note is the thread of control bb3->bb14->bb5->bb7. This is an infeasible path -- it can't happen at runtime. But we also can't simplify it in DOM. We need jump threading & cleanup_cfg to simplify the CFG. After DOM's jump threading pass is complete, the CFG is cleaned up and we've incrementally updated the SSA form we'll have: ;; basic block 3, loop depth 0 ;; pred: 2 __builtin_strdup (spec_22(D)); _41 = (long int) spec_22(D); _56 = -_41; ulen_57 = (size_t) _56; _58 = ulen_57 + 1; u_60 = __builtin_malloc (_58); if (u_60 != 0B) goto <bb 5>; [100.00%] else goto <bb 16>; [0.00%] ;; succ: 5 ;; 16 [ First note that the test originally from bb14 has been moved up into bb3 ] ;; basic block 5, loop depth 0 ;; pred: 3 # error_msg_40 = PHI <0B(3)> # iftmp.0_50 = PHI <0B(3)> endpwent (); _51 = 0; _52 = 1; _53 = _51; if (_51 != 0) goto <bb 7>; [0.00%] else goto <bb 9>; [100.00%] ;; succ: 7 ;; 9 The combination of threading and cfgcleanup has exposed the constants. But DOM is complete at this point and thus we don't get another change to optimize the block. And bb7 will look like: ;; basic block 7, loop depth 0 ;; pred: 15 ;; 11 ;; 5 # iftmp.0_68 = PHI <iftmp.0_27(15), iftmp.0_69(11), 0B(5)> _10 = *iftmp.0_68; if (_10 != 43) goto <bb 8>; [48.88%] else goto <bb 9>; [51.12%] ;; succ: 8 ;; 9 Notice how the value 0 flows in from bb5 for iftmp.0_68 which we then dereference. It's still an infeasible path, but without another DOM pass, predicate analysis or something similar, we're going to issue the false positive. So we already know that iterated DOM/jump threading is too expensive. We could move the path isolation bits later -- like after dom3 which would fix this regression, but probably introduce others. Ultimately it's the usual phase ordering problem. As I've noted in other BZs, I've never come up with an algorithm that would allow us to do some kind of worklist approach to replicate iterated dom+threading. But I have speculated that we could get most of the benefit by identifying a subset of blocks and running a local (block only) simplifier. On my whiteboard one of the filters to identify the subset of blocks would be blocks with degenerate PHIs. bb5 in this case would qualify and we'd see that bb5's conditional is always false and thus the edge bb5->bb7 can be deleted which would in turn eliminate the false positive. One of the questions that came up for me was whether or not phi-only cprop would help here. It doesn't. It only looks at degenerate PHIs to build its worklist of SSA_NAMEs to try and propagate away. Look at bb5. While there are degenerate PHIs, they do not feed anything useful. We'd actually have to run a more thorough constant propagation. Something to ponder. Anyway, wanted to get my thoughts recorded so I don't forget anything.