https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114855
--- Comment #8 from Andrew Macleod <amacleod at redhat dot com> ---
(In reply to Andrew Macleod from comment #7)
> Looks like the primary culprits now are:
>
> dominator optimization   :  666.73 (  7%)   0.77 (  2%)   671.76 (  7%)   170M (  4%)
> backwards jump threading : 7848.77 ( 85%)  21.04 ( 65%)  7920.05 ( 85%)  1332M ( 29%)
>
> TOTAL                    : 9250.99         32.58         9341.40         4619M

If I turn off threading, then VRP pops up with 400, so I took a look at VRP.

The biggest problem is that this testcase has on the order of 400,000 basic blocks, with a pattern of a block of code followed by a lot of CFG diamonds that use a number of different SSA names from within the block over and over. When we are calculating/storing imports and exports for every block, then utilizing that info to try to find outgoing ranges that maybe we can use, it simply adds up.

For VRP, we currently utilize different cache models depending on the number of blocks. I'm wondering if this might be a good testcase for actually using a different VRP when the number of blocks is excessive. I wrote the fast VRP pass last year, which currently isn't being used. I'm going to experiment with it to see if, for CFGs above a threshold (100,000 BBs?), we can enable the lower-overhead fast VRP instead for all VRP passes.

The threading issue probably needs to have some knobs added or tweaked for such very large CFGs. There would be a LOT of threading opportunities in the code I saw, so I can see why it would be so busy. I saw a lot of branches to branches using the same SSA_NAME.