https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114855

--- Comment #8 from Andrew Macleod <amacleod at redhat dot com> ---
(In reply to Andrew Macleod from comment #7)
> Looks like the primary culprits now are:
> 
> dominator optimization             : 666.73 (  7%)   0.77 (  2%) 671.76 (  7%)   170M (  4%)
> backwards jump threading           :7848.77 ( 85%)  21.04 ( 65%)7920.05 ( 85%)  1332M ( 29%)
> 
> TOTAL                              :9250.99         32.58       9341.40      4619M

If I turn off threading, then VRP pops up at around 400 seconds, so I took a look at VRP.

The biggest problem is that this testcase has on the order of 400,000 basic
blocks, with a pattern of a block of code followed by a lot of CFG diamonds
that use a number of different ssa-names from within that block over and over.
When we calculate and store imports and exports for every block, and then use
that info to try to find outgoing ranges we might be able to use, it simply
adds up.
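
To illustrate the shape (this is not the actual testcase, just a hand-written
reduction of the kind of pattern I mean):

/* One block defines a few values, then a long run of small CFG diamonds
   keeps branching on those same ssa-names.  Every diamond adds edges whose
   imports/exports and outgoing ranges have to be computed and cached.  */
int f (int a, int b, int c)
{
  int x = a * b + c;          /* straight-line block defining the names...  */
  int y = a - c;
  int sum = 0;
  /* ...followed by diamonds reusing x and y over and over.  */
  if (x > 0)  sum += 1; else sum -= 1;
  if (y > 0)  sum += 2; else sum -= 2;
  if (x > 10) sum += 3; else sum -= 3;
  if (y > 10) sum += 4; else sum -= 4;
  /* ...repeated enough times to reach ~400,000 blocks in the real case.  */
  return sum;
}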

For VRP, we currently utilize different cache models depending on the number
of blocks.  I'm wondering if this might be a good testcase for actually
switching to a different VRP when the number of blocks is excessive.  I wrote
the fast VRP pass last year, which currently isn't being used.  I'm going to
experiment with it to see whether, for CFGs above a threshold (100,000 BBs?),
we can enable the lower-overhead fast VRP instead for all the VRP passes.
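
Roughly the dispatch I have in mind, as a standalone sketch (VRP_BLOCK_LIMIT,
choose_vrp and the 100,000 value are placeholders here, not the real pass
interface or an existing --param):

#include <cstdio>

// Pick the VRP flavor based on the size of the CFG.
enum class vrp_mode { full_ranger, fast_vrp };

constexpr unsigned VRP_BLOCK_LIMIT = 100000;   // hypothetical threshold

vrp_mode
choose_vrp (unsigned n_blocks)
{
  // Above the limit, the per-block import/export bookkeeping of the full
  // ranger-based VRP dominates compile time, so fall back to fast VRP.
  return n_blocks > VRP_BLOCK_LIMIT ? vrp_mode::fast_vrp
                                    : vrp_mode::full_ranger;
}

int
main ()
{
  printf ("400000 blocks -> %s\n",
          choose_vrp (400000) == vrp_mode::fast_vrp ? "fast vrp" : "full vrp");
  return 0;
}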

The threading issue probably needs to have some knobs added or tweaked for
such a very large number of BBs.  There would be a LOT of threading
opportunities in the code I saw, so I can see why it would be so busy.  I saw
a lot of branches to branches using the same SSA_NAME.
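
For example (again a made-up reduction, not code from the testcase), this is
the sort of branch-to-branch shape on one name that hands the threader so
many candidate paths:

/* Once the first test of c has been resolved along a path, every later
   test of c on that path is redundant, so the threader wants to rewire
   the CFG around it -- and with thousands of these the work piles up.  */
int g (int c, int v)
{
  if (c > 0)
    v += 1;
  if (c > 0)      /* same SSA_NAME tested again: threadable            */
    v += 2;
  if (c > 0)      /* ...and again, multiplying the candidate paths.    */
    v += 3;
  return v;
}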
