https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102943
Aldy Hernandez <aldyh at gcc dot gnu.org> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |amacleod at redhat dot com --- Comment #10 from Aldy Hernandez <aldyh at gcc dot gnu.org> --- I tried all sorts of knobs limiting the behavior for large BBs (one function has over 20k blocks), a large number of imports (dependencies on the final conditional), and even the max number of blocks to look back. None of them made a difference. Then I realized that this PR was originally reported against the hybrid VRP threader, which used a different path discovery engine altogether (the old forward threader). So, the problem can't be in the backward threader path discovery bits, but in something the solver is doing. I timed all the threaders using the solver by functionality (simple versus fully resolving mode): backwards simple : 4.85 ( 2%) 0.00 ( 0%) 4.84 ( 2%) 932k ( 0%) backwards full : 54.60 ( 17%) 0.01 ( 1%) 54.70 ( 17%) 664k ( 0%) This confirms my hypothesis that it's not the backward threader discovery bits, since the above two entries use the same engine. So clearly, it's something that the fully resolving threader does that was common with the hybrid threader, i.e. our use of the ranger. A callgrind session shows that the majority of the back threader's time is being spent in: path_range_query::range_on_path_entry (irange &r, tree name) ...which is understandable, because when we can't resolve an SSA within the path, we ask the ranger what the range on entry to the path is. Curiously though, most of the time is spent in propagate_cache, especially add_to_update, which is accounting for 37.5% of the threader's time: - if (!bitmap_bit_p (m_propfail, bb->index) && !m_update_list.contains (bb)) - m_update_list.quick_push (bb); This is a large CFG, so a linear search of a BB, is bound to be slow. Just replacing it with an sbitmap knocks a good 12 seconds: backwards jump threading : 48.40 ( 28%) 0.02 ( 1%) 48.57 ( 27%) 1597k ( 0%) backwards jump threading : 32.96 ( 22%) 0.09 ( 4%) 33.12 ( 22%) 1499k ( 0%) Not ideal, but a good improvement IMO. I'll post my proposed patch, but I suspect Andrew may have other tricks up his sleeve.