On Thu, Dec 15, 2016 at 12:33 PM, James Greenhalgh <james.greenha...@arm.com> wrote: > > Hi, > > As mentioned in PR77445, the improvements to the jump threading cost model > this year have caused substantial regressions in the amount of jump threading > we do and the performance of workloads which rely on that threading. > > This patch represents the low-bar in fixing the performance issues reported > in PR77445 - by weakening the cost model enough that we thread in a way much > closer to GCC 6. I don't think this patch is likely to be acceptable for > trunk, but I'm posting it for consideration regardless. > > Under the new cost model, if the edge doesn't pass optimize_edge_for_speed_p, > then we don't thread. The problem in late threading is bad edge profile > data makes the edge look cold, and thus it fails optimize_edge_for_speed_p > and is no longer considered a candidate for threading. As an aside, I think > this is the wrong cost model for jump threading, where you get the most > impact if you can resolve unpredictable switch statements - which by their > nature may have multiple cold edges in need of threading. > > Early threading should avoid these issues, as there is no edge profile > info yet. optimize_edge_for_speed_p is therefore more likely to hold, but > the condition for threading is: > > if (speed_p && optimize_edge_for_speed_p (taken_edge)) > { > if (n_insns >= PARAM_VALUE (PARAM_MAX_FSM_THREAD_PATH_INSNS)) > { > [...reject threading...] > } > } > else if (n_insns > 1) > { > [...reject threading...] > } > > With speed_p is hardwired to false for the early threader > ( pass_early_thread_jumps::execute ): > > find_jump_threads_backwards (bb, false); > > So we always fall to the n_insns > 1 case and thus only rarely get to > thread. > > In this patch I change that call in pass_early_thread_jumps::execute to > instead look at optimize_bb_for_speed_p (bb) . That allows the speed_p > check to pass in the main threading cost model, and then the > optimize_edge_for_speed_p can also pass. That gets the first stage of > jump-threading back working in a proprietary benchmark which is sensitive > to this optimisation.
But all early optimizations are supposed to never increase code size -- they exist to get more realistic sizes to the IPA inliner heuristic code. Iff at all then add some --param early-jump-threading-insns and use that in place of the '1' and then eventually bump that to 2 or 3. But using the 100 from the late threading is too much. Note that using optimize_bb_for_speed_p at this point in the pipeline is nonsense and you can as well use ! optimize_size ... because even a guessed profile is only created at the end of the early optimization pipeline. > To get the rest of the required jump threading, I also have to weaken the > cost model - and this is obviously a hack! The easy hack is to special case > when the taken edge has frequency zero, and permit jump threading there. > > I know this patch is likely not the preferred way to fix this. For me that > would be a change to the cost model, which as I mentioned above I think > misses the point about which edges we want to thread. By far the best > fix would be to the junk edge profiling data we create during threading. > > However, this patch does fix the performance issues identified in PR77445, > and does highlight a fundamental issue with the early threader (which > doesn't seem to me like it will be effective while it sets speed_p to > false), so I'd like it to be considered for trunk if no better fix appears > before stage 4. But the PR has nothing to do with early threading but with late threading fed a bogus profile. > Bootstrapped on x86_64 with no issues. The testsuite changes just reshuffle > which passes spot the threading opportunities. > > OK? I don't think so. Richard. > Thanks, > James > > --- > gcc/ > > 2016-12-15 James Greenhalgh <james.greenha...@arm.com> > > PR tree-optimization/77445 > * tree-ssa-threadbackward.c (profitable_jump_thread_path) Work > around sometimes corrupt edge frequency data. > (pass_early_thread_jumps::execute): Pass > optimize_bb_for_speed_p as the speed_p parameter to > find_jump_threads_backwards to enable threading in more cases. > > gcc/testsuite/ > > 2016-12-15 James Greenhalgh <james.greenha...@arm.com> > > PR tree-optimization/77445 > * gcc.dg/tree-ssa/ssa-dom-thread-7.c: Adjust options and dump passes. > * gcc.dg/tree-ssa/pr66752-3.c: Likewise. >