On Thu, Dec 15, 2016 at 12:33 PM, James Greenhalgh
<james.greenha...@arm.com> wrote:
>
> Hi,
>
> As mentioned in PR77445, the improvements to the jump threading cost model
> this year have caused substantial regressions in the amount of jump threading
> we do and the performance of workloads which rely on that threading.
>
> This patch represents the low-bar in fixing the performance issues reported
> in PR77445 - by weakening the cost model enough that we thread in a way much
> closer to GCC 6. I don't think this patch is likely to be acceptable for
> trunk, but I'm posting it for consideration regardless.
>
> Under the new cost model, if the edge doesn't pass optimize_edge_for_speed_p,
> then we don't thread. The problem in late threading is bad edge profile
> data makes the edge look cold, and thus it fails optimize_edge_for_speed_p
> and is no longer considered a candidate for threading. As an aside, I think
> this is the wrong cost model for jump threading, where you get the most
> impact if you can resolve unpredictable switch statements - which by their
> nature may have multiple cold edges in need of threading.
>
> Early threading should avoid these issues, as there is no edge profile
> info yet. optimize_edge_for_speed_p is therefore more likely to hold, but
> the condition for threading is:
>
>   if (speed_p && optimize_edge_for_speed_p (taken_edge))
>     {
>       if (n_insns >= PARAM_VALUE (PARAM_MAX_FSM_THREAD_PATH_INSNS))
>         {
>           [...reject threading...]
>         }
>     }
>   else if (n_insns > 1)
>     {
>       [...reject threading...]
>     }
>
> With speed_p is hardwired to false for the early threader
> ( pass_early_thread_jumps::execute ):
>
>         find_jump_threads_backwards (bb, false);
>
> So we always fall to the n_insns > 1 case and thus only rarely get to
> thread.
>
> In this patch I change that call in pass_early_thread_jumps::execute to
> instead look at optimize_bb_for_speed_p (bb) . That allows the speed_p
> check to pass in the main threading cost model, and then the
> optimize_edge_for_speed_p can also pass. That gets the first stage of
> jump-threading back working in a proprietary benchmark which is sensitive
> to this optimisation.

But all early optimizations are supposed to never increase code size -- they
exist to get more realistic sizes to the IPA inliner heuristic code.

Iff at all then add some --param early-jump-threading-insns and use that
in place of the '1' and then eventually bump that to 2 or 3.  But using
the 100 from the late threading is too much.

Note that using optimize_bb_for_speed_p at this point in the pipeline
is nonsense and you can as well use ! optimize_size ...  because even
a guessed profile is only created at the end of the early optimization
pipeline.

> To get the rest of the required jump threading, I also have to weaken the
> cost model - and this is obviously a hack! The easy hack is to special case
> when the taken edge has frequency zero, and permit jump threading there.
>
> I know this patch is likely not the preferred way to fix this. For me that
> would be a change to the cost model, which as I mentioned above I think
> misses the point about which edges we want to thread. By far the best
> fix would be to the junk edge profiling data we create during threading.
>
> However, this patch does fix the performance issues identified in PR77445,
> and does highlight a fundamental issue with the early threader (which
> doesn't seem to me like it will be effective while it sets speed_p to
> false), so I'd like it to be considered for trunk if no better fix appears
> before stage 4.

But the PR has nothing to do with early threading but with late threading
fed a bogus profile.

> Bootstrapped on x86_64 with no issues. The testsuite changes just reshuffle
> which passes spot the threading opportunities.
>
> OK?

I don't think so.

Richard.

> Thanks,
> James
>
> ---
> gcc/
>
> 2016-12-15  James Greenhalgh  <james.greenha...@arm.com>
>
>         PR tree-optimization/77445
>         * tree-ssa-threadbackward.c (profitable_jump_thread_path) Work
>         around sometimes corrupt edge frequency data.
>         (pass_early_thread_jumps::execute): Pass
>         optimize_bb_for_speed_p as the speed_p parameter to
>         find_jump_threads_backwards to enable threading in more cases.
>
> gcc/testsuite/
>
> 2016-12-15  James Greenhalgh  <james.greenha...@arm.com>
>
>         PR tree-optimization/77445
>         * gcc.dg/tree-ssa/ssa-dom-thread-7.c: Adjust options and dump passes.
>         * gcc.dg/tree-ssa/pr66752-3.c: Likewise.
>

Reply via email to