On Thu, Oct 28, 2021 at 8:34 PM Jeff Law via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
>
>
> On 10/28/2021 9:24 AM, Aldy Hernandez wrote:
> > This patch upgrades the pre-VRP threading passes to fully resolving
> > backward threaders, and removes the post-VRP threading passes altogether.
> > With it, we reduce the number of threaders in our pipeline from 9 to 7.
> >
> > This will leave DOM as the only forward threader client.  When the ranger
> > can handle floats, we should be able to upgrade the pre-DOM threaders to
> > fully resolving threaders and kill the embedded DOM threader.
> >
> > The final numbers are:
> >
> >       prev: # threads in backward + vrp-threaders = 92624
> >       now:  # threads in backward threaders = 94275
> >       Gain: +1.78%
> >
> >       prev: # total threads: 189495
> >       now:  # total threads: 193714
> >       Gain: +2.22%
> >
> >       The numbers are not as great as my initial proposal, but I've
> >       recently pushed all the work that got us to this point ;-).
> >
> > And... the total compilation time improves by 1.32%!
> >
> > There's a regression on uninit-pred-7_a.c that I've yet to look at.  I
> > want to make sure it's not a missing thread.  If it is, I'll create a PR
> > and own it.
> >
> > Also, the tree-ssa/phi_on_compare-*.c tests have all regressed.  This
> > seems to be some special case the forward threader handles that the
> > backward threader does not (edge_forwards_cmp_to_conditional_jump*).
> > I haven't dug deep to see if this is solvable within our
> > infrastructure, but a cursory look shows that even though the VRP
> > threader threads this, the *.optimized dump ends with more conditional
> > jumps than without the optimization.  I'd like to punt on this for
> > now, because DOM actually catches this through its lone use of the
> > forward threader (I've adjusted the tests).  However, we will need to
> > address this sooner or later, if indeed it's still improving the final
> > assembly.
> >
> > Even though we have been incrementally stressing all the pieces of this
> > intricate puzzle, I do expect fallout.  My plan from here until stage1
> > ends is to stop new development in the threader(s), and focus on bug
> > fixing and improving the developer's debugging experience.
> >
> > OK pending another round of tests on x86-64 and ppc64le Linux?
> >
> > gcc/ChangeLog:
> >
> >       * passes.def: Replace the pass_thread_jumps before VRP* with
> >       pass_thread_jumps_full.  Remove all pass_vrp_threader instances.
> >
> > libgomp/ChangeLog:
> >
> >       * testsuite/libgomp.graphite/force-parallel-4.c: Adjust for
> >       threading changes.
> >       * testsuite/libgomp.graphite/force-parallel-8.c: Same.
> >
> > gcc/testsuite/ChangeLog:
> >
> >       * gcc.dg/loop-unswitch-2.c: Adjust for threading changes.
> >       * gcc.dg/old-style-asm-1.c: Same.
> >       * gcc.dg/tree-ssa/phi_on_compare-1.c: Same.
> >       * gcc.dg/tree-ssa/phi_on_compare-2.c: Same.
> >       * gcc.dg/tree-ssa/phi_on_compare-3.c: Same.
> >       * gcc.dg/tree-ssa/phi_on_compare-4.c: Same.
> >       * gcc.dg/tree-ssa/pr20701.c: Same.
> >       * gcc.dg/tree-ssa/pr21001.c: Same.
> >       * gcc.dg/tree-ssa/pr21294.c: Same.
> >       * gcc.dg/tree-ssa/pr21417.c: Same.
> >       * gcc.dg/tree-ssa/pr21559.c: Same.
> >       * gcc.dg/tree-ssa/pr21563.c: Same.
> >       * gcc.dg/tree-ssa/pr49039.c: Same.
> >       * gcc.dg/tree-ssa/pr59597.c: Same.
> >       * gcc.dg/tree-ssa/pr61839_1.c: Same.
> >       * gcc.dg/tree-ssa/pr61839_3.c: Same.
> >       * gcc.dg/tree-ssa/pr66752-3.c: Same.
> >       * gcc.dg/tree-ssa/pr68198.c: Same.
> >       * gcc.dg/tree-ssa/pr77445-2.c: Same.
> >       * gcc.dg/tree-ssa/pr77445.c: Same.
> >       * gcc.dg/tree-ssa/ranger-threader-1.c: Same.
> >       * gcc.dg/tree-ssa/ranger-threader-2.c: Same.
> >       * gcc.dg/tree-ssa/ranger-threader-4.c: Same.
> >       * gcc.dg/tree-ssa/ssa-dom-thread-1.c: Same.
> >       * gcc.dg/tree-ssa/ssa-dom-thread-11.c: Same.
> >       * gcc.dg/tree-ssa/ssa-dom-thread-12.c: Same.
> >       * gcc.dg/tree-ssa/ssa-dom-thread-14.c: Same.
> >       * gcc.dg/tree-ssa/ssa-dom-thread-16.c: Same.
> >       * gcc.dg/tree-ssa/ssa-dom-thread-2b.c: Same.
> >       * gcc.dg/tree-ssa/ssa-dom-thread-7.c: Same.
> >       * gcc.dg/tree-ssa/ssa-thread-14.c: Same.
> >       * gcc.dg/tree-ssa/ssa-thread-backedge.c: Same.
> >       * gcc.dg/tree-ssa/ssa-vrp-thread-1.c: Same.
> >       * gcc.dg/tree-ssa/vrp02.c: Same.
> >       * gcc.dg/tree-ssa/vrp03.c: Same.
> >       * gcc.dg/tree-ssa/vrp05.c: Same.
> >       * gcc.dg/tree-ssa/vrp06.c: Same.
> >       * gcc.dg/tree-ssa/vrp07.c: Same.
> >       * gcc.dg/tree-ssa/vrp08.c: Same.
> >       * gcc.dg/tree-ssa/vrp09.c: Same.
> >       * gcc.dg/tree-ssa/vrp106.c: Same.
> >       * gcc.dg/tree-ssa/vrp33.c: Same.
> OK.  And yes, there will probably be fallout.  Fully expected and we'll
> deal with it.

Btw, in case the "fully resolving" mode is slower than the not fully
resolving mode, please consider gating it on -fexpensive-optimizations
(aka -O2+), and thus run the passes in not fully resolving mode at -O1.
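
A rough, standalone sketch of the gating suggested above.  This is not
GCC's pass code: the `options` struct, `threader_mode` enum and
`pick_threader_mode` function are hypothetical names made up for the
illustration, and the only thing modelled is "full resolution only when
-fexpensive-optimizations is on".

```cpp
// Hypothetical stand-in for the relevant compiler option; not GCC's
// real option machinery.
struct options
{
  bool expensive_optimizations;  // models -fexpensive-optimizations (-O2+)
};

// The two flavours of backward threader discussed in this thread.
enum class threader_mode
{
  fast,  // not fully resolving
  full   // fully resolving
};

// Pick the fully resolving mode only when expensive optimizations are
// enabled; otherwise fall back to the cheaper mode (e.g. at -O1).
threader_mode
pick_threader_mode (const options &opts)
{
  return opts.expensive_optimizations ? threader_mode::full
				      : threader_mode::fast;
}
```

With this shape, an -O1 style configuration (flag off) gets the cheaper
threader and -O2+ gets the fully resolving one.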

Btw, there were quite a few big compile-time hogs with the vrp_threader
passes; I'm not sure whether this patch solves those.

Richard.

> jeff
>
