On Thu, Oct 28, 2021 at 8:34 PM Jeff Law via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
>
>
> On 10/28/2021 9:24 AM, Aldy Hernandez wrote:
> > This patch upgrades the pre-VRP threading passes to fully resolving
> > backward threaders, and removes the post-VRP threading passes altogether.
> > With it, we reduce the number of threaders in our pipeline from 9 to 7.
> >
> > This will leave DOM as the only forward threader client.  When the ranger
> > can handle floats, we should be able to upgrade the pre-DOM threaders to
> > fully resolving threaders and kill the embedded DOM threader.
> >
> > The final numbers are:
> >
> > prev: # threads in backward + vrp-threaders = 92624
> > now:  # threads in backward threaders = 94275
> > Gain: +1.78%
> >
> > prev: # total threads: 189495
> > now:  # total threads: 193714
> > Gain: +2.22%
> >
> > The numbers are not as great as my initial proposal, but I've
> > recently pushed all the work that got us to this point ;-).
> >
> > And... the total compilation improves by 1.32%!
> >
> > There's a regression on uninit-pred-7_a.c that I've yet to look at.  I
> > want to make sure it's not a missing thread.  If it is, I'll create a PR
> > and own it.
> >
> > Also, the tree-ssa/phi_on_compare-*.c tests have all regressed.  This
> > seems to be some special case the forward threader handles that the
> > backward threader does not (edge_forwards_cmp_to_conditional_jump*).
> > I haven't dug deep to see if this is solveable within our
> > infrastructure, but a cursory look shows that even though the VRP
> > threader threads this, the *.optimized dump ends with more conditional
> > jumps than without the optimization.  I'd like to punt on this for
> > now, because DOM actually catches this through its lone use of the
> > forward threader (I've adjusted the tests).  However, we will need to
> > address this sooner or later, if indeed it's still improving the final
> > assembly.
> >
> > Even though we have been incrementally stressing all the pieces of this
> > intricate puzzle, I do expect fall out.  My plan from here until stage1
> > ends is to stop new development in the threader(s), and focus on bug
> > fixing and improving the developer's debugging experience.
> >
> > OK pending another round of tests on x86-64 and ppc64le Linux?
> >
> > gcc/ChangeLog:
> >
> >	* passes.def: Replace the pass_thread_jumps before VRP* with
> >	pass_thread_jumps_full.  Remove all pass_vrp_threader instances.
> >
> > libgomp/ChangeLog:
> >
> >	* testsuite/libgomp.graphite/force-parallel-4.c: Adjust for threading
> >	changes.
> >	* testsuite/libgomp.graphite/force-parallel-8.c: Same.
> >
> > gcc/testsuite/ChangeLog:
> >
> >	* gcc.dg/loop-unswitch-2.c: Adjust for threading changes.
> >	* gcc.dg/old-style-asm-1.c: Same.
> >	* gcc.dg/tree-ssa/phi_on_compare-1.c: Same.
> >	* gcc.dg/tree-ssa/phi_on_compare-2.c: Same.
> >	* gcc.dg/tree-ssa/phi_on_compare-3.c: Same.
> >	* gcc.dg/tree-ssa/phi_on_compare-4.c: Same.
> >	* gcc.dg/tree-ssa/pr20701.c: Same.
> >	* gcc.dg/tree-ssa/pr21001.c: Same.
> >	* gcc.dg/tree-ssa/pr21294.c: Same.
> >	* gcc.dg/tree-ssa/pr21417.c: Same.
> >	* gcc.dg/tree-ssa/pr21559.c: Same.
> >	* gcc.dg/tree-ssa/pr21563.c: Same.
> >	* gcc.dg/tree-ssa/pr49039.c: Same.
> >	* gcc.dg/tree-ssa/pr59597.c: Same.
> >	* gcc.dg/tree-ssa/pr61839_1.c: Same.
> >	* gcc.dg/tree-ssa/pr61839_3.c: Same.
> >	* gcc.dg/tree-ssa/pr66752-3.c: Same.
> >	* gcc.dg/tree-ssa/pr68198.c: Same.
> >	* gcc.dg/tree-ssa/pr77445-2.c: Same.
> >	* gcc.dg/tree-ssa/pr77445.c: Same.
> >	* gcc.dg/tree-ssa/ranger-threader-1.c: Same.
> >	* gcc.dg/tree-ssa/ranger-threader-2.c: Same.
> >	* gcc.dg/tree-ssa/ranger-threader-4.c: Same.
> >	* gcc.dg/tree-ssa/ssa-dom-thread-1.c: Same.
> >	* gcc.dg/tree-ssa/ssa-dom-thread-11.c: Same.
> >	* gcc.dg/tree-ssa/ssa-dom-thread-12.c: Same.
> >	* gcc.dg/tree-ssa/ssa-dom-thread-14.c: Same.
> >	* gcc.dg/tree-ssa/ssa-dom-thread-16.c: Same.
> >	* gcc.dg/tree-ssa/ssa-dom-thread-2b.c: Same.
> >	* gcc.dg/tree-ssa/ssa-dom-thread-7.c: Same.
> >	* gcc.dg/tree-ssa/ssa-thread-14.c: Same.
> >	* gcc.dg/tree-ssa/ssa-thread-backedge.c: Same.
> >	* gcc.dg/tree-ssa/ssa-vrp-thread-1.c: Same.
> >	* gcc.dg/tree-ssa/vrp02.c: Same.
> >	* gcc.dg/tree-ssa/vrp03.c: Same.
> >	* gcc.dg/tree-ssa/vrp05.c: Same.
> >	* gcc.dg/tree-ssa/vrp06.c: Same.
> >	* gcc.dg/tree-ssa/vrp07.c: Same.
> >	* gcc.dg/tree-ssa/vrp08.c: Same.
> >	* gcc.dg/tree-ssa/vrp09.c: Same.
> >	* gcc.dg/tree-ssa/vrp106.c: Same.
> >	* gcc.dg/tree-ssa/vrp33.c: Same.
> OK.  And yes, there will probably be fallout.  Fully expected and we'll
> deal with it.

Btw, in case the "fully resolving" mode is slower than the not fully
resolving mode, please consider gating it on -fexpensive-optimizations
(aka -O2+), thus running the passes in not fully resolving mode at -O1.

Btw, there were quite a few big compile-time hogs with the vrp_threader
passes; not sure if this solves those.

Richard.

> jeff
>
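
For illustration only, the gating suggested above might look roughly like
the sketch below.  It is a self-contained model, not GCC code: the names
ThreaderMode, Options, options_for_level and pick_threader_mode are all
hypothetical, and the "level >= 2" rule is a simplification of how
-fexpensive-optimizations gets enabled by the -O levels.

```cpp
// Hypothetical stand-ins; these are NOT GCC's real pass classes or flags.
enum class ThreaderMode { NotFullyResolving, FullyResolving };

struct Options {
  int optimize_level;            // models -O<n>, e.g. 1 for -O1
  bool expensive_optimizations;  // models -fexpensive-optimizations
};

// Assumption: the flag is on by default at -O2 and above, off at -O1.
constexpr Options options_for_level(int level) {
  return Options{level, level >= 2};
}

// Pick the cheaper threader mode unless expensive optimizations are on.
constexpr ThreaderMode pick_threader_mode(const Options &opts) {
  return opts.expensive_optimizations ? ThreaderMode::FullyResolving
                                      : ThreaderMode::NotFullyResolving;
}
```

Keying the choice on the flag rather than on the -O level directly means
that an explicit -fexpensive-optimizations at -O1 would still select the
fully resolving mode in this model.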