On 05/13, Richard Biener wrote:
> On Sun, 12 May 2019, Giuliano Belinassi wrote:
> 
> > Hi, Richard
> > 
> > On 05/07, Richard Biener wrote:
> > > On Mon, 6 May 2019, Giuliano Belinassi wrote:
> > > 
> > > > Hi,
> > > > 
> > > > On 03/29, Richard Biener wrote:
> > > > > On Thu, 28 Mar 2019, Giuliano Belinassi wrote:
> > > > > 
> > > > > > Hi, Richard
> > > > > > 
> > > > > > On 03/28, Richard Biener wrote:
> > > > > > > On Wed, Mar 27, 2019 at 2:55 PM Giuliano Belinassi
> > > > > > > <giuliano.belina...@usp.br> wrote:
> > > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > On 03/26, Richard Biener wrote:
> > > > > > > > > On Tue, 26 Mar 2019, David Malcolm wrote:
> > > > > > > > >
> > > > > > > > > > On Mon, 2019-03-25 at 19:51 -0400, nick wrote:
> > > > > > > > > > > Greetings All,
> > > > > > > > > > >
> > > > > > > > > > > I would like to take up parallelize compilation using 
> > > > > > > > > > > threads or make
> > > > > > > > > > > c++/c
> > > > > > > > > > > memory issues not automatically promote. I did ask about 
> > > > > > > > > > > this before
> > > > > > > > > > > but
> > > > > > > > > > > not get a reply. When someone replies I'm just a little 
> > > > > > > > > > > concerned as
> > > > > > > > > > > my writing for proposals has never been great so if 
> > > > > > > > > > > someone just
> > > > > > > > > > > reviews
> > > > > > > > > > > and doubt checks that's fine.
> > > > > > > > > > >
> > > > > > > > > > > As for the other things building gcc and running the 
> > > > > > > > > > > testsuite is
> > > > > > > > > > > fine. Plus
> > > > > > > > > > > I already working on gcc so I've pretty aware of most 
> > > > > > > > > > > things and this
> > > > > > > > > > > would
> > > > > > > > > > > be a great steeping stone into more serious gcc 
> > > > > > > > > > > development work.
> > > > > > > > > > >
> > > > > > > > > > > If sample code is required that's in mainline gcc I sent 
> > > > > > > > > > > out a trial
> > > > > > > > > > > patch
> > > > > > > > > > > for this issue: 
> > > > > > > > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88395
> > > > > > > > > > >
> > > > > > > > > > > Cheers,
> > > > > > > > > > >
> > > > > > > > > > > Nick
> > > > > > > > > >
> > > > > > > > > > It's good to see that you've gotten as far as attaching a 
> > > > > > > > > > patch to BZ
> > > > > > > > > > [1]
> > > > > > > > > >
> > > > > > > > > > I think someone was going to attempt the "parallelize 
> > > > > > > > > > compilation using
> > > > > > > > > > threads" idea last year, but then pulled out before the 
> > > > > > > > > > summer; you may
> > > > > > > > > > want to check the archives (or was that you?)
> > > > > > > > >
> > > > > > > > > There's also Giuliano Belinassi who is interested in the same 
> > > > > > > > > project
> > > > > > > > > (CCed).
> > > > > > > >
> > > > > > > > Yes, I will apply for this project, and I will submit the final 
> > > > > > > > version
> > > > > > > > of my proposal by the end of the week.
> > > > > > > >
> > > > > > > > Currently, my target is the `expand_all_functions` routine, as 
> > > > > > > > most of
> > > > > > > > the time is spent on it according to the experiments that I 
> > > > > > > > performed as
> > > > > > > > part of my Master's research on compiler parallelization.
> > > > > > > > (-O2, --disable-checking)
> > > > > > > 
> > > > > > > Yes, more specifically I think the realistic target is the GIMPLE 
> > > > > > > part
> > > > > > > of   execute_pass_list (cfun, g->get_passes ()->all_passes);  
> > > > > > > done in
> > > > > > > cgraph_node::expand.  If you look at passes.def you'll see 
> > > > > > > all_passes
> > > > > > > also contains RTL expansion (pass_expand) and the RTL optimization
> > > > > > > queue (pass_rest_of_compilation).  The RTL part isn't a realistic 
> > > > > > > target.
> > > > > > > Without changing the pass hierarchy the obvious part that can be
> > > > > > > handled would be the pass_all_optimizations pass sub-queue of
> > > > > > > all_passes since those are all passes that perform transforms on 
> > > > > > > the
> > > > > > > GIMPLE IL where we have all functions in this state at the same 
> > > > > > > time
> > > > > > > and where no interactions between the functions happen anymore
> > > > > > > and thus functions can be processed in parallel (as much as make
> > > > > > > processes individual translation units in parallel).
> > > > > > > 
> > > > > > 
> > > > > > Great. So if I understood correctly, I will need to split
> > > > > > cgraph_node::expand() into three parts: IPA, GIMPLE and RTL, and 
> > > > > > then
> > > > > > refactor `expand_all_functions` so that the loop
> > > > > > 
> > > > > >      for (i = new_order_pos - 1; i >= 0; i--)
> > > > > > 
> > > > > >  use these three functions, then partition
> > > > > > 
> > > > > >      g->get_passes()->all_passes
> > > > > > 
> > > > > > into get_passes()->gimple_passes and get_passes()->rtl_passes, so I
> > > > > > can run RTL after GIMPLE is finished, to finally start the
> > > > > > paralellization of per function GIMPLE passes.
> > > > > 
> > > > > Yes, it involves refactoring of the loop - you may notice that
> > > > > parts of the compilation pipeline are under control of the
> > > > > pass manager (passes.c) but some is still manually driven
> > > > > by symbol_table::compile.  Whether it's more convenient to
> > > > > get more control stuffed to the pass manager and perform the
> > > > > threading under its control (I'd say that would be the cleaner
> > > > > design) or to try do this in the current ad-hoc parts remains
> > > > > to be seen.  You can see symbol_table::compile hands over
> > > > > control to the pass manager multiple times, first ipa_passes ()
> > > > > then all_late_ipa_passes and finally the expand_all_functions code.
> > > > > 
> > > > > I guess it would simplify things if you'd split pass_all_passes
> > > > > in passes.def at pass_expand like so:
> > > > > 
> > > > > diff --git a/gcc/passes.def b/gcc/passes.def
> > > > > index 2fcd80e53a3..bb0453b36a7 100644
> > > > > --- a/gcc/passes.def
> > > > > +++ b/gcc/passes.def
> > > > > @@ -403,11 +403,10 @@ along with GCC; see the file COPYING3.  If not 
> > > > > see
> > > > >    NEXT_PASS (pass_spectrev1);
> > > > >    NEXT_PASS (pass_warn_function_noreturn);
> > > > >    NEXT_PASS (pass_gen_hsail);
> > > > > +  TERMINATE_PASS_LIST (all_passes)
> > > > >  
> > > > > -  NEXT_PASS (pass_expand);
> > > > > -
> > > > > -  NEXT_PASS (pass_rest_of_compilation);
> > > > > -  PUSH_INSERT_PASSES_WITHIN (pass_rest_of_compilation)
> > > > > +  INSERT_PASSES_AFTER (pass_rest_of_compilation)
> > > > > +      NEXT_PASS (pass_expand);
> > > > >        NEXT_PASS (pass_instantiate_virtual_regs);
> > > > >        NEXT_PASS (pass_into_cfg_layout_mode);
> > > > >        NEXT_PASS (pass_jump);
> > > > > @@ -505,6 +504,5 @@ along with GCC; see the file COPYING3.  If not see
> > > > >           NEXT_PASS (pass_final);
> > > > >        POP_INSERT_PASSES ()
> > > > >        NEXT_PASS (pass_df_finish);
> > > > > -  POP_INSERT_PASSES ()
> > > > >    NEXT_PASS (pass_clean_state);
> > > > > -  TERMINATE_PASS_LIST (all_passes)
> > > > > +  TERMINATE_PASS_LIST (pass_rest_of_compilation)
> > > > > 
> > > > > where to make things "work" again w/o threading you'd invoke
> > > > > execute_pass_list (cfun, g->get_passes ()->pass_rest_of_compilation)
> > > > > right after the all_passes invocation in cgraph_node::expand.
> > > > > 
> > > > > You then can refactor things so the loop over the 'order' array
> > > > > is done twice, once over all_passes (the set you then parallelize)
> > > > > and once over pass_rest_of_compilation (which you can't parallelize
> > > > > because of being in RTL).
> > > > >
> > > > 
> > > > I managed to get it working today. However, I found an issue with the
> > > > statistics_fini_pass() and pass_init_dump_file(), which I had to
> > > > comment, and force a `return false` for every case, respectively. Then I
> > > > managed to compile some programs correctly with -O2. I have no idea why
> > > > yet, but I will keep searching. I've attached my patch here.
> > > 
> > > It may be that you need to adjust the GCC_PASS_LISTS define in
> > > pass_manager.h, changing pass_rest_of_compilation to a pass list
> > > and also remove its "old" definition in passes.c.  Or it might
> > > be simpler to not re-use pass_rest_of_compilation but wrap
> > > the tail in a new all_passes2 or so.  You'll also see
> > > a call to register_dump_files (all_passes) in passes.c where
> > > you probably need to do the same for the new tail.
> > > 
> > > As usual grep is your best friend when figuring out what to do
> > > (doing that right now myself).
> > > 
> > > Richard.
> > 
> > I am just posting this as an update.
> > 
> > Just to contextualize, I've managed to fix this issue by following Biener's
> > advices. However what happened next was that the pr87600-2.c was
> > crashing the compiler, as discussed in IRC. The reason of the crash was
> > that after the discussed modifications, the `pass->next` after `pass_expand`
> > was not `pass_cleanup`, since the NEXT_PASS after `pass_expand` was
> > `pass_instantiate_virtual_regs`. I fixed that by not reusing the
> > `pass_rest_of_compilation`, but by creating a new pass called 
> > `all_rtl_passes`
> > and keeping `pass_expand` before the `PUSH_INSERT_PASSES_WITHIN`.
> > 
> > After that, I've run the full `dg.exp` testsuite and found that the
> > following tests are also failing:
> > 
> > FAIL: gcc.dg/20020919-1.c (internal compiler error)
> > FAIL: gcc.dg/always_inline3.c (internal compiler error)
> > FAIL: gcc.dg/dx-test.c (internal compiler error)
> > FAIL: gcc.dg/pr32370.c (internal compiler error)
> > FAIL: gcc.dg/pr61692.c (internal compiler error)
> > FAIL: gcc.dg/pr87600-2.c (internal compiler error)
> > FAIL: gcc.dg/va-arg-pack-1.c (internal compiler error)
> > FAIL: g++.dg/cpp0x/Wattributes1.C  -std=c++14  (test for warnings, line 125)
> > FAIL: g++.dg/cpp0x/Wattributes1.C  -std=c++17  (test for warnings, line 125)
> > FAIL: g++.dg/opt/pr69213.C  -std=gnu++98 (internal compiler error)
> > FAIL: g++.dg/opt/pr69213.C  -std=gnu++14 (internal compiler error)
> > FAIL: g++.dg/opt/pr69213.C  -std=gnu++17 (internal compiler error)
> > 
> > Notice that pr87600-2.c is failing again, but now when trying to compile the
> > `long test3 (void)` function. Interestingly, if I comment every function but
> > test3(), the test passes. All tests seems to be failed in the assert:
> > 
> >  gcc_assert (TREE_ASM_WRITTEN (decl))
> > 
> > of cgraphunit.c:2255, and the rtl dumps seems always to be different from
> > trunk.
> > 
> > I've attached a patch to my modifications, if anyone is interested in it.
> > I hope to provide a better feedback in the next days.
> 
> Btw, if you are familiar with git I encourage you to publish your
> changes on a branch.  Conveniently there's a github mirror of the
> GCC repository you can branch off.  It might be easier to work
> on a not-so-fast moving target which would mean branching off the
> gcc-9-branch for example (or just not rebase your branch too often).
> 

I am working on a mirror that I created:
https://gitlab.com/flusp/gcc/tree/giulianob_parallel

I also managed to fix all problems that I mentioned earlier. Basically
the check before calling the get_all_rtl_passes () was incorrect.

> Thanks,
> Richard.
> 
> > > 
> > > > 
> > > > 
> > > > > The above patch needs more changes in pass manager code - a chance
> > > > > to dive into it a little since that's where you'd change code.
> > > > > 
> > > > > > > To simplify the taks further a useful constraint is to not have
> > > > > > > a single optimization pass executed multiple times at the same 
> > > > > > > time
> > > > > > > (otherwise you have to look at pass specific global states as 
> > > > > > > well),
> > > > > > > thus the parallel part could be coded in a way keeping per 
> > > > > > > function
> > > > > > > the state of what pass to execute next and have a scheduler pick
> > > > > > > a function its next pass is "free", scheduling that to a fixed 
> > > > > > > set of
> > > > > > > worker threads.  There's no dependences between functions
> > > > > > > for the scheduling but each pass has only one execution resource
> > > > > > > in the pipeline.  You can start processing an arbitrarily large 
> > > > > > > number
> > > > > > > of functions but slow functions will keep others from advancing 
> > > > > > > across
> > > > > > > the pass it executes on.
> > > > > > >
> > > > > > 
> > > > > > Something like a pipeline? That is certainly a start, but if one 
> > > > > > pass is
> > > > > > very slow wouldn't it bottleneck everything?
> > > > > 
> > > > > Yes, something like a pipeline.  It's true a slow pass would
> > > > > bottleneck things - as said, we can selectively make passes
> > > > > thread safe in such cases.
> > > > > 
> > > > > > > Passes could of course be individually marked as thread-safe
> > > > > > > (multiple instances execute concurrently).
> > > > > > > 
> > > > > > > Garbage collection is already in control of the pass manager which
> > > > > > > would also be the thread scheduler.  For GC the remaining issue
> > > > > > > is allocation which passes occasionally do.  Locking is the short
> > > > > > > term solution for GSoC I guess, long-term per-thread GC pools
> > > > > > > might be better (to not slow down non-threaded parts of the 
> > > > > > > compiler).
> > > > > > > 
> > > > > > > Richard.
> > > > > > > 
> > > > > > > >
> > > > > > > > Thank you,
> > > > > > > > Giuliano.
> > > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > > IIRC Richard [CCed] was going to mentor, with me 
> > > > > > > > > > co-mentoring [2] - but
> > > > > > > > > > I don't know if he's still interested/able to spare the 
> > > > > > > > > > cycles.
> > > > > > > > >
> > > > > > > > > I've offered mentoring to Giuliano, so yes.
> > > > > > > > >
> > > > > > > > > > That said, the parallel compilation one strikes me as very 
> > > > > > > > > > ambitious;
> > > > > > > > > > it's not clear to me what could realistically be done as a 
> > > > > > > > > > GSoC
> > > > > > > > > > project.  I think a good proposal on that would come up 
> > > > > > > > > > with some
> > > > > > > > > > subset of the problem that's doable over a summer, whilst 
> > > > > > > > > > also being
> > > > > > > > > > useful to the project.  The RTL infrastructure has a lot of 
> > > > > > > > > > global
> > > > > > > > > > state, so maybe either focus on the gimple passes, or on 
> > > > > > > > > > fixing global
> > > > > > > > > > state on the RTL side?  (I'm not sure)
> > > > > > > > >
> > > > > > > > > That was the original intent for the experiment.  There's also
> > > > > > > > > the already somewhat parallel WPA stage in LTO compilation 
> > > > > > > > > mode
> > > > > > > > > (but it simply forks for the sake of simplicity...).
> > > > > > > > >
> > > > > > > > > > Or maybe a project to be more
> > > > > > > > > > explicit about regions of the code that assume that the 
> > > > > > > > > > garbage-
> > > > > > > > > > collector can't run within them?[3] (since the GC is state 
> > > > > > > > > > that would
> > > > > > > > > > be shared by the threads).
> > > > > > > > >
> > > > > > > > > The GC will be one obstackle.  The original idea was to drive
> > > > > > > > > parallelization on the pass level by the pass manager for the
> > > > > > > > > GIMPLE passes, so serialization points would be in it.
> > > > > > > > >
> > > > > > > > > Richard.
> > > > > > > > >
> > > > > > > > > > Hope this is constructive/helpful
> > > > > > > > > > Dave
> > > > > > > > > >
> > > > > > > > > > [1] though typically our workflow involved sending patches 
> > > > > > > > > > to the gcc-
> > > > > > > > > > patches mailing list
> > > > > > > > > > [2] as libgccjit maintainer I have an interest in global 
> > > > > > > > > > state within
> > > > > > > > > > the compiler
> > > > > > > > > > [3] I posted some ideas about this back in 2013 IIRC; 
> > > > > > > > > > probably
> > > > > > > > > > massively bit-rotted since then.  I also gave a talk at 
> > > > > > > > > > Cauldron 2013
> > > > > > > > > > about global state in the compiler (with a view to 
> > > > > > > > > > gcc-as-a-shared-
> > > > > > > > > > library); likewise I expect much of the ideas there to be 
> > > > > > > > > > out-of-date);
> > > > > > > > > > for libgccjit I went with a different approach
> > > > > > 
> > > > > > Thank you,
> > > > > > Giuliano.
> > > > > > 
> > > > > 
> > > > > -- 
> > > > > Richard Biener <rguent...@suse.de>
> > > > > SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany;
> > > > > GF: Felix Imendörffer, Mary Higgins, Sri Rasiah; HRB 21284 (AG 
> > > > > Nürnberg)
> > > > 
> > > > 
> > > 
> > > -- 
> > > Richard Biener <rguent...@suse.de>
> > > SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany;
> > > GF: Felix Imendörffer, Mary Higgins, Sri Rasiah; HRB 21284 (AG Nürnberg)
> > Giuliano.
> > 
> > 
> 
> -- 
> Richard Biener <rguent...@suse.de>
> SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany;
> GF: Felix Imendörffer, Mary Higgins, Sri Rasiah; HRB 21284 (AG Nürnberg)

Thank you,
Giuliano.

Reply via email to