Hi,

On 03/29, Richard Biener wrote:
> On Thu, 28 Mar 2019, Giuliano Belinassi wrote:
> 
> > Hi, Richard
> > 
> > On 03/28, Richard Biener wrote:
> > > On Wed, Mar 27, 2019 at 2:55 PM Giuliano Belinassi
> > > <giuliano.belina...@usp.br> wrote:
> > > >
> > > > Hi,
> > > >
> > > > On 03/26, Richard Biener wrote:
> > > > > On Tue, 26 Mar 2019, David Malcolm wrote:
> > > > >
> > > > > > On Mon, 2019-03-25 at 19:51 -0400, nick wrote:
> > > > > > > Greetings All,
> > > > > > >
> > > > > > > I would like to take up parallelize compilation using threads or 
> > > > > > > make
> > > > > > > c++/c
> > > > > > > memory issues not automatically promote. I did ask about this 
> > > > > > > before
> > > > > > > but
> > > > > > > not get a reply. When someone replies I'm just a little concerned 
> > > > > > > as
> > > > > > > my writing for proposals has never been great so if someone just
> > > > > > > reviews
> > > > > > > and doubt checks that's fine.
> > > > > > >
> > > > > > > As for the other things building gcc and running the testsuite is
> > > > > > > fine. Plus
> > > > > > > I already working on gcc so I've pretty aware of most things and 
> > > > > > > this
> > > > > > > would
> > > > > > > be a great steeping stone into more serious gcc development work.
> > > > > > >
> > > > > > > If sample code is required that's in mainline gcc I sent out a 
> > > > > > > trial
> > > > > > > patch
> > > > > > > for this issue: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88395
> > > > > > >
> > > > > > > Cheers,
> > > > > > >
> > > > > > > Nick
> > > > > >
> > > > > > It's good to see that you've gotten as far as attaching a patch to 
> > > > > > BZ
> > > > > > [1]
> > > > > >
> > > > > > I think someone was going to attempt the "parallelize compilation 
> > > > > > using
> > > > > > threads" idea last year, but then pulled out before the summer; you 
> > > > > > may
> > > > > > want to check the archives (or was that you?)
> > > > >
> > > > > There's also Giuliano Belinassi who is interested in the same project
> > > > > (CCed).
> > > >
> > > > Yes, I will apply for this project, and I will submit the final version
> > > > of my proposal by the end of the week.
> > > >
> > > > Currently, my target is the `expand_all_functions` routine, as most of
> > > > the time is spent on it according to the experiments that I performed as
> > > > part of my Master's research on compiler parallelization.
> > > > (-O2, --disable-checking)
> > > 
> > > Yes, more specifically I think the realistic target is the GIMPLE part
> > > of   execute_pass_list (cfun, g->get_passes ()->all_passes);  done in
> > > cgraph_node::expand.  If you look at passes.def you'll see all_passes
> > > also contains RTL expansion (pass_expand) and the RTL optimization
> > > queue (pass_rest_of_compilation).  The RTL part isn't a realistic target.
> > > Without changing the pass hierarchy the obvious part that can be
> > > handled would be the pass_all_optimizations pass sub-queue of
> > > all_passes since those are all passes that perform transforms on the
> > > GIMPLE IL where we have all functions in this state at the same time
> > > and where no interactions between the functions happen anymore
> > > and thus functions can be processed in parallel (as much as make
> > > processes individual translation units in parallel).
> > > 
> > 
> > Great. So if I understood correctly, I will need to split
> > cgraph_node::expand() into three parts: IPA, GIMPLE and RTL, and then
> > refactor `expand_all_functions` so that the loop
> > 
> >      for (i = new_order_pos - 1; i >= 0; i--)
> > 
> >  use these three functions, then partition
> > 
> >      g->get_passes()->all_passes
> > 
> > into get_passes()->gimple_passes and get_passes()->rtl_passes, so I
> > can run RTL after GIMPLE is finished, to finally start the
> > paralellization of per function GIMPLE passes.
> 
> Yes, it involves refactoring of the loop - you may notice that
> parts of the compilation pipeline are under control of the
> pass manager (passes.c) but some is still manually driven
> by symbol_table::compile.  Whether it's more convenient to
> get more control stuffed to the pass manager and perform the
> threading under its control (I'd say that would be the cleaner
> design) or to try do this in the current ad-hoc parts remains
> to be seen.  You can see symbol_table::compile hands over
> control to the pass manager multiple times, first ipa_passes ()
> then all_late_ipa_passes and finally the expand_all_functions code.
> 
> I guess it would simplify things if you'd split pass_all_passes
> in passes.def at pass_expand like so:
> 
> diff --git a/gcc/passes.def b/gcc/passes.def
> index 2fcd80e53a3..bb0453b36a7 100644
> --- a/gcc/passes.def
> +++ b/gcc/passes.def
> @@ -403,11 +403,10 @@ along with GCC; see the file COPYING3.  If not see
>    NEXT_PASS (pass_spectrev1);
>    NEXT_PASS (pass_warn_function_noreturn);
>    NEXT_PASS (pass_gen_hsail);
> +  TERMINATE_PASS_LIST (all_passes)
>  
> -  NEXT_PASS (pass_expand);
> -
> -  NEXT_PASS (pass_rest_of_compilation);
> -  PUSH_INSERT_PASSES_WITHIN (pass_rest_of_compilation)
> +  INSERT_PASSES_AFTER (pass_rest_of_compilation)
> +      NEXT_PASS (pass_expand);
>        NEXT_PASS (pass_instantiate_virtual_regs);
>        NEXT_PASS (pass_into_cfg_layout_mode);
>        NEXT_PASS (pass_jump);
> @@ -505,6 +504,5 @@ along with GCC; see the file COPYING3.  If not see
>           NEXT_PASS (pass_final);
>        POP_INSERT_PASSES ()
>        NEXT_PASS (pass_df_finish);
> -  POP_INSERT_PASSES ()
>    NEXT_PASS (pass_clean_state);
> -  TERMINATE_PASS_LIST (all_passes)
> +  TERMINATE_PASS_LIST (pass_rest_of_compilation)
> 
> where to make things "work" again w/o threading you'd invoke
> execute_pass_list (cfun, g->get_passes ()->pass_rest_of_compilation)
> right after the all_passes invocation in cgraph_node::expand.
> 
> You then can refactor things so the loop over the 'order' array
> is done twice, once over all_passes (the set you then parallelize)
> and once over pass_rest_of_compilation (which you can't parallelize
> because of being in RTL).
>

I managed to get it working today. However, I found an issue with the
statistics_fini_pass() and pass_init_dump_file(), which I had to
comment, and force a `return false` for every case, respectively. Then I
managed to compile some programs correctly with -O2. I have no idea why
yet, but I will keep searching. I've attached my patch here.



> The above patch needs more changes in pass manager code - a chance
> to dive into it a little since that's where you'd change code.
> 
> > > To simplify the taks further a useful constraint is to not have
> > > a single optimization pass executed multiple times at the same time
> > > (otherwise you have to look at pass specific global states as well),
> > > thus the parallel part could be coded in a way keeping per function
> > > the state of what pass to execute next and have a scheduler pick
> > > a function its next pass is "free", scheduling that to a fixed set of
> > > worker threads.  There's no dependences between functions
> > > for the scheduling but each pass has only one execution resource
> > > in the pipeline.  You can start processing an arbitrarily large number
> > > of functions but slow functions will keep others from advancing across
> > > the pass it executes on.
> > >
> > 
> > Something like a pipeline? That is certainly a start, but if one pass is
> > very slow wouldn't it bottleneck everything?
> 
> Yes, something like a pipeline.  It's true a slow pass would
> bottleneck things - as said, we can selectively make passes
> thread safe in such cases.
> 
> > > Passes could of course be individually marked as thread-safe
> > > (multiple instances execute concurrently).
> > > 
> > > Garbage collection is already in control of the pass manager which
> > > would also be the thread scheduler.  For GC the remaining issue
> > > is allocation which passes occasionally do.  Locking is the short
> > > term solution for GSoC I guess, long-term per-thread GC pools
> > > might be better (to not slow down non-threaded parts of the compiler).
> > > 
> > > Richard.
> > > 
> > > >
> > > > Thank you,
> > > > Giuliano.
> > > >
> > > >
> > > > >
> > > > > > IIRC Richard [CCed] was going to mentor, with me co-mentoring [2] - 
> > > > > > but
> > > > > > I don't know if he's still interested/able to spare the cycles.
> > > > >
> > > > > I've offered mentoring to Giuliano, so yes.
> > > > >
> > > > > > That said, the parallel compilation one strikes me as very 
> > > > > > ambitious;
> > > > > > it's not clear to me what could realistically be done as a GSoC
> > > > > > project.  I think a good proposal on that would come up with some
> > > > > > subset of the problem that's doable over a summer, whilst also being
> > > > > > useful to the project.  The RTL infrastructure has a lot of global
> > > > > > state, so maybe either focus on the gimple passes, or on fixing 
> > > > > > global
> > > > > > state on the RTL side?  (I'm not sure)
> > > > >
> > > > > That was the original intent for the experiment.  There's also
> > > > > the already somewhat parallel WPA stage in LTO compilation mode
> > > > > (but it simply forks for the sake of simplicity...).
> > > > >
> > > > > > Or maybe a project to be more
> > > > > > explicit about regions of the code that assume that the garbage-
> > > > > > collector can't run within them?[3] (since the GC is state that 
> > > > > > would
> > > > > > be shared by the threads).
> > > > >
> > > > > The GC will be one obstackle.  The original idea was to drive
> > > > > parallelization on the pass level by the pass manager for the
> > > > > GIMPLE passes, so serialization points would be in it.
> > > > >
> > > > > Richard.
> > > > >
> > > > > > Hope this is constructive/helpful
> > > > > > Dave
> > > > > >
> > > > > > [1] though typically our workflow involved sending patches to the 
> > > > > > gcc-
> > > > > > patches mailing list
> > > > > > [2] as libgccjit maintainer I have an interest in global state 
> > > > > > within
> > > > > > the compiler
> > > > > > [3] I posted some ideas about this back in 2013 IIRC; probably
> > > > > > massively bit-rotted since then.  I also gave a talk at Cauldron 
> > > > > > 2013
> > > > > > about global state in the compiler (with a view to gcc-as-a-shared-
> > > > > > library); likewise I expect much of the ideas there to be 
> > > > > > out-of-date);
> > > > > > for libgccjit I went with a different approach
> > 
> > Thank you,
> > Giuliano.
> > 
> 
> -- 
> Richard Biener <rguent...@suse.de>
> SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany;
> GF: Felix Imendörffer, Mary Higgins, Sri Rasiah; HRB 21284 (AG Nürnberg)

Index: gcc/cgraphunit.c
===================================================================
--- gcc/cgraphunit.c	(revision 270905)
+++ gcc/cgraphunit.c	(working copy)
@@ -2196,6 +2196,7 @@
   invoke_plugin_callbacks (PLUGIN_ALL_PASSES_START, NULL);
 
   execute_pass_list (cfun, g->get_passes ()->all_passes);
+  execute_pass_list (cfun, g->get_passes ()->get_rest_of_compilation ());
 
   /* Signal the end of passes.  */
   invoke_plugin_callbacks (PLUGIN_ALL_PASSES_END, NULL);
Index: gcc/pass_manager.h
===================================================================
--- gcc/pass_manager.h	(revision 270905)
+++ gcc/pass_manager.h	(working copy)
@@ -81,8 +81,9 @@
 
   opt_pass *get_rest_of_compilation () const
   {
-    return pass_rest_of_compilation_1;
+    return pass_rest_of_compilation;
   }
+
   opt_pass *get_clean_slate () const { return pass_clean_state_1; }
 
 public:
@@ -92,6 +93,7 @@
   opt_pass *all_lowering_passes;
   opt_pass *all_regular_ipa_passes;
   opt_pass *all_late_ipa_passes;
+  opt_pass *pass_rest_of_compilation;
 
   /* A map from static pass id to optimization pass.  */
   opt_pass **passes_by_id;
Index: gcc/passes.c
===================================================================
--- gcc/passes.c	(revision 270905)
+++ gcc/passes.c	(working copy)
@@ -958,6 +958,7 @@
   dump_pass_list (all_regular_ipa_passes, 1);
   dump_pass_list (all_late_ipa_passes, 1);
   dump_pass_list (all_passes, 1);
+  //dump_pass_list (pass_rest_of_compilation, 1);
 
   pop_dummy_function ();
 }
@@ -1468,6 +1469,8 @@
     success |= position_pass (pass_info, &all_late_ipa_passes);
   if (!success || all_instances)
     success |= position_pass (pass_info, &all_passes);
+  if (!success || all_instances)
+    success |= position_pass (pass_info, &pass_rest_of_compilation);
   if (!success)
     fatal_error
       (input_location,
@@ -1516,8 +1519,8 @@
 
 pass_manager::pass_manager (context *ctxt)
 : all_passes (NULL), all_small_ipa_passes (NULL), all_lowering_passes (NULL),
-  all_regular_ipa_passes (NULL),
-  all_late_ipa_passes (NULL), passes_by_id (NULL), passes_by_id_size (0),
+  all_regular_ipa_passes (NULL),  all_late_ipa_passes (NULL), 
+  pass_rest_of_compilation(NULL), passes_by_id (NULL), passes_by_id_size (0),
   m_ctxt (ctxt), m_name_to_pass_map (NULL)
 {
   opt_pass **p;
@@ -2011,7 +2014,7 @@
       && need_ssa_update_p (cfun))
     gcc_assert (flags & TODO_update_ssa_any);
 
-  statistics_fini_pass ();
+  //statistics_fini_pass ();
 
   if (flags)
     do_per_function (execute_function_todo, (void *)(size_t) flags);
@@ -2091,9 +2094,15 @@
 bool
 pass_init_dump_file (opt_pass *pass)
 {
+  return false;
   /* If a dump file name is present, open it if enabled.  */
   if (pass->static_pass_number != -1)
     {
+      if (pass->static_pass_number < 0)
+      {
+	fprintf(stderr, "pass_name: %s, pass->static_pass_number: %d\n", pass->name, 
+	       pass->static_pass_number);
+      }
       timevar_push (TV_DUMP);
       gcc::dump_manager *dumps = g->get_dumps ();
       bool initializing_dump =
Index: gcc/passes.def
===================================================================
--- gcc/passes.def	(revision 270905)
+++ gcc/passes.def	(working copy)
@@ -395,11 +395,13 @@
   NEXT_PASS (pass_cleanup_cfg_post_optimizing);
   NEXT_PASS (pass_warn_function_noreturn);
   NEXT_PASS (pass_gen_hsail);
+  TERMINATE_PASS_LIST(all_passes)
 
-  NEXT_PASS (pass_expand);
 
-  NEXT_PASS (pass_rest_of_compilation);
-  PUSH_INSERT_PASSES_WITHIN (pass_rest_of_compilation)
+
+
+  INSERT_PASSES_AFTER (pass_rest_of_compilation)
+      NEXT_PASS (pass_expand);
       NEXT_PASS (pass_instantiate_virtual_regs);
       NEXT_PASS (pass_into_cfg_layout_mode);
       NEXT_PASS (pass_jump);
@@ -497,6 +499,5 @@
 	  NEXT_PASS (pass_final);
       POP_INSERT_PASSES ()
       NEXT_PASS (pass_df_finish);
-  POP_INSERT_PASSES ()
   NEXT_PASS (pass_clean_state);
-  TERMINATE_PASS_LIST (all_passes)
+  TERMINATE_PASS_LIST (pass_rest_of_compilation)

Reply via email to