Hi, Richard On 06/25, Richard Biener wrote: > On Mon, 24 Jun 2019, Giuliano Belinassi wrote: > > > Hi, > > > > Parallelize GCC with Threads -- First Evaluation > > > > Hi everyone, > > > > I am attaching the first evaluation report here publicly for gathering > > feedback. The file is in markdown format and it can be easily be converted > > to > > PDF for better visualization. > > > > I am also open to suggestions and ideas in order to improve the current > > project :-) > > > > My branch can be seen here: > > https://gitlab.com/flusp/gcc/tree/giulianob_parallel > > Thanks for your work on this! > > I didn't think of the default_obstack and input_location issues > so it's good we get to know these. The bitmap default obstack > is merely a convenience and fortunately (content!) lifetime is constrained > to be per-pass though that's not fully enforced. Your solution is fine > I think and besides when parallelizing the IPA phase should be not > worse than making the default-obstack per thread context given how > many times we zap it.
This is something that is bothering me. In this current state, when I declare the default_obstack as TLS, several LTO tests fails in ssa_verify and I am not sure why yet. I will investigate futher in this. As for IPA phase, you mean the expand_ipa or the IPA itself? If expand_ipa falls in Inter Process otimization scheme the current strategy will work, of course. As for IPA itself, I am still not sure if it is parallelizable yet, however, I've seen some academic works parallelizing Dataflow Analysis, which can give interesting ideas about how to handle the call graphs (?). But this may be something interesting as the LTO WHOPR is entirely sequential AFIK. > > input_location shouldn't really be used... but oh well. > > As of per-pass global state I expect that eventually all > global variables in the individual passes are a problem > and in nearly all cases they are global out of laziness > to pass down state across functions. > > You identified some global state in infrastructure code > which are the more interesting cases, most relevant for > GIMPLE are the ones in tree-ssa-operands.c, tree-cfg.c > and et-forest.c I guess. > > For individual passes a first step would be to wrap > all globals into a struct we can allocate at ::execute > time and pass down as pointer. That's slightly less > intrusive than wrapping all of the pass in a class > but functionally equivalent. Sounds really good, but this will also require a lot of code change. Hopefully most of it can be solved without any unpleasant surprises. > > <<< > 1. The GCC `object_pool_allocator` > > There is also the GCC object_pool_allocator, which is used to allocate > some > objects. Since these objects may be used later in the compilation by > other > threads, we can't simply make them private to each thread. Therefore I > added a > threadsafe_object_pool_allocator object that currently uses locks > guarantee > safety, however I am not able to check its correctness. This is also > not efficient and might require a better approach later. > >>> > > I guess the same applies to the GC allocator - to make these > more efficient we'd have a per-thread freelist we can allocate > from without locking and which we'd, once empty, fill from the > main pool in larger chunks with locking. At thread finalization > we have to return the freelist to the main allocator then > and for the GC allocator possibly at garbage collection time. This also sounds good. If I understeand correctly, the sequential allocation state will be amortized, as we could ask for an object which is 2x bigger than each thread currently has. (I am assuming allocation is O(1)) > > This scheme may also work for the bitmap default obstack > and its freelist. I would also suggest when you run into > a specific issue with the default obstack to use a separate > obstack in the respective area. You mention issues with LTO, > are the reproducible on the branch? I suppose you are > currently testing GCC with num_threads = 1 to make sure you > are not introducing non-threading related issues? Yes. The issue with LTO is that obstack, as I mentioned earlier. It is reproducible in my branch, just compile it without any changes, then declare the obstack TLS, recompile the changes and run gcc.dg tests. > > Thanks again, > Richard. Giuliano.