On Mon, 22 Jul 2019, Richard Biener wrote:

> On Sun, 21 Jul 2019, Giuliano Belinassi wrote:
>
> > Hi all,
> >
> > Here is my second evaluation report, together with a simple program that
> > I was able to compile with my parallel version of GCC. Keep in mind that
> > I still have lots of concurrency issues inside the compiler and therefore
> > my branch will fail to compile pretty much anything else.
> >
> > To reproduce my current branch, use the following steps:
> >
> > 1-) Clone https://gitlab.com/flusp/gcc/tree/giulianob_parallel
> >
> > 2-) Edit gcc/cgraphunit.c's variable `num_threads` to 1.
> >
> > 3-) Configure with --disable-bootstrap --enable-languages=c
> >
> > 4-) make
> >
> > 5-) Edit gcc/cgraphunit.c's variable `num_threads` to 2, for instance.
> >
> > 6-) make install DESTDIR="somewhere_else_that_doesnt_break_your_gcc"
> >
> > 7-) compile the program using -O2
> >
> > I am attaching my report in markdown format, which you can convert to pdf
> > using `pandoc` if you find it difficult to read in the current format.
> >
> > I am also open to suggestions. Please do not hesitate to comment :)
>
> Thanks for the report and it's great that you are making progress!
>
> I suggest you add a --param (edit params.def) so one can choose
> num_threads on the command-line instead of needing to recompile GCC.
> Just keep the default "safe" so that the GCC build itself will still work.
>
> For most of the allocators I think that in the end we want to
> keep most of them global but have either per-thread freelists
> or a freelist implementation that can work (allocate and free)
> without locking, employing some RCU scheme. Not introducing
> per-thread state is probably leaner on the implementation.
> It would of course mean taking a lock when the freelist needs to
> be re-filled from the main pool but that's hopefully not common.
> I don't know an RCU allocator freelist implementation to copy/learn
> from, but experimenting with such before going the per-thread freelist
> route might be interesting. Maybe not all allocators need to be treated
> equally either.
>
> Your memory-block issue is likely that you added
>
>   {
>     if (!instance)
>       instance = XNEW (memory_block_pool);
>
> but as misleading as it is, XNEW doesn't invoke C++ new but
> just malloc, so the allocated structure isn't initialized
> since its constructor isn't invoked. Just use
>
>       instance = new memory_block_pool;
>
> With that I get helgrind to run (without complaining!) on your
> testcase. I also get to compile gimple-match.c with two threads
> for more than one minute before crashing on some EVRP global
> state (somehow I knew the passes' global state would be quite a
> distraction...).
>
> I hope the project will be motivation to clean up the way we
> handle pass-specific global state.

Btw, to get to a "working" state quicker you might consider concentrating
on a subset of the passes, for example by restricting optimization to
just -Og and effectively parallelizing pass_all_optimizations_g only.
You will then probably hit more issues in the infrastructure, which is
more interesting for the project (we know there's a lot of pass-specific
global state...). Of course the time spent in pass_all_optimizations_g
is minimal...

I then hit tree-ssa-live.c:usedvars quickly (slap __thread on it) and
after that the EVRP issue via the sprintf_length pass.

Richard.