On Mon, 22 Jul 2019, Richard Biener wrote:

> On Sun, 21 Jul 2019, Giuliano Belinassi wrote:
> 
> > Hi all,
> > 
> > Here is my second evaluation report, together with a simple program that
> > I was able to compile with my parallel version of GCC. Keep in mind that
> > I still have lots of concurrency issues inside the compiler and therefore
> > my branch will fail to compile pretty much anything else.
> > 
> > To reproduce my current branch, use the following steps:
> > 
> > 1-) Clone https://gitlab.com/flusp/gcc/tree/giulianob_parallel
> > 
> > 2-) Set the variable `num_threads` in gcc/graphunit.c to 1.
> > 
> > 3-) Configure with --disable-bootstrap --enable-languages=c
> > 
> > 4-) make
> > 
> > 5-) Set the variable `num_threads` in gcc/graphunit.c to 2, for instance.
> > 
> > 6-) make install DESTDIR="somewhere_else_that_doesnt_break_your_gcc"
> > 
> > 7-) Compile the program using -O2
> > 
> > I am attaching my report in markdown format, which you can convert to pdf
> > using `pandoc` if you find it difficult to read in the current format.
> > 
> > I am also open to suggestions. Please do not hesitate to comment :)
> 
> Thanks for the report and it's great that you are making progress!
> 
> I suggest you add a --param (edit params.def) so one can choose
> num_threads on the command-line instead of needing to recompile GCC.
> Just keep the default "safe" so that the GCC build itself will still work.
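
A minimal sketch of such a --param, assuming the GCC 9 params.def format where DEFPARAM takes (enum, option name, help text, default, min, max) and the value is read back with PARAM_VALUE from params.h. The parameter name num-threads, the enum PARAM_NUM_THREADS and the upper bound of 64 are hypothetical choices for illustration, not anything that exists in GCC.

```cpp
/* In gcc/params.def (hypothetical entry; name and bounds are illustrative).
   Default 1 keeps the GCC build itself single-threaded and safe;
   the upper bound of 64 is arbitrary.  */
DEFPARAM (PARAM_NUM_THREADS,
          "num-threads",
          "Number of threads used to compile functions in parallel.",
          1, 1, 64)

/* In gcc/graphunit.c, replacing the hard-coded value
   (assumes params.h is included, as it is in most GCC sources).  */
num_threads = PARAM_VALUE (PARAM_NUM_THREADS);
```

With that in place the thread count would be chosen per invocation, e.g. `gcc -O2 --param num-threads=2 test.c`, and steps 2-) and 5-) above would no longer need a rebuild.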
> 
> For most of the allocators I think that in the end we want to
> keep most of them global but have either per-thread freelists
> or a freelist implementation that can work (allocate and free)
> without locking, employing some RCU scheme.  Not introducing
> per-thread state is probably leaner on the implementation.
> It would of course mean taking a lock when the freelist needs to
> be re-filled from the main pool but that's hopefully not common.
> I don't know of an RCU allocator freelist implementation to copy or learn
> from, but experimenting with one before going the per-thread freelist
> route might be interesting.  Maybe not all allocators need to be treated
> equally either.
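
To make the per-thread freelist option concrete, here is a minimal sketch of a fixed-size block allocator: a global freelist guarded by a mutex, plus a thread_local freelist so that the common allocate/free paths take no lock, and the lock is only taken when a thread's list must be refilled from (or flushed back to) the main pool. This is the per-thread variant, not the RCU one; the namespace sketch, the block size, the batch size and all function names are hypothetical and unrelated to GCC's actual object_allocator or memory_block_pool code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <mutex>
#include <new>

namespace sketch {

// Hypothetical sizes, chosen only for illustration.
constexpr std::size_t block_size = 64;    // bytes per block handed out
constexpr std::size_t refill_batch = 32;  // blocks moved per refill

struct free_node { free_node *next; };
static_assert (block_size >= sizeof (free_node), "block too small for link");

std::mutex pool_lock;                          // guards pool_free only
free_node *pool_free = nullptr;                // global freelist, shared
thread_local free_node *local_free = nullptr;  // per-thread freelist, no lock

// Slow path: take the lock once and move up to refill_batch blocks from the
// global freelist (or fresh malloc memory) into this thread's freelist.
void refill ()
{
  std::lock_guard<std::mutex> guard (pool_lock);
  for (std::size_t i = 0; i < refill_batch; ++i)
    {
      free_node *n = pool_free;
      if (n)
        pool_free = n->next;
      else
        {
          void *raw = std::malloc (block_size);
          if (!raw)
            break;
          n = new (raw) free_node;
        }
      n->next = local_free;
      local_free = n;
    }
}

// Fast path: pop from the per-thread freelist without locking.
void *allocate ()
{
  if (!local_free)
    refill ();
  if (!local_free)
    return nullptr;  // out of memory
  free_node *n = local_free;
  local_free = n->next;
  return n;
}

// Fast path: push onto the per-thread freelist without locking.
void release (void *p)
{
  assert (p != nullptr);
  free_node *n = new (p) free_node;
  n->next = local_free;
  local_free = n;
}

// Call before a worker thread exits so its cached blocks go back to the
// global pool instead of being lost with the thread.
void release_thread_cache ()
{
  if (!local_free)
    return;
  free_node *tail = local_free;
  while (tail->next)
    tail = tail->next;
  std::lock_guard<std::mutex> guard (pool_lock);
  tail->next = pool_free;
  pool_free = local_free;
  local_free = nullptr;
}

} // namespace sketch
```

Each worker would call sketch::allocate and sketch::release freely and sketch::release_thread_cache once at the end; batching the refill is what keeps the locked path rare, at the cost of each thread holding some idle blocks.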
> 
> Your memory-block issue is likely that you added
> 
> {
>   if (!instance)
>     instance = XNEW (memory_block_pool);
> 
> but, misleading as it is, XNEW doesn't invoke C++ new but
> just malloc, so the allocated structure isn't initialized
> since its constructor isn't invoked.  Just use
> 
>     instance = new memory_block_pool;
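
A small self-contained illustration of the difference: libiberty's XNEW (T) expands to a cast of xmalloc (sizeof (T)), so it hands back raw memory, whereas new also runs the constructor. The class block_pool and the function get_instance are hypothetical stand-ins for memory_block_pool and its lazy initialization, not the real GCC code.

```cpp
#include <cstddef>

// Hypothetical stand-in for memory_block_pool: the constructor is what puts
// the freelist into a valid (empty) state.
class block_pool
{
public:
  block_pool () : m_blocks (nullptr), m_count (0) {}
  bool empty () const { return m_blocks == nullptr; }
  std::size_t count () const { return m_count; }

private:
  void *m_blocks;       // head of the freelist
  std::size_t m_count;  // number of cached blocks
};

static block_pool *instance;

block_pool *get_instance ()
{
  if (!instance)
    // XNEW (block_pool) would only malloc sizeof (block_pool) bytes and
    // skip the constructor, leaving m_blocks and m_count indeterminate.
    instance = new block_pool;  // allocates and runs the constructor
  return instance;
}
```

Note that the lazy `if (!instance)` check is itself racy if two threads can reach it at once; creating the instance before spawning threads, or using a function-local static (whose initialization C++11 makes thread-safe), avoids that.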
> 
> with that I get helgrind to run (without complaining!) on your
> testcase.  I also get to compile gimple-match.c with two threads
> for more than one minute before crashing on some EVRP global
> state (somehow I knew the passes' global state would be quite a
> distraction...).
> 
> I hope the project will be motivation to cleanup the way we
> handle pass-specific global state.

Btw, to get to a "working" state quicker you might consider
concentrating on a pass subset, which you can conveniently do by
restricting optimization to just -Og, effectively parallelizing only
pass_all_optimizations_g.  You then probably hit more issues in
infrastructure, which is more interesting for the project (we know
there's a lot of pass-specific global state...).  Of course the time
spent in pass_all_optimizations_g is minimal...

I then hit tree-ssa-live.c:usedvars quickly (slap __thread on it)
and after that the EVRP issue via the sprintf_length pass.
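
Roughly what "slap __thread on it" means, as a sketch: a file-scope variable that a pass sets up, uses and clears per function gets one copy per thread, so two threads compiling different functions stop clobbering each other. The vector element type, mark_used and usedvars_slot are hypothetical simplifications; only the storage class reflects the suggestion. __thread is the GCC/Clang extension; C++11 spells the same idea thread_local.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Stand-in for a pass-global such as tree-ssa-live.c's usedvars; the
// element type is a hypothetical simplification.
static __thread std::vector<int> *usedvars;

// Hypothetical pass body: the pointer is set, used and cleared within one
// call, the pattern that breaks when two threads share a single global.
std::size_t mark_used (int first_id, int n)
{
  std::vector<int> vars;
  usedvars = &vars;
  for (int i = 0; i < n; ++i)
    usedvars->push_back (first_id + i);
  std::size_t count = usedvars->size ();
  usedvars = nullptr;
  return count;
}

// Address of the calling thread's own copy of usedvars.
std::vector<int> **usedvars_slot () { return &usedvars; }
```

This only removes the sharing; state that must survive across functions or be visible to other threads still needs a different design.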

Richard.
