https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53957

--- Comment #24 from Anthony <prop_design at protonmail dot com> ---
(In reply to rguent...@suse.de from comment #23)
> On Sun, 28 Jun 2020, prop_design at protonmail dot com wrote:
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53957
> > 
> > --- Comment #22 from Anthony <prop_design at protonmail dot com> ---
> > (In reply to Thomas Koenig from comment #21)
> > > Another question: Is there anything left to be done with the
> > > vectorizer, or could we remove that dependency?
> > 
> > thanks for looking into this again for me. i'm surprised it worked the same
> > on Linux, but knowing that, at least helps debug this issue some more. I'm
> > not sure about the vectorizer question, maybe that question was intended for
> > someone else. the runtimes seem good as is though. i doubt the
> > auto-parallelization will add much speed. but it's an interesting feature
> > that i've always hoped would work. i've never got it to work though. the
> > only code that did actually implement something was Intel Fortran. it
> > implemented one trivial loop, but it slowed the code down instead of
> > speeding it up. the output from gfortran shows more loops it wants to run in
> > parallel. they aren't important ones. but something would be better than
> > nothing. if it slowed the code down, i would just not use it.
> 
> GCC adds runtime checks for a minimal number of iterations before
> dispatching to the parallelized code - I guess we simply never hit
> the threshold.  This is configurable via --param parloops-min-per-thread,
> the default is 100, the default number of threads is determined the same
> as for OpenMP so you can probably tune that via OMP_NUM_THREADS.

thanks for that tip. i tried changing the parloops parameters but had no luck.
the only difference was that the maximum thread count went from 2 to 3. core
use stayed the same.
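
for reference, the runtime check described in comment #23 can be modelled roughly as below. this is only a sketch: would_parallelize is a hypothetical helper, not anything in GCC, and it assumes the parallel path is taken only when each thread would get at least parloops-min-per-thread iterations. the exact comparison GCC emits may differ.

```shell
#!/bin/sh
# Hypothetical helper (not part of GCC): a rough model of the runtime
# threshold, under the assumption stated above.
# Usage: would_parallelize ITERATIONS THREADS MIN_PER_THREAD
would_parallelize() {
    iterations=$1
    threads=$2
    min_per_thread=$3
    # Assumed rule: go parallel only if the loop has at least
    # threads * min_per_thread iterations.
    if [ "$iterations" -ge $((threads * min_per_thread)) ]; then
        echo parallel
    else
        echo serial
    fi
}

# A 150-iteration loop on 2 threads with the default of 100 per thread,
# then the same loop with the threshold lowered to 2.
would_parallelize 150 2 100
would_parallelize 150 2 2
```

under this model a short loop stays serial with the default threshold and only becomes a candidate once the parameter is lowered, which is why lowering parloops-min-per-thread seemed worth trying.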

i added the following and some variations of these:

--param parloops-min-per-thread=2 (the default was 100, like you said)
--param parloops-chunk-size=1 (the default was zero, so i removed this
  parameter later)
--param parloops-schedule=auto (tried all options except guided; the default
  is static)

i was able to check that they were set via:

--help=param -Q
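
put together, a test build along these lines might look like the sketch below. the file names source.f90 and prop_test, the thread count of 4, and the -O3 and -fopt-info-loop options are assumptions for illustration, not the exact command line used here. -ftree-parallelize-loops=N is the option that turns auto-parallelization on, and the GCC manual spells the query as --help=params.

```shell
#!/bin/sh
# Placeholder names and values: SRC, the output name prop_test, and the
# thread count 4 are illustrative assumptions.
PARLOOPS_FLAGS="-O3 -ftree-parallelize-loops=4 --param parloops-min-per-thread=2 --param parloops-schedule=static"
SRC=source.f90

if command -v gfortran >/dev/null 2>&1 && [ -f "$SRC" ]; then
    # Build, asking the loop optimizers to report what they did.
    gfortran $PARLOOPS_FLAGS -fopt-info-loop "$SRC" -o prop_test
    # Confirm the values the compiler will actually use.
    gfortran $PARLOOPS_FLAGS -Q --help=params | grep parloops
else
    # No compiler or no source file here: just show the intended command.
    echo "would run: gfortran $PARLOOPS_FLAGS -fopt-info-loop $SRC -o prop_test"
fi
```

at run time the thread count can then be varied with OMP_NUM_THREADS, as suggested in comment #23.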

some other things i tried were adding -mthreads and removing -static, but so
far no luck. i also tried using -mthreads instead of -pthread.

i should make clear i'm testing PROP_DESIGN_MAPS, not MP_PROP_DESIGN.
MP_PROP_DESIGN is ancient and the added benchmarking loops were messing with
the ability of the optimizer to auto-parallelize (in the past at least).
