https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53957
--- Comment #24 from Anthony <prop_design at protonmail dot com> ---
(In reply to rguent...@suse.de from comment #23)
> On Sun, 28 Jun 2020, prop_design at protonmail dot com wrote:
>
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53957
> >
> > --- Comment #22 from Anthony <prop_design at protonmail dot com> ---
> > (In reply to Thomas Koenig from comment #21)
> > > Another question: Is there anything left to be done with the
> > > vectorizer, or could we remove that dependency?
> >
> > thanks for looking into this again for me. i'm surprised it worked the
> > same on Linux, but knowing that, at least helps debug this issue some
> > more. I'm not sure about the vectorizer question, maybe that question
> > was intended for someone else. the runtimes seem good as is though. i
> > doubt the auto-parallelization will add much speed. but it's an
> > interesting feature that i've always hoped would work. i've never got
> > it to work though. the only code that did actually implement something
> > was Intel Fortran. it implemented one trivial loop, but it slowed the
> > code down instead of speeding it up. the output from gfortran shows
> > more loops it wants to run in parallel. they aren't important ones. but
> > something would be better than nothing. if it slowed the code down, i
> > would just not use it.
>
> GCC adds runtime checks for a minimal number of iterations before
> dispatching to the parallelized code - I guess we simply never hit
> the threshold. This is configurable via --param parloops-min-per-thread,
> the default is 100, the default number of threads is determined the same
> as for OpenMP so you can probably tune that via OMP_NUM_THREADS.

thanks for that tip. i tried changing the parloops parameters but no luck. the
only difference was the max thread use went from 2 to 3. core use was the same.
i added the following and some variations of these:

--param parloops-min-per-thread=2 (the default was 100 like you said)
--param parloops-chunk-size=1 (the default was zero so i removed this parameter later)
--param parloops-schedule=auto (tried all options except guided, the default is static)

i was able to check that they were set via:

--help=param -Q

some other things i tried were adding -mthreads and removing -static. but so
far no luck. i also tried using -mthreads instead of -pthread.

i should make clear i'm testing PROP_DESIGN_MAPS, not MP_PROP_DESIGN.
MP_PROP_DESIGN is ancient and the added benchmarking loops were messing with
the ability of the optimizer to auto-parallelize (in the past at least).
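
for reference, the runtime check described in comment #23 can be modeled roughly as below. this is only a sketch under stated assumptions, not GCC's actual code: the function name `takes_parallel_path` is hypothetical, and it assumes the parallel variant is chosen only when the loop has at least `min_per_thread * num_threads` iterations. the exact comparison inside GCC may differ slightly.

```python
def takes_parallel_path(iterations: int, num_threads: int,
                        min_per_thread: int = 100) -> bool:
    """Simplified model (an assumption, not GCC source) of the runtime
    iteration-count check guarding an auto-parallelized loop.

    min_per_thread mirrors --param parloops-min-per-thread (default 100).
    num_threads stands in for the thread count GCC picks at run time.
    """
    if num_threads < 2:
        # With a single thread there is nothing to dispatch in parallel.
        return False
    # Assumed threshold: every thread must get at least min_per_thread
    # iterations, otherwise the serial version of the loop runs instead.
    return iterations >= min_per_thread * num_threads


if __name__ == "__main__":
    # Default threshold with 4 threads: 400 iterations are needed.
    for n in (50, 399, 400, 10_000):
        print(n, takes_parallel_path(n, num_threads=4))
    # Lowering the parameter to 2 drops the threshold to 8 iterations.
    print(8, takes_parallel_path(8, num_threads=4, min_per_thread=2))
```

under this model, dropping parloops-min-per-thread from 100 to 2 lowers the threshold a lot, so if the loops still run serially the iteration count may not be the limiting factor.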