https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81108
--- Comment #9 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
With schedule(static), schedule(dynamic), etc. I believe the compiler is not
allowed to do it, at least when it can't prove the change is unobservable.
So, if you have

  int cnt = 0;
  #pragma omp parallel for schedule(static) ordered(2) collapse(2) \
              firstprivate(cnt)
  for (j = ...)
    for (i = ...)
      {
        #pragma omp ordered depend(sink: j, i - 1) depend(sink: j - 1, i) \
                            depend(sink: j - 1, i - 1)
        arr[i][j] = omp_get_thread_num ();
        arr2[i][j] = cnt++;
        grid[...] = ...;
        #pragma omp ordered depend(source)
      }

then you really can't schedule arbitrarily: the standard specifies the
requirements, and the code above observes both which thread each logical
iteration is assigned to and in what order the iterations execute.

The only cases where you have complete freedom are no schedule clause at all
(then what happens is implementation defined) or schedule(auto) (likewise).
But even then it takes advanced loop optimizations to find an optimal
schedule for the particular loop (and other loops would need different
scheduling decisions), and the schedule also has to respect the constraints
of the doacross post/wait operations (it must ensure no thread ends up stuck
waiting for an iteration that hasn't been scheduled yet). Plus, unless we
want complex scheduling functions in the library, the compiler has to map
the logical iterations of the original loop nest to the logical iterations
of the parallelized loop, and similarly map the indices. It is doable, but a
huge amount of work.