https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81108

--- Comment #7 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
GCC implements what is required for schedule(static), which is the
implementation-defined schedule right now.  That schedule dictates how the
iterations are distributed to the different threads, and I don't see how you
could get good performance with that distribution (if you have ideas, feel
free to explain them here).  This testcase doesn't look very suitable for
doacross, because the computation is inexpensive and so the needed
synchronization dominates the execution time.  In order to perform well on it,
we'd have to use a different schedule, specific to this exact loop (proceed
diagonally from 2, 2 to n, m or something like that).
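
To make the two orderings concrete, here is a minimal C sketch.  The testcase
itself is not quoted in this comment, so the stencil (cell), the function
names and the 0-based flat array layout are all assumptions made for
illustration; the diagonal variant is only one possible reading of "proceed
diagonally" and not something GCC does.

```c
#include <assert.h>

/* Hypothetical stencil: each cell depends on its upper and left neighbours. */
static long cell(long up, long left) { return up + left; }

/* Doacross form: schedule(static) hands each thread a contiguous block of
   i iterations, and every cell waits on (i-1, j) and (i, j-1). */
void fill_doacross(int n, int m, long *a)
{
    assert(n >= 1 && m >= 1);
    #pragma omp parallel
    #pragma omp for schedule(static) ordered(2)
    for (int i = 1; i < n; i++)
        for (int j = 1; j < m; j++) {
            #pragma omp ordered depend(sink: i - 1, j) depend(sink: i, j - 1)
            a[i * m + j] = cell(a[(i - 1) * m + j], a[i * m + j - 1]);
            #pragma omp ordered depend(source)
        }
}

/* Diagonal (wavefront) form: d = i + j is constant along an anti-diagonal,
   and cells on one diagonal only read cells from diagonal d - 1, so they
   are independent of each other. */
void fill_diagonal(int n, int m, long *a)
{
    assert(n >= 1 && m >= 1);
    for (int d = 2; d <= (n - 1) + (m - 1); d++) {
        int lo = (d - (m - 1) > 1) ? d - (m - 1) : 1;   /* keeps j <= m - 1 */
        int hi = (d - 1 < n - 1) ? d - 1 : n - 1;       /* keeps j >= 1 */
        #pragma omp parallel for schedule(static)
        for (int i = lo; i <= hi; i++) {
            int j = d - i;
            a[i * m + j] = cell(a[(i - 1) * m + j], a[i * m + j - 1]);
        }
    }
}
```

Both functions expect row 0 and column 0 of a to be initialized by the caller
and fill the rest in place.  Built without OpenMP support the pragmas are
ignored and both run serially with the same result.  Note that the diagonal
form still has a barrier at the end of every diagonal, so for a computation
this cheap it is not obviously faster either.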
