https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108494

            Bug ID: 108494
           Summary: Slow thread creation with nested loops in GFortran
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libgomp
          Assignee: unassigned at gcc dot gnu.org
          Reporter: dewhu...@mpi-halle.mpg.de
                CC: jakub at gcc dot gnu.org
  Target Milestone: ---

Thread creation is very slow for nested parallel loops in code compiled with
GFortran. I suspect the cause lies in the libgomp library rather than in the
compiler itself.

Here is a simple example of the problem:

program test
implicit none
integer l
!$OMP PARALLEL DO &
!$OMP NUM_THREADS(1)
do l=1,1000
  call foo
end do
!$OMP END PARALLEL DO
end program

subroutine foo
implicit none
integer, parameter :: l=200,m=100,n=10
! number of threads
integer, parameter :: nthd=10
integer i,j
! automatic arrays
real(8) a(n,l),b(n,m),x(m)
a(:,:)=2.d0
b(:,:)=3.d0
do i=1,l
!$OMP PARALLEL DO DEFAULT(SHARED) &
!$OMP NUM_THREADS(nthd)
  do j=1,m
    x(j)=dot_product(a(:,i),b(:,j))
  end do
!$OMP END PARALLEL DO
end do
end subroutine

The wall-clock time is about 0.5 seconds when compiled with Intel or PGI
Fortran. However, when compiled with GFortran using

gfortran -O3 -fopenmp test.f90

and OMP_NESTED set to true, the wall-clock time is about 70 seconds, or about
140 times slower. (The ‘dot_product’ can be removed from the loop – all the
time is taken with thread creation).
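
For anyone trying to reproduce the timing, the steps above can be sketched as
the following shell snippet. It assumes the reproducer is saved as test.f90 in
the current directory and that gfortran is on the PATH; the executable name
test_nested is only illustrative.

```shell
# Sketch of the reproduction steps; file and executable names are assumptions.
src=test.f90
exe=./test_nested

# Enable nested parallelism, as in the slow case described in this report.
export OMP_NESTED=true

if command -v gfortran >/dev/null 2>&1 && [ -f "$src" ]; then
  # Same flags as in the report, plus an explicit output name.
  gfortran -O3 -fopenmp "$src" -o "$exe"
  # Report the wall-clock time of the run.
  time "$exe"
else
  echo "gfortran or $src not found; skipping" >&2
fi
```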

This only affects nested loops; if the OMP directives are removed from the loop
in the main program above, then GFortran is as fast as the other compilers.
I’ve tried several different versions of GFortran (from 7.5.0 to 12.1.0) on
different Linux machines and it’s slow on all of them.

It may be a problem with libgomp. If I substitute the libgomp library for that
provided with the NVIDIA compiler (on our machine this is in the directory
nvhpcsdk/22.11/Linux_x86_64/22.11/compilers/lib/libgomp.so.1) then it’s as fast
as the others.

This has been reproduced by others, including on Windows; see:
https://fortran-lang.discourse.group/t/slow-thread-creation-with-nested-loops-in-gfortran/5062
