It seems that gfortran inlines MATMUL when optimization is
enabled. This produces very poor performance: in the timings
below, -O1 and -O2 take about 0.33 seconds versus 0.22 seconds
at -O0. In fact, gfortran inlines MATMUL even if one specifies
-fexternal-blas. This is very bad.
% cat a.f90
program main
implicit none
integer, parameter :: imax = 20000, jmax = 10000
real, allocatable :: inVect(:), matrix(:,:), outVect(:)
real :: start, finish
allocate(invect(imax), matrix(imax,jmax), outvect(jmax))
call random_number(inVect)
call random_number(matrix)
call cpu_time(start)
outVect = matmul(inVect, matrix)
call cpu_time(finish)
print '("Time = ",f10.7," seconds. – First Value = ",f10.4)',finish-start,outVect(1)
end program main
% gfcx -o z -O0 a.f90 && ./z
Time = 0.2234111 seconds. – First Value = 4982.6362
% nm z | grep matmul
U _gfortran_matmul_r4@@GFORTRAN_8
% gfcx -o z -O1 a.f90 && ./z
Time = 0.3295890 seconds. – First Value = 4971.0962
% nm z | grep matmul
% gfcx -o z -O2 a.f90 && ./z
Time = 0.3299561 seconds. – First Value = 5025.4902
% nm z | grep matmul
% gfcx -o z -O2 -fexternal-blas a.f90 && ./z
Time = 0.3295580 seconds. – First Value = 5022.8291
This last one is definitely broken. I did not link with
an external BLAS library, so if a BLAS call had been
generated the link step should have failed with an undefined
reference. Please fix before 11.1 is released.
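
For anyone repeating the nm checks above, here is a minimal
sketch of a helper that classifies the result. The function
name classify_matmul is hypothetical, and the BLAS pattern
assumes the usual trailing-underscore symbol names (sgemm_,
dgemm_, and so on). "inline" only means that neither kind of
undefined symbol was found.

```shell
# Reads `nm` output on stdin and reports how MATMUL appears to
# have been compiled. Hypothetical helper, not part of gfortran.
classify_matmul() {
  syms=$(cat)
  if printf '%s\n' "$syms" | grep -q ' U _gfortran_matmul'; then
    # Undefined reference to the libgfortran runtime routine.
    echo library
  elif printf '%s\n' "$syms" | grep -Eq ' U [sdcz]gem[mv]_'; then
    # Undefined reference to a BLAS routine (assumed naming).
    echo blas
  else
    # No matching undefined symbol: presumably inlined.
    echo inline
  fi
}
```

Usage would be "nm z | classify_matmul" after each build. From
the transcript above I would expect "library" at -O0 and
"inline" at -O1 and -O2.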
--
Steve