https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92698
Thomas Koenig <tkoenig at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tkoenig at gcc dot gnu.org

--- Comment #1 from Thomas Koenig <tkoenig at gcc dot gnu.org> ---
(In reply to mjr19 from comment #0)
> subroutine cpy(a,src,dest,len)
>   integer, intent(in) :: src,dest,len
>   real(kind(1d0)), intent(inout) :: a(:)
>
>   a(dest:dest+len-1)=a(src:src+len-1)
>
> end subroutine cpy
>
>
> seems to compile to malloc tmp array, inline copy to tmp, inline copy from
> tmp, free tmp in gfortran 7.4 and 8.3. Gfortran 9.2 modifies this by
> replacing the inline copies with memcpy at -O3.
>
> Fortran permits the source and destination to overlap, so a single call to
> memcpy would be wrong.

It would also be wrong for another reason: a is not known to be
contiguous at compile time. The subroutine has to account for the
fact that the caller could pass a non-contiguous array slice,
for example via

  call cpy (a(1:10:2),1,2,2)

If the test case said

  subroutine cpy(a,src,dest,len)
    integer, intent(in) :: src,dest,len
    real(kind(1d0)), intent(inout), contiguous :: a(:)

or

  subroutine cpy(a,src,dest,len,n)
    integer, intent(in) :: src,dest,len,n
    real(kind(1d0)), intent(inout), contiguous :: a(n)

then putting in a memmove could indeed help, but the caller would
then have to repack the array on call and unpack it on return
(which would defeat the purpose of the optimization).

However, I am not convinced that this is something worth pursuing.
Instead of calling a subroutine, the user might as well write an
assignment statement directly into the program. This also has the
advantage that, if the relationship between src and dest is known,
for example via

  a(n:n+len-1) = a(n+1:n+len)

the compiler will actually optimize this into a memmove (provided
it knows the array is contiguous).
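
For illustration only (this is not part of the original report), the two hazards discussed above can be sketched in a small self-contained program. The subroutine is the one from comment #0; the module name cpy_mod, the program name overlap_demo, and the arrays a and b are hypothetical names chosen for this sketch. It only checks the results that Fortran semantics require and says nothing about which code a particular gfortran version emits.

```fortran
module cpy_mod
  implicit none
contains
  ! Subroutine from comment #0. The dummy is assumed-shape without
  ! CONTIGUOUS, so it may describe a strided section of the caller's array.
  subroutine cpy(a, src, dest, len)
    integer, intent(in) :: src, dest, len
    real(kind(1d0)), intent(inout) :: a(:)
    a(dest:dest+len-1) = a(src:src+len-1)
  end subroutine cpy
end module cpy_mod

program overlap_demo
  use cpy_mod
  implicit none
  real(kind(1d0)) :: a(10), b(6)
  integer :: i

  ! Case 1: non-contiguous actual argument.
  ! Inside cpy the dummy sees a(1), a(3), a(5), a(7), a(9), so the
  ! assignment writes a(3) and a(5); a flat block copy of two adjacent
  ! doubles would touch the wrong memory.
  a = [(real(i, kind(1d0)), i = 1, 10)]
  call cpy(a(1:10:2), 1, 2, 2)
  if (any(a /= [1d0, 2d0, 1d0, 4d0, 3d0, 6d0, 7d0, 8d0, 9d0, 10d0])) error stop 1

  ! Case 2: overlapping source and destination in a contiguous array.
  ! The right-hand side is evaluated as if in full before the store, so
  ! the result is 1 1 2 3 4 6, not the 1 1 1 1 1 6 that a naive forward
  ! element-by-element copy would produce.
  b = [(real(i, kind(1d0)), i = 1, 6)]
  call cpy(b, 1, 2, 4)
  if (any(b /= [1d0, 1d0, 2d0, 3d0, 4d0, 6d0])) error stop 2

  ! Case 3: the direct assignment form with a known shift of one element,
  ! corresponding to a(n:n+len-1) = a(n+1:n+len) with n = 1 and len = 5.
  b = [(real(i, kind(1d0)), i = 1, 6)]
  b(1:5) = b(2:6)
  if (any(b /= [2d0, 3d0, 4d0, 5d0, 6d0, 6d0])) error stop 3

  print '(a)', 'all checks passed'
end program overlap_demo
```

Case 1 is why a single block copy cannot be used for the assumed-shape dummy, and case 2 is why memcpy, which does not allow overlapping regions, would be the wrong primitive even for contiguous data.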