For a calculation that multiplies multidimensional arrays element by element and then sums the result along the first dimension, GCC performs the two steps separately: the element-by-element multiplication is completed first, and the function gfortran_sum_r8 is then called to calculate the sum. A better approach would be to keep an accumulator updated as the element-by-element multiplication is carried out. This has the following benefits:
  i. The gfortran_sum_r8 call is eliminated.
  ii. A temporary array to hold the multiplication result is no longer needed.
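The accumulator approach described above can be sketched by hand as follows. This is only an illustration of the requested transformation, not what GCC emits: the names sum_test_fused and check_fused are hypothetical, and the small driver program exists only to compare the fused loop against the sum intrinsic.

```fortran
! Hand-written sketch of the fused form: multiply and accumulate in one
! pass, so no temporary array and no library call are needed.
! sum_test_fused is a hypothetical name used for illustration only.
subroutine sum_test_fused(Rx, Ry, Rz, nx, ny)
  implicit none
  integer, intent(in) :: nx, ny
  real(kind=kind(1.0d0)), dimension(nx,ny), intent(in)  :: Rx, Ry
  real(kind=kind(1.0d0)), dimension(ny),    intent(out) :: Rz
  real(kind=kind(1.0d0)) :: acc
  integer :: i, j

  do j = 1, ny
    acc = 0.0d0
    ! Inner loop runs over the first dimension, the one being summed.
    do i = 1, nx
      acc = acc + Rx(i,j) * Ry(i,j)
    end do
    Rz(j) = acc
  end do
end subroutine sum_test_fused

! Hypothetical driver: checks the fused loop against the intrinsic form.
program check_fused
  implicit none
  integer, parameter :: nx = 3, ny = 2
  real(kind=kind(1.0d0)) :: Rx(nx,ny), Ry(nx,ny), Rz(ny)

  Rx = reshape((/ 1.0d0, 2.0d0, 3.0d0, 4.0d0, 5.0d0, 6.0d0 /), (/ nx, ny /))
  Ry = 2.0d0
  call sum_test_fused(Rx, Ry, Rz, nx, ny)

  ! Stop with a non-zero status if the two forms disagree.
  if (any(abs(Rz - sum(Rx * Ry, 1)) > 1.0d-12)) stop 1
  print *, Rz
end program check_fused
```

The inner loop walks the first index, which is contiguous in Fortran's column-major layout, so each column of Rx and Ry is read once and only the scalar acc is carried between iterations.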
subroutine sum_test(Rx,Ry,Rz,nx,ny)
  implicit none
  integer(kind=kind(1)), intent(in) :: nx,ny
  real(kind=kind(1.0d0)), dimension(nx,ny), intent(in) :: Rx,Ry
  real(kind=kind(1.0d0)), dimension(ny), intent(out) :: Rz
  Rz = sum(Rx * Ry, 1)
end subroutine sum_test

Other relevant information:

1. Compile flags: -O3 -ffast-math -m64 -march=amdfam10

2. gfortran version:
   gfortran -v
   Using built-in specs.
   Target: x86_64-unknown-linux-gnu
   Configured with: /tmp/src/gcc-4.3.0/configure --prefix=/opt/amd/gcc-4.3.0 --enable-languages=c,c++,fortran --enable-stage1-checking --with-as=/opt/amd/gcc-4.3.0/bin/as --with-ld=/opt/amd/gcc-4.3.0/bin/ld --with-mpfr=/tmp/install/mpfr-2.3.0 --with-gmp=/tmp/install/gmp-4.2.2
   Thread model: posix
   gcc version 4.3.1 20080312 (prerelease) (GCC)

3. model name: AMD Phenom(tm) 8650 Triple-Core Processor

4. flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow constant_tsc pni cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy altmovcr8 abm sse4a misalignsse 3dnowprefetch osvw

--
           Summary: Eliminate gfortran_sum_r8 call for calculation involving multidimensional array multiplication followed by a sum along first dimension
           Product: gcc
           Version: 4.3.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: fortran
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: rajiv dot adhikary at amd dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36841