https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116128

            Bug ID: 116128
           Summary: missed optimisation: fortran sum intrinsic performed
                    in order
           Product: gcc
           Version: 14.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: fortran
          Assignee: unassigned at gcc dot gnu.org
          Reporter: mjr19 at cam dot ac.uk
  Target Milestone: ---

gfortran-14 performs the Fortran sum intrinsic strictly in order, thus
preventing any vectorisation and imposing a data dependency between each scalar
add operation.

The Fortran standard does not seem to require this. F2023 16.9.201 says that
"the result of SUM (ARRAY) has a value equal to a processor-dependent
approximation to the sum of all the elements of ARRAY or has the value zero if
ARRAY has size zero." The lack of a specified ordering, and the use of the term
"processor-dependent approximation", make me think that the reassociation
allowed by an OpenMP simd reduction(+) clause would be permitted here too.

In a quick test case, gfortran-14 -O3 -mavx2 summed a double precision array at
1.33ns per element. nvfortran, with the same options, managed 0.1ns per
element by using four independent ymm registers as accumulators, which gives
sixteen scalar partial sums.

The same comment applies to dot_product, and probably the other intrinsic
reduction operations.
