https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116128
Bug ID: 116128
Summary: missed optimisation: fortran sum intrinsic performed in order
Product: gcc
Version: 14.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: fortran
Assignee: unassigned at gcc dot gnu.org
Reporter: mjr19 at cam dot ac.uk
Target Milestone: ---

gfortran-14 performs the Fortran sum intrinsic strictly in order, which prevents any vectorisation and imposes a data dependency between successive scalar add operations.

The Fortran standard does not seem to require this. F2023 16.9.201 says:

"the result of SUM (ARRAY) has a value equal to a processor-dependent approximation to the sum of all the elements of ARRAY or has the value zero if ARRAY has size zero."

The lack of a specified ordering, together with the term "processor-dependent approximation", makes me think that the optimisations allowed for omp simd reduction(+) would also be permitted here.

On a quick test case, gfortran-14 -O3 -mavx2 summed a double precision array at 1.33ns per element. nvfortran, with the same options, managed 0.1ns per element, using four independent ymm registers as accumulators and so keeping sixteen scalar partial sums.

The same comment applies to dot_product, and probably to the other intrinsic reduction operations.
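
To illustrate the difference between the two summation orders, here is a minimal sketch in C rather than Fortran. It is not the test case used for the timings above, and the names sum_in_order, sum_partial and LANES are made up for this example. The first function adds the elements strictly in order. The second keeps sixteen independent partial sums, which is roughly what four 4-wide vector accumulators amount to, and combines them at the end.

```c
#include <stddef.h>

/* Hypothetical names throughout; this is an illustration, not the
   code generated by either compiler. */

/* Strictly in-order sum: every add depends on the result of the
   previous one, so the adds form a single serial chain. */
double sum_in_order(const double *a, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Number of independent partial sums (assumed: 4 registers x 4 doubles). */
#define LANES 16

/* Sum using LANES independent partial sums, combined at the end. */
double sum_partial(const double *a, size_t n)
{
    double acc[LANES] = {0.0};
    size_t i = 0;

    /* Main loop: LANES independent add chains. Each acc[k] is only
       ever updated from elements at positions k, k+LANES, ... */
    for (; i + LANES <= n; i += LANES)
        for (size_t k = 0; k < LANES; k++)
            acc[k] += a[i + k];

    /* Remainder: fewer than LANES elements are left over. */
    for (; i < n; i++)
        acc[i % LANES] += a[i];

    /* Combine the partial sums into the final result. */
    double total = 0.0;
    for (size_t k = 0; k < LANES; k++)
        total += acc[k];
    return total;
}
```

For general floating-point data the two functions can differ in the last few bits, because the additions are associated differently. That difference is the kind of variation the "processor-dependent approximation" wording appears to allow.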