Hello,
I am currently writing some numerical code in Fortran 2003 and I would
like to use the spread intrinsic: having used NumPy heavily for the
past few years, I find such an array primitive natural to use.
I naturally wondered what the effect on performance would be and found
this on Stack Overflow: https://stackoverflow.com/a/55732905/6324751

TLDR: spread is as fast as, if not faster than, a do loop when using
ifort. However, it is significantly slower (up to 100% in my
microbenchmarks) with gfortran 12.2.0.

Investigating the matter a bit more, I noticed that ifort recognizes
the pattern and essentially produces the same code for both the do loop
and the spread call, while gfortran “naively” calls spread, even with
-O3.

Here is a demonstration on godbolt.org: https://godbolt.org/z/dcYEPj8bP
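
For readers who do not want to open the link, the pattern in question
looks roughly like the sketch below. This is a simplified illustration,
not the exact code behind the godbolt link; the names v, a, b, n and m
are only placeholders chosen for this example.

```fortran
program spread_vs_loop
  implicit none
  ! Illustrative sizes only; the real benchmarks use larger arrays.
  integer, parameter :: n = 4, m = 3
  real :: v(n), a(n, m), b(n, m)
  integer :: j

  v = [1.0, 2.0, 3.0, 4.0]

  ! Variant 1: spread replicates v along a new second dimension,
  ! producing an n-by-m array whose columns are all copies of v.
  a = spread(v, dim=2, ncopies=m)

  ! Variant 2: the equivalent explicit do loop, one column at a time.
  do j = 1, m
    b(:, j) = v
  end do

  ! Sanity check: both variants should hold identical values.
  print *, all(a == b)
end program spread_vs_loop
```

Both variants compute the same array; the question is only whether the
compiler generates comparable code for them.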

So, my question is: is this something that could be better optimized?
I wonder if simply allowing the compiler to inline spread wouldn't
already enable further optimizations that would lead to the same kind
of performance as found in ifort.
I also think other array intrinsics may benefit from this effort if
similar strategies can be applied.
Although I have never contributed to GCC, I would be willing to do
this implementation if it is within reach of my C++ skills and if
someone can point me in the right direction.

Regards,
Théo
