Hello, I am currently writing some numerical code in Fortran 2003 and I want to use the spread intrinsic: having used NumPy heavily for the past few years, such an array primitive feels natural to me. I wondered what the effect on performance would be and found this answer on Stack Overflow: https://stackoverflow.com/a/55732905/6324751
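For concreteness, here is a minimal sketch of the two equivalent forms being compared. The program name, array sizes, and variable names are illustrative assumptions on my part, and this is not the exact code from the benchmark or from the Compiler Explorer link.

```fortran
program spread_vs_loop
  implicit none
  ! Illustrative sizes only (assumption, not taken from the benchmark).
  integer, parameter :: n = 4, m = 3
  real :: v(n), a(n, m), b(n, m)
  integer :: j

  v = [1.0, 2.0, 3.0, 4.0]

  ! Array-intrinsic form: replicate v along a new second dimension,
  ! giving an (n, m) array whose every column equals v.
  a = spread(v, dim=2, ncopies=m)

  ! Explicit loop form: copy v into each column in turn.
  do j = 1, m
    b(:, j) = v
  end do

  ! Both forms should fill the arrays with the same values.
  if (all(a == b)) then
    print *, "identical"
  else
    print *, "different"
  end if
end program spread_vs_loop
```

The question is whether the compiler generates comparable code for the `spread` assignment and for the loop, since both describe the same data movement.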
TLDR: with ifort, spread is as fast as a do loop, if not faster. With gfortran 12.2.0, however, it is significantly slower (up to 100% in my microbenchmarks).

Investigating a bit further, I noticed that ifort recognizes the pattern and produces essentially the same code for both the do loop and the spread call, while gfortran “naively” calls spread, even with -O3. Here is a demonstration on godbolt.org: https://godbolt.org/z/dcYEPj8bP

So, my question is: is this something that could be better optimized? I wonder whether simply allowing the compiler to inline spread would already enable further optimizations and lead to the same kind of performance as with ifort. I also think other array intrinsics might benefit from this effort if similar strategies can be applied.

Although I have never contributed to GCC, I would be willing to work on this implementation if it is within reach of my C++ skills and if someone can point me in the right direction.

Regards,
Théo