Hello,

Welcome, and thanks for your interest.

On 03/11/2022 at 11:48, Théo Cavignac via Fortran wrote:
> Hello,
> I am currently writing some numerical code in Fortran 2003 and I want
> to use the spread intrinsic because, having used NumPy heavily for the
> past few years, it feels natural to use such an array primitive.
> I naturally wondered what the effect on performance would be and found
> this on Stack Overflow: https://stackoverflow.com/a/55732905/6324751
>
> TLDR: spread is as fast as, if not faster than, a do loop when using
> ifort. However, it is significantly slower (up to 100% in my
> microbenchmarks) with gfortran 12.2.0.
>
> Investigating the matter a bit more, I noticed that ifort recognizes
> the pattern and essentially produces the same code for both the do loop
> and the spread call, while gfortran “naively” calls spread, even with
> -O3.
>
> Here is a demonstration on godbolt.org: https://godbolt.org/z/dcYEPj8bP
>
> So, my question is: is this something that could be better optimized?
> I wonder whether simply allowing the compiler to inline spread would
> already enable further optimizations that lead to the same kind
> of performance as found in ifort.
Well, obviously, if gfortran generated do loops in place of the spread call, you would get the same performance that gfortran already gets with hand-written do loops.
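For reference, here is a C sketch of what "do loops in place of spread" means for a rank-1 source. It is only a model of the Fortran semantics written for this reply: the function name spread_rank1 is made up and is not a gfortran or libgfortran entry point. It shows the index mapping that an inlined SPREAD has to produce, using Fortran's column-major layout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of Fortran SPREAD(SOURCE, DIM, NCOPIES) for a rank-1
 * SOURCE of size n.  The rank-2 result is stored column-major, as in
 * Fortran: element (i, j) of an r-by-c array lives at out[i + j * r]
 * (0-based indices here).
 *   dim == 1: result shape (ncopies, n), R(i, j) = SOURCE(j)
 *   dim == 2: result shape (n, ncopies), R(i, j) = SOURCE(i)
 * These nested loops are what could be emitted instead of a library call. */
void spread_rank1(const double *src, size_t n, int dim,
                  size_t ncopies, double *out)
{
    assert(dim == 1 || dim == 2);
    if (dim == 1) {
        /* One column per source element, each filled with that element. */
        for (size_t j = 0; j < n; j++)
            for (size_t i = 0; i < ncopies; i++)
                out[i + j * ncopies] = src[j];
    } else {
        /* One column per copy, each a full copy of the source. */
        for (size_t j = 0; j < ncopies; j++)
            for (size_t i = 0; i < n; i++)
                out[i + j * n] = src[i];
    }
}
```

For example, spread([1, 2, 3], dim=1, ncopies=2) is the 2x3 array whose two rows are both [1, 2, 3], which in memory is 1 1 2 2 3 3. With dim=2 it is the 3x2 array whose two columns are both [1, 2, 3], stored as 1 2 3 1 2 3.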

> I also think other array intrinsics may benefit from this effort if
> similar strategies can be applied.
> While I have never contributed to GCC, I would be willing to
> do this implementation if it is within reach of my C++ skills, and if
> someone can point me in the right direction.

The first step is to set up a working environment and build the latest GCC master from git. The source is actually more C than C++ (at least in the Fortran front-end), so it requires little C++ skill, but it does require time and the willingness to decipher its complexity.

There are two places where inlining can be done:
* In the front-end passes, where the parsed Fortran code is rewritten before the intermediate code for the optimizers is generated. Thomas König can help you there.
* Directly in the code generation for the optimizers. It is (much) more complex but can avoid the need for temporaries. I can help you there.
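To make the temporary question concrete, here is a rough C model for an expression like C = A + SPREAD(V, DIM=2, NCOPIES=m), where A and C are n-by-m and V has size n. The function names are hypothetical, and this is not how either gfortran pass is actually written. The first version materializes SPREAD into a temporary before using it, loosely what a rewrite done ahead of code generation may still need. The second folds SPREAD into the indexing of the consumer loop, so element (i, j) is read straight from V(i) and no temporary is allocated.

```c
#include <stddef.h>
#include <stdlib.h>

/* (1) Materialize SPREAD(V, 2, m) into an n*m temporary, then add.
 *     Returns 0 on success, -1 if the temporary cannot be allocated. */
int add_spread_with_temp(const double *a, const double *v,
                         size_t n, size_t m, double *c)
{
    if (n == 0 || m == 0)
        return 0;                          /* empty result, nothing to do */
    double *tmp = malloc(n * m * sizeof *tmp);
    if (tmp == NULL)
        return -1;
    for (size_t j = 0; j < m; j++)         /* fill the temporary, column by column */
        for (size_t i = 0; i < n; i++)
            tmp[i + j * n] = v[i];
    for (size_t j = 0; j < m; j++)         /* elementwise addition, column-major */
        for (size_t i = 0; i < n; i++)
            c[i + j * n] = a[i + j * n] + tmp[i + j * n];
    free(tmp);
    return 0;
}

/* (2) Fold SPREAD into the consumer loop: element (i, j) of
 *     SPREAD(V, 2, m) is just V(i), so no temporary is needed. */
void add_spread_direct(const double *a, const double *v,
                       size_t n, size_t m, double *c)
{
    for (size_t j = 0; j < m; j++)
        for (size_t i = 0; i < n; i++)
            c[i + j * n] = a[i + j * n] + v[i];
}
```

Both compute the same values; the second does a single pass over A and C and avoids the n*m extra memory traffic, which is the kind of saving that handling SPREAD at code generation can bring.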

Some links about our development process and conventions:
https://gcc.gnu.org/contribute.html
https://gcc.gnu.org/git.html

How to build GCC:
https://gcc.gnu.org/wiki/InstallingGCC


Mikael

