https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018
--- Comment #22 from Thomas Koenig <tkoenig at gcc dot gnu.org> ---
Here are the details of how I tested this.
I generated in_pack_r4.i and in_unpack_r4.i by adding -save-temps to the
Makefile options in ~/trunk-bin/powerpc64le-unknown-linux-gnu/libgfortran,
then removing in_pack_r4.* and in_unpack_r4.* there and running make.
In the benchmark directory, I then used
bench.f90:
program main
  real, dimension(:,:), allocatable :: a
  allocate (a(50000,4))
  call random_number (a)
  do i=1,5000000
    call foo(a(i::2,:))
    call foo(a)
  end do
end program main
foo.f90:
subroutine foo(a)
  real, dimension(*) :: a
end subroutine foo
(The constants can be adjusted.) The first call to foo passes a
non-contiguous section and therefore needs repacking; the second one is
only there to keep the optimizer from dropping the loop.
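To illustrate what "repacking" means for the first call, here is a minimal
sketch in C. It is not the libgfortran code; the names pack_section and
unpack_section, and the simplified (start, stride) description of the
section, are assumptions made for this example only. It copies a strided
section of a column-major array into a contiguous buffer and back again.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copy rows start, start+stride, ... of a
   column-major rows x cols array into the contiguous buffer dst.
   Returns the number of elements written. */
size_t pack_section(const float *src, size_t rows, size_t cols,
                    size_t start, size_t stride, float *dst)
{
    assert(stride > 0);
    size_t n = 0;
    for (size_t j = 0; j < cols; j++)
        for (size_t i = start; i < rows; i += stride)
            dst[n++] = src[j * rows + i];
    return n;
}

/* Hypothetical helper: copy the contiguous buffer src back into the
   same strided section of dst after the callee has returned. */
void unpack_section(float *dst, size_t rows, size_t cols,
                    size_t start, size_t stride, const float *src)
{
    assert(stride > 0);
    size_t n = 0;
    for (size_t j = 0; j < cols; j++)
        for (size_t i = start; i < rows; i += stride)
            dst[j * rows + i] = src[n++];
}
```

A caller would pack, pass the contiguous buffer to the routine that expects
contiguous data, and unpack afterwards. That copy-in/copy-out pair is the
work the benchmark loop above is meant to time.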
With the command line
gfortran -g -fno-inline-arg-packing -O2 bench.f90 foo.f90 in_pack_r4.i
in_unpack_r4.i -static-libgfortran && time ./a.out
a test can be run. -fno-inline-arg-packing is important because
otherwise the internal packing routines will not be called, and
listing in_pack_r4.i and in_unpack_r4.i on the command line makes the
link use those instead of the ones from the (static) library.
in_pack_r4.i and in_unpack_r4.i can then be adjusted, for
example by adding a #pragma GCC unroll 1.
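As a sketch of where such a pragma goes: it has to sit directly in front of
the loop it applies to. The function below is a hypothetical stand-in for a
copy loop, not the actual contents of in_pack_r4.i; copy_strided and its
parameters are names invented for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical strided copy loop, used only to show pragma placement. */
void copy_strided(float *dest, const float *src, size_t n, ptrdiff_t stride)
{
    assert(stride != 0);
    /* Ask GCC not to unroll the loop that follows. */
#pragma GCC unroll 1
    for (size_t i = 0; i < n; i++)
        dest[i] = src[(ptrdiff_t) i * stride];
}
```

Rebuilding with the command line above and comparing the timings with and
without the pragma then shows whether unrolling matters for this loop.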