Re: [PATCH] PR libfortran/99218 - [8/9/10/11 Regression] matmul on temporary array accesses invalid memory

Harald Anlauf via Gcc-patches Thu, 04 Mar 2021 12:23:12 -0800

Hi Jerry,

> Yes, OK. However, have you been able to test performance? I am only
> curious. There was a test program in bugzilla that we used back when this
> code was first implemented. I do not remember the PR number offhand.


As you mentioned in a private mail, it was PR51119, and the timing program is at

  https://gcc.gnu.org/bugzilla/attachment.cgi?id=40039

I needed to modify the source slightly to make it work with current gfortran,
replacing subroutine dummy with

subroutine dummy(a,b)
  integer, parameter :: wp = selected_real_kind(4), &
       dp = selected_real_kind(8)
  real(dp), intent(in),    dimension(1) :: a
  real(dp), intent(inout), dimension(1) :: b
end subroutine dummy

Testing it on my notebook with an Intel i5-8250U (which supports AVX2), I found
no significant differences between current master and the patched version
when compiling with

% gfc-11 -static -O2 -march=native -finline-matmul-limit=0 compare.f90

E.g., gcc-11 with the patch to libfortran:

 =========================================================
 ================            MEASURED GIGAFLOPS          =
 =========================================================
                 Matmul                           Matmul
                 fixed                 Matmul     variable
 Size  Loops     explicit   refMatmul  assumed    explicit
 =========================================================
    2  2000      0.025      0.139      0.025      0.026
    4  2000      0.191      0.799      0.743      0.741
    8  2000      3.272      2.437      3.280      3.311
   16  2000      7.615      2.768      8.405      7.572
   32  2000      8.492      3.063      9.733      9.521
   64  2000     14.137      3.299     14.118     14.295
  128  2000     18.838      3.128     19.149     18.893
  256   477     17.214      3.256     17.293     17.255
  512    59     17.940      3.316     17.986     17.985
 1024     7     17.672      2.665     17.691     17.698
 2048     1     17.571      2.595     17.559     17.170

With unmodified gcc-11:

 =========================================================
 ================            MEASURED GIGAFLOPS          =
 =========================================================
                 Matmul                           Matmul
                 fixed                 Matmul     variable
 Size  Loops     explicit   refMatmul  assumed    explicit
 =========================================================
    2  2000      0.024      0.194      0.025      0.025
    4  2000      0.231      1.641      0.718      0.716
    8  2000      3.424      2.445      3.198      3.435
   16  2000      7.715      2.718      7.615      7.845
   32  2000      8.696      3.088      9.728      9.772
   64  2000     14.171      3.275     13.995     14.447
  128  2000     18.931      3.127     18.942     19.019
  256   477     17.239      3.232     17.267     17.291
  512    59     17.938      3.315     17.967     17.996
 1024     7     17.674      2.632     17.673     17.711
 2048     1     17.579      2.581     17.552     17.587

Give or take, the numbers are the same. (For those too lazy to check:
refMatmul is just the naive explicit matmul.)
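For readers who want to see what such a naive explicit matmul looks like, here
is a minimal sketch. The module ref_matmul_mod and subroutine ref_matmul are
hypothetical names, not the actual code from the PR51119 attachment: a plain
triple loop over explicit-shape arrays, with the innermost loop running down
columns to match Fortran's column-major storage.

```fortran
! Hypothetical sketch of a naive explicit matmul; NOT the refMatmul
! routine from the PR51119 timing program.
module ref_matmul_mod
  implicit none
  integer, parameter :: dp = selected_real_kind(15)
contains
  ! Computes c = a*b with a triple loop, for explicit-shape arrays
  ! a(n,k), b(k,m), c(n,m).
  subroutine ref_matmul(a, b, c, n, m, k)
    integer, intent(in) :: n, m, k
    real(dp), intent(in)  :: a(n, k), b(k, m)
    real(dp), intent(out) :: c(n, m)
    integer :: i, j, l

    c = 0.0_dp
    do j = 1, m
      do l = 1, k
        ! Innermost loop over the row index i, so a and c are
        ! traversed contiguously in column-major order.
        do i = 1, n
          c(i, j) = c(i, j) + a(i, l) * b(l, j)
        end do
      end do
    end do
  end subroutine ref_matmul
end module ref_matmul_mod
```

A loop nest like this does no blocking or vectorization tuning of its own,
which is consistent with refMatmul flattening out at roughly 3-4 GFLOPS in
the tables while the library matmul keeps scaling with size.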

However, when comparing with older GCC versions, I got better numbers! E.g., gcc-7:

 =========================================================
 ================            MEASURED GIGAFLOPS          =
 =========================================================
                 Matmul                           Matmul
                 fixed                 Matmul     variable
 Size  Loops     explicit   refMatmul  assumed    explicit
 =========================================================
    2  2000      0.113      0.199      0.126      0.150
    4  2000      0.866      0.865      0.766      0.881
    8  2000      3.551      2.750      3.371      3.852
   16  2000      7.826      3.517      7.489      7.464
   32  2000      9.989      3.859     11.811     11.903
   64  2000     16.218      4.213     16.501     16.687
  128  2000     19.971      4.006     20.070     20.049
  256   477     22.804      4.139     22.949     22.894
  512    59     23.637      4.047     23.800     23.765
 1024     7     23.051      3.065     23.177     23.152
 2048     1     22.953      2.784     22.946     22.960

So if I were worried about a performance penalty from my patch, I'd look at
other places, too: for sizes 256 and above, gcc-7 reaches roughly 30% more
GFLOPS than gcc-11, with or without the patch.

Cheers,
Harald

Re: [PATCH] PR libfortran/99218 - [8/9/10/11 Regression] matmul on temporary array accesses invalid memory
