Re: [patch, fortran, RFC] First steps towards inlining matmul

2015-04-11 Thread Thomas Koenig
Hi Mikael,
>> Still to do: Bounds checking (a rather big one),
> ... as you do a front-end to front-end transformation, you get bounds
> checking for free, don't you?
Only partially. What the patch does is
  integer i,j,k
  c = 0
  do j=0, size(b,2)-1
    do k=0, size(a, 2)-1

Re: [patch, fortran, RFC] First steps towards inlining matmul

2015-04-11 Thread Mikael Morin
Hello,
I haven't looked at the patch in detail yet, but...
On 11/04/2015 14:24, Thomas Koenig wrote:
> Still to do: Bounds checking (a rather big one),
... as you do a front-end to front-end transformation, you get bounds checking for free, don't you?
Mikael

Re: [patch, fortran, RFC] First steps towards inlining matmul

2015-04-11 Thread Thomas Koenig
OK, here is a new version. There is now an option for setting a maximum on the array size, which takes its default from the BLAS limit (if specified). Currently, only setting the maximum size to zero as a way of disabling the unrolling is supported. I have done this in a few test cases. The bug

Re: [patch, fortran, RFC] First steps towards inlining matmul

2015-04-07 Thread David Malcolm
On Sun, 2015-04-05 at 14:32 +0200, Thomas Koenig wrote:
> Hello world,
>
> this is a first draft of a patch to inline matmul (PR 37171). This is
(FWIW, the above PR# looks like it should be PR 37131)

Re: [patch, fortran, RFC] First steps towards inlining matmul

2015-04-06 Thread Dominique d'Humières
> On 6 Apr 2015 at 01:15, Dominique d'Humières wrote:
>
> The patch causes the following regressions:
>
> FAIL: gfortran.dg/coarray/dummy_1.f90 -fcoarray=single -O2 -latomic
> (internal compiler error)
> …
> FAIL: gfortran.dg/bound_8.f90 -g -flto (test for excess errors)
>
> I think t

Re: [patch, fortran, RFC] First steps towards inlining matmul

2015-04-05 Thread Dominique d'Humières
The patch causes the following regressions:
FAIL: gfortran.dg/coarray/dummy_1.f90 -fcoarray=single -O2 -latomic (internal compiler error)
FAIL: gfortran.dg/coarray/dummy_1.f90 -fcoarray=single -O2 -latomic (test for excess errors)
FAIL: gfortran.dg/coarray/dummy_1.f90 -fcoarray=lib -O2 -lc

Re: [patch, fortran, RFC] First steps towards inlining matmul

2015-04-05 Thread Thomas Koenig
Hi Dominique,
> which means that -fexternal-blas should disable the inlining.
It is not surprising that a highly tuned BLAS library is better than a simple inlining for large matrices. I did some tests by adjusting n; it seems the inline version is faster for n<=22, which is not too bad. Regardi
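A crossover such as the n<=22 figure mentioned above can be probed with a small timing loop. The following is only a minimal sketch and not the test program used in this thread: the program name, the range of n, the repetition count and the checksum trick are all assumptions made for illustration. It times c = matmul(a, b) for a range of square sizes, so that the same source can be built with and without the inlining (or against an external BLAS) and the per-call times compared.

```fortran
program matmul_timing_sketch
  ! Hypothetical timing harness; all names and sizes are illustrative.
  use iso_fortran_env, only: int64, real64
  implicit none
  integer, parameter :: reps = 20000
  integer :: n, r
  integer(int64) :: t0, t1, rate
  real(real64), allocatable :: a(:,:), b(:,:), c(:,:)
  real(real64) :: checksum, seconds_per_call

  call system_clock(count_rate=rate)

  do n = 2, 32, 2
    allocate (a(n,n), b(n,n), c(n,n))
    call random_number(a)
    call random_number(b)
    checksum = 0.0_real64

    call system_clock(t0)
    do r = 1, reps
      ! Change one input element per repetition so the product cannot
      ! simply be hoisted out of the loop by the optimizer.
      a(1,1) = 1.0_real64 / real(r, real64)
      c = matmul(a, b)
      checksum = checksum + c(1,1)
    end do
    call system_clock(t1)

    seconds_per_call = real(t1 - t0, real64) / real(rate, real64) / reps
    ! Printing the checksum keeps the result live.
    print '(a,i3,a,es10.3,a,es12.5)', 'n=', n, ' s/call=', &
         seconds_per_call, ' checksum=', checksum
    deallocate (a, b, c)
  end do
end program matmul_timing_sketch
```

The absolute numbers depend on the machine, the optimization level and the BLAS in use, so only the relative trend between two builds of the same source is meaningful.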

Re: [patch, fortran, RFC] First steps towards inlining matmul

2015-04-05 Thread Dominique d'Humières
I have done some timings (1) with the test given below, before the patch I get (last column in Gflops)
[Book15] f90/bug% gfc -Ofast timing/matmul_tst_sys.f90 -framework Accelerate
[Book15] f90/bug% time a.out
Time, MATMUL:373.708008 373.69497100014.2815668504139435 T

Re: [patch, fortran, RFC] First steps towards inlining matmul

2015-04-05 Thread Thomas Koenig
Hi Dominique,
> IMO the inlining of MATMUL should be restricted to small matrices (less than
> 4x4, 9x9 or 16x16 depending on your field!-)
The problem with the library function we have is that it is quite general; it can deal with all the complexity of assumed-shape array arguments. Inlining

Re: [patch, fortran, RFC] First steps towards inlining matmul

2015-04-05 Thread Dominique Dhumieres
> > So, what do you think about this?
>
> I am curious about what performance gain results from this?
> I can see saving a library call to our runtime libraries.
> Do you have some timing results?
>
> Jerry
IMO the inlining of MATMUL should be restricted to small matrices (less than 4x4, 9x9 or 1

Re: [patch, fortran, RFC] First steps towards inlining matmul

2015-04-05 Thread Thomas Koenig
Hi Jerry,
> I am curious about what performance gain results from this? I can see
> saving a library call to our runtime libraries. Do you have some timing
> results?
The speedup can be quite drastic for small matrices which can be completely unrolled by -O3:
b1.f90:
  program main
    use b2
    im

Re: [patch, fortran, RFC] First steps towards inlining matmul

2015-04-05 Thread Jerry DeLisle
On 04/05/2015 05:32 AM, Thomas Koenig wrote:
--- snip ---
So, what do you think about this?
Thomas
I am curious about what performance gain results from this? I can see saving a library call to our runtime libraries. Do you have some timing results?
Jerry

[patch, fortran, RFC] First steps towards inlining matmul

2015-04-05 Thread Thomas Koenig
Hello world,
this is a first draft of a patch to inline matmul (PR 37171). This is preliminary, but functional as far as it goes. Definitely for the next stage one :-)
Basically, it takes
  c = matmul(a,b)
and converts this into
  BLOCK
    integer i,j,k
    c = 0
    do j=0, size(b,2)-1
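The snippet in the message above is cut off by the archive, so the following is only a minimal sketch of the kind of loop nest a front-end expansion of c = matmul(a,b) could produce, not the code the patch generates. It assumes rank-2 real arrays, uses ordinary 1-based subscripts instead of the zero-based offsets shown in the patch, and picks a j-k-i loop order so that the innermost loop walks down a column. The program name and the array sizes are made up for illustration.

```fortran
program inline_matmul_sketch
  ! Hypothetical illustration; sizes and names are not from the patch.
  implicit none
  integer, parameter :: m = 3, n = 4, p = 2
  real :: a(m, n), b(n, p), c(m, p)
  integer :: i, j, k

  call random_number(a)
  call random_number(b)

  ! Explicit loop nest standing in for c = matmul(a, b).
  c = 0
  do j = 1, size(b, 2)
    do k = 1, size(a, 2)
      do i = 1, size(a, 1)
        ! Innermost loop over i touches c(:,j) and a(:,k) contiguously.
        c(i, j) = c(i, j) + a(i, k) * b(k, j)
      end do
    end do
  end do

  ! Self-check against the intrinsic, with a loose single-precision tolerance.
  if (maxval(abs(c - matmul(a, b))) > 1.0e-5) error stop "loop nest disagrees with MATMUL"
  print *, "loop nest agrees with MATMUL"
end program inline_matmul_sketch
```

Written this way, the loops are plain array references that the existing bounds-checking and optimization machinery can see, and for small fixed sizes they can be fully unrolled.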