Hi Mikael,
>> Still to do: Bounds checking (a rather big one),
> ... as you do a front-end to front-end transformation, you get bounds
> checking for free, don't you?
Only partially.
What the patch does is
integer i,j,k
c = 0
do j=0, size(b,2)-1
do k=0, size(a, 2)-1
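For illustration only: the explanation above is cut off, so what follows is one plausible reading of "only partially", not the patch's actual approach. The generated loop bounds come from size(a,...) and size(b,...), so checking individual array references does not by itself confirm that the three arrays conform. A minimal sketch of explicit shape checks follows, assuming 1-based loops and a hypothetical helper named checked_matmul:

```fortran
program matmul_shape_check_sketch
  implicit none
  real :: a(2,3), b(3,4), c(2,4)

  a = 1.0
  b = 2.0
  call checked_matmul(a, b, c)
  ! Each element is a sum of three products 1*2, i.e. 6.
  print *, c(1,1)

contains

  ! Hypothetical helper, not taken from the patch.
  subroutine checked_matmul(a, b, c)
    real, intent(in)  :: a(:,:), b(:,:)
    real, intent(out) :: c(:,:)
    integer :: i, j, k

    ! Conformance checks that the loop bounds alone do not express.
    if (size(a,2) /= size(b,1)) error stop "matmul: inner extents differ"
    if (size(c,1) /= size(a,1) .or. size(c,2) /= size(b,2)) &
         error stop "matmul: result has wrong shape"

    c = 0
    do j = 1, size(b,2)
       do k = 1, size(a,2)
          do i = 1, size(a,1)   ! innermost loop over i is an assumption
             c(i,j) = c(i,j) + a(i,k) * b(k,j)
          end do
       end do
    end do
  end subroutine checked_matmul

end program matmul_shape_check_sketch
```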
Hello, I haven't looked at the patch in detail yet, but...
On 11/04/2015 14:24, Thomas Koenig wrote:
> Still to do: Bounds checking (a rather big one),
... as you do a front-end to front-end transformation, you get bounds
checking for free, don't you?
Mikael
OK, here is a new version.
There is now an option for setting a maximum on the array size,
which takes its default from the BLAS limit (if specified).
Currently, only setting the maximum size to zero as a way of
disabling the unrolling is supported. I have done this in a
few test cases.
The bug
On Sun, 2015-04-05 at 14:32 +0200, Thomas Koenig wrote:
> Hello world,
>
> this is a first draft of a patch to inline matmul (PR 37171). This is
(FWIW, the above PR# looks like it should be PR 37131)
> On 6 Apr 2015, at 01:15, Dominique d'Humières wrote:
>
> The patch causes the following regressions:
>
> FAIL: gfortran.dg/coarray/dummy_1.f90 -fcoarray=single -O2 -latomic
> (internal compiler error)
> …
> FAIL: gfortran.dg/bound_8.f90 -g -flto (test for excess errors)
>
> I think t
The patch causes the following regressions:
FAIL: gfortran.dg/coarray/dummy_1.f90 -fcoarray=single -O2 -latomic (internal
compiler error)
FAIL: gfortran.dg/coarray/dummy_1.f90 -fcoarray=single -O2 -latomic (test for
excess errors)
FAIL: gfortran.dg/coarray/dummy_1.f90 -fcoarray=lib -O2 -lc
Hi Dominique,
> which means that -fexternal-blas should disable the inlining.
It is not surprising that a highly tuned BLAS library is better than
a simple inlining for large matrices.
I did some tests by adjusting n; it seems the inline version is
faster for n<=22, which is not too bad.
Regardi
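A comparison of this kind could be set up roughly as follows. This is a sketch, not the test program used in the thread: the matrix sizes, the repetition count and the loop order are all assumptions, and the crossover point it shows will depend on compiler, flags and machine.

```fortran
program matmul_timing_sketch
  use, intrinsic :: iso_fortran_env, only: int64, real64
  implicit none
  integer, parameter :: sizes(4) = [4, 16, 22, 64]   ! hypothetical sizes
  integer, parameter :: reps = 2000                  ! hypothetical repeat count
  real(real64), allocatable :: a(:,:), b(:,:), c(:,:)
  real(real64) :: t_loop, t_intr, checksum
  integer(int64) :: t0, t1, rate
  integer :: n, s, r, i, j, k

  call system_clock(count_rate=rate)
  checksum = 0
  do s = 1, size(sizes)
     n = sizes(s)
     allocate (a(n,n), b(n,n), c(n,n))
     call random_number(a)
     call random_number(b)

     ! Hand-written loop nest.
     call system_clock(t0)
     do r = 1, reps
        c = 0
        do j = 1, n
           do k = 1, n
              do i = 1, n
                 c(i,j) = c(i,j) + a(i,k) * b(k,j)
              end do
           end do
        end do
        checksum = checksum + c(1,1)   ! keep the result live
     end do
     call system_clock(t1)
     t_loop = real(t1 - t0, real64) / real(rate, real64)

     ! MATMUL intrinsic.
     call system_clock(t0)
     do r = 1, reps
        c = matmul(a, b)
        checksum = checksum + c(1,1)
     end do
     call system_clock(t1)
     t_intr = real(t1 - t0, real64) / real(rate, real64)

     print '(a,i4,2(a,es10.3))', 'n=', n, '  loops(s): ', t_loop, &
          '  matmul(s): ', t_intr
     deallocate (a, b, c)
  end do
  print *, 'checksum', checksum
end program matmul_timing_sketch
```

Note that an optimizing compiler may transform either variant, so the two timings are only meaningful for a given compiler version and set of options.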
I have done some timings
(1) with the test given below, before the patch I get (last column in Gflops)
[Book15] f90/bug% gfc -Ofast timing/matmul_tst_sys.f90 -framework Accelerate
[Book15] f90/bug% time a.out
Time, MATMUL: 373.708008 373.694971000 14.2815668504139435
T
Hi Dominique,
> IMO the inlining of MATMUL should be restricted to small matrices (less than
> 4x4, 9x9
> or 16x16 depending on your field!-)
The problem with the library function we have is that it is quite
general; it can deal with all the complexity of assumed-shape array
arguments. Inlining
> > So, what do you think about this?
>
> I am curious about what performance gain results from this?
> I can see saving a library call to our runtime libraries.
> Do you have some timing results?
>
> Jerry
IMO the inlining of MATMUL should be restricted to small matrices (less than
4x4, 9x9
or 1
Hi Jerry,
> I am curious about what performance gain results from this? I can see
> saving a library call to our runtime libraries. Do you have some timing
> results?
The speedup can be quite drastic for small matrices which can be
completely unrolled by -O3:
b1.f90:
program main
use b2
im
On 04/05/2015 05:32 AM, Thomas Koenig wrote:
--- snip ---
So, what do you think about this?
Thomas
I am curious about what performance gain results from this? I can see saving a
library call to our runtime libraries. Do you have some timing results?
Jerry
Hello world,
this is a first draft of a patch to inline matmul (PR 37171). This is
preliminary, but functional as far as it goes. Definitely for the
next stage one :-)
Basically, it takes
c = matmul(a,b)
and converts this into
BLOCK
integer i,j,k
c = 0
do j=0, size(b,2)-1
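The preview is cut off here. As a rough illustration of what such a replacement loop nest computes, here is a self-contained sketch that can be compared against the intrinsic. It is not the code the patch generates: it uses 1-based indices instead of the zero-based offsets shown above, the array sizes are hypothetical, and the innermost loop over i is an assumption.

```fortran
program inline_matmul_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: m = 3, n = 4, p = 2   ! hypothetical sizes
  real(dp) :: a(m,n), b(n,p), c(m,p)
  integer :: i, j, k

  call random_number(a)
  call random_number(b)

  ! Hand-written equivalent of c = matmul(a,b).
  ! With the order j, k, i the innermost loop walks down columns of
  ! a and c, which are contiguous in Fortran's column-major layout.
  c = 0
  do j = 1, size(b,2)
     do k = 1, size(a,2)
        do i = 1, size(a,1)
           c(i,j) = c(i,j) + a(i,k) * b(k,j)
        end do
     end do
  end do

  ! Compare against the intrinsic, allowing for rounding differences.
  if (maxval(abs(c - matmul(a,b))) > 1.0e-12_dp) error stop "mismatch"
  print *, "loops agree with matmul"
end program inline_matmul_sketch
```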