Re: [patch,libgfortran] PR51119 - MATMUL slow for large matrices

2016-11-16 Thread Jerry DeLisle
Committed after approval on bugzilla to eliminate warnings.

2016-11-16  Jerry DeLisle

	PR libgfortran/51119
	* Makefile.am: Remove -fno-protect-parens -fstack-arrays.
	* Makefile.in: Regenerate.

r242517 = 026291bdda18395d7c746856dd7e4ed384856a1b (refs/remotes/svn/trunk)

Re: [patch,libgfortran] PR51119 - MATMUL slow for large matrices

2016-11-15 Thread Janne Blomqvist
On Tue, Nov 15, 2016 at 6:37 PM, Jerry DeLisle wrote:
> All comments incorporated. Standing by for approval.

Looks good, nice job! Ok for trunk.

I was thinking that for strided arrays, it probably is faster to copy them to dense arrays before doing the matrix multiplication. That would also enab
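
The copy-to-dense idea for strided arrays can be sketched roughly as follows. This is only an illustration, not code from the patch: `pack_dense` and its parameter names are hypothetical, and the strides are assumed to be counted in elements.

```c
#include <stddef.h>

/* Hypothetical helper, not part of the patch: copy an m x n matrix
   with arbitrary element strides into a dense column-major buffer,
   so that a multiply kernel can then assume unit stride.  */
static void
pack_dense (double *dst, const double *src, ptrdiff_t m, ptrdiff_t n,
            ptrdiff_t xstride, ptrdiff_t ystride)
{
  for (ptrdiff_t j = 0; j < n; j++)
    for (ptrdiff_t i = 0; i < m; i++)
      dst[i + j * m] = src[i * xstride + j * ystride];
}
```

The copy costs O(m*n) memory traffic, which is small next to the O(m*n*k) work of the multiplication itself once the matrices are large.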

Re: [patch,libgfortran] PR51119 - MATMUL slow for large matrices

2016-11-15 Thread Jerry DeLisle
On 11/15/2016 07:59 AM, Jerry DeLisle wrote:
On 11/14/2016 11:22 PM, Thomas Koenig wrote:
Hi Jerry,

With these changes, OK for trunk?

Just going over this with a fine comb...

One thing just struck me: The loop variables should be index_type, so

const index_type m = xcount, n = ycou

Re: [patch,libgfortran] PR51119 - MATMUL slow for large matrices

2016-11-15 Thread Jerry DeLisle
On 11/14/2016 11:22 PM, Thomas Koenig wrote:
Hi Jerry,

With these changes, OK for trunk?

Just going over this with a fine comb...

One thing just struck me: The loop variables should be index_type, so

const index_type m = xcount, n = ycount, k = count;

[...]

index_type a_dim1, a

Re: [patch,libgfortran] PR51119 - MATMUL slow for large matrices

2016-11-15 Thread Richard Biener
On Mon, Nov 14, 2016 at 11:13 PM, Jerry DeLisle wrote:
> On 11/13/2016 11:03 PM, Thomas Koenig wrote:
>>
>> Hi Jerry,
>>
>> I think this
>>
>> + /* Parameter adjustments */
>> + c_dim1 = m;
>> + c_offset = 1 + c_dim1;
>>
>> should be
>>
>> + /* Parameter adjustments */
>> +

Re: [patch,libgfortran] PR51119 - MATMUL slow for large matrices

2016-11-14 Thread Thomas Koenig
Hi Jerry,

With these changes, OK for trunk?

Just going over this with a fine comb...

One thing just struck me: The loop variables should be index_type, so

const index_type m = xcount, n = ycount, k = count;

[...]

index_type a_dim1, a_offset, b_dim1, b_offset, c_dim1, c_offset, i
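
A small sketch of why a pointer-sized index type matters for the loop variables. It assumes `index_type` behaves like `ptrdiff_t` (modelled with a local typedef here), and `last_offset` is a hypothetical helper, not a function from the patch.

```c
#include <stddef.h>

/* Assumption: index_type is modelled as ptrdiff_t for this sketch.  */
typedef ptrdiff_t index_type;

/* Hypothetical helper: linear offset of the last element of a dense
   column-major m x n matrix.  For large matrices this value does not
   fit in a 32-bit int, so the index arithmetic must use index_type.  */
static index_type
last_offset (index_type m, index_type n)
{
  const index_type c_dim1 = m;
  return (m - 1) + (n - 1) * c_dim1;
}
```

For a 50000 x 50000 matrix the last offset exceeds 2^31 - 1, so the same arithmetic done in `int` would overflow.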

Re: [patch,libgfortran] PR51119 - MATMUL slow for large matrices

2016-11-14 Thread Jerry DeLisle
On 11/13/2016 11:03 PM, Thomas Koenig wrote:
Hi Jerry,

I think this

+ /* Parameter adjustments */
+ c_dim1 = m;
+ c_offset = 1 + c_dim1;

should be

+ /* Parameter adjustments */
+ c_dim1 = rystride;
+ c_offset = 1 + c_dim1;

Regarding options for matmul: It is po

Re: [patch,libgfortran] PR51119 - MATMUL slow for large matrices

2016-11-13 Thread Thomas Koenig
Hi Jerry,

I think this

+ /* Parameter adjustments */
+ c_dim1 = m;
+ c_offset = 1 + c_dim1;

should be

+ /* Parameter adjustments */
+ c_dim1 = rystride;
+ c_offset = 1 + c_dim1;

Regarding options for matmul: It is possible to add the options to the lines in Make
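
To illustrate why the leading dimension has to be the stride rather than the row count, here is a minimal sketch of the 1-based, column-major indexing that the `c_dim1` / `c_offset` pair supports. `set_elem` is a hypothetical helper, and subtracting the offset inside the subscript is an assumption made for this sketch, not necessarily how the patch applies it.

```c
#include <stddef.h>

/* Hypothetical helper, not from the patch: store v at the 1-based
   position (i, j) of a column-major matrix whose columns lie c_dim1
   elements apart (c_dim1 plays the role of rystride).  */
static void
set_elem (double *c, ptrdiff_t c_dim1, ptrdiff_t i, ptrdiff_t j, double v)
{
  /* Parameter adjustments, as in the quoted hunk.  */
  ptrdiff_t c_offset = 1 + c_dim1;

  /* Subtracting c_offset maps (1, 1) to c[0].  */
  c[i + j * c_dim1 - c_offset] = v;
}
```

If the result array has 3 rows but its columns are 4 elements apart, element (3, 2) lives at offset 6. Using `c_dim1 = m` would compute offset 5 and write to the wrong location.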

Re: [patch,libgfortran] PR51119 - MATMUL slow for large matrices

2016-11-13 Thread Jerry DeLisle
On 11/13/2016 04:55 PM, Steve Kargl wrote:
On Sun, Nov 13, 2016 at 04:08:50PM -0800, Jerry DeLisle wrote:
Hi all,

Attached patch implements a fast blocked matrix multiply. The basic algorithm is derived from netlib.org tuned blas dgemm. See matmul.m4 for reference.

The matmul() function is com

Re: [patch,libgfortran] PR51119 - MATMUL slow for large matrices

2016-11-13 Thread Steve Kargl
On Sun, Nov 13, 2016 at 04:08:50PM -0800, Jerry DeLisle wrote:
> Hi all,
>
> Attached patch implements a fast blocked matrix multiply. The basic algorithm is
> derived from netlib.org tuned blas dgemm. See matmul.m4 for reference.
>
> The matmul() function is compiled with -Ofast -funroll-loo
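
For readers unfamiliar with the term, a blocked (tiled) matrix multiply looks roughly like the sketch below. This is a generic illustration of the technique for dense column-major matrices, not the code in matmul.m4; `matmul_blocked`, `min_idx`, and the tile size `BLOCK` are all hypothetical.

```c
#include <stddef.h>

#define BLOCK 64  /* hypothetical tile size, not the value used in the patch */

static ptrdiff_t
min_idx (ptrdiff_t a, ptrdiff_t b)
{
  return a < b ? a : b;
}

/* C (m x n) += A (m x k) * B (k x n); all matrices dense, column-major.
   The three outer loops walk over tiles so that the data touched by the
   inner loops stays in cache while it is reused.  */
static void
matmul_blocked (double *c, const double *a, const double *b,
                ptrdiff_t m, ptrdiff_t n, ptrdiff_t k)
{
  for (ptrdiff_t jj = 0; jj < n; jj += BLOCK)
    for (ptrdiff_t ll = 0; ll < k; ll += BLOCK)
      for (ptrdiff_t ii = 0; ii < m; ii += BLOCK)
        {
          const ptrdiff_t jmax = min_idx (jj + BLOCK, n);
          const ptrdiff_t lmax = min_idx (ll + BLOCK, k);
          const ptrdiff_t imax = min_idx (ii + BLOCK, m);

          for (ptrdiff_t j = jj; j < jmax; j++)
            for (ptrdiff_t l = ll; l < lmax; l++)
              {
                const double blj = b[l + j * k];
                /* Unit-stride inner loop over a column of A and C.  */
                for (ptrdiff_t i = ii; i < imax; i++)
                  c[i + j * m] += a[i + l * m] * blj;
              }
        }
}
```

The caller is expected to zero `c` first if a plain product is wanted. Tuned implementations typically add loop unrolling and buffer packing on top of this basic tiling.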