Re: [patch,libgfortran] PR51119 - MATMUL slow for large matrices

Jerry DeLisle Tue, 15 Nov 2016 08:00:24 -0800

On 11/14/2016 11:22 PM, Thomas Koenig wrote:

Hi Jerry,

With these changes, OK for trunk?


Just going over this with a fine comb...

One thing just struck me:   The loop variables should be index_type, so

      const index_type m = xcount, n = ycount, k = count;

[...]

   index_type a_dim1, a_offset, b_dim1, b_offset, c_dim1, c_offset, i1, i2,
          i3, i4, i5, i6;

      /* Local variables */
      GFC_REAL_4 t1[65536], /* was [256][256] */
         f11, f12, f21, f22, f31, f32, f41, f42,
         f13, f14, f23, f24, f33, f34, f43, f44;
      index_type i, j, l, ii, jj, ll;
      index_type isec, jsec, lsec, uisec, ujsec, ulsec;

I agree that we should do the tuning of the inline limit
separately.

Several of my iterations used index_type. I found using integer gives better
performance. The reason is that index_type variables are of type ptrdiff_t,
which is a 64-bit integer. I suspect we eliminate one memory fetch for each of
these and reduce the register loading by reducing the number of registers
needed, two for one situation. I will change back and retest.
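
To make the comparison above concrete, here is a minimal sketch (not the code
from the patch) of a loop nest whose index type can be switched at compile
time. The names idx_t, USE_INT_INDEX and matmul_sketch are hypothetical and
exist only for this illustration; the kernel is a plain unblocked triple loop
over column-major square matrices, not the blocked algorithm under review.

```c
#include <stddef.h>

/* Hypothetical switch: build with -DUSE_INT_INDEX to use a 32-bit int
   index, otherwise the index is ptrdiff_t (64 bits on typical LP64
   targets).  */
#ifdef USE_INT_INDEX
typedef int idx_t;
#else
typedef ptrdiff_t idx_t;
#endif

/* c = a * b for n x n matrices stored column-major.
   Illustration only: no blocking, no unrolling.  */
void
matmul_sketch (const float *a, const float *b, float *c, idx_t n)
{
  for (idx_t j = 0; j < n; j++)
    for (idx_t i = 0; i < n; i++)
      {
        float s = 0.0f;
        for (idx_t l = 0; l < n; l++)
          s += a[i + l * n] * b[l + j * n];
        c[i + j * n] = s;
      }
}
```

Building the same file with and without -DUSE_INT_INDEX and timing both is
one way to check whether the index width matters on a given target; the
outcome will depend on the compiler version and options used.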


and Paul commented "-ftree-vectorize turns on -ftree-loop-vectorize and
-ftree-slp-vectorize already."

I will remove those two options and keep -ftree-vectorize.

I will report back my findings.

Thanks, and a fine tooth comb is a very good thing.

Jerry
