On Mon, Nov 14, 2016 at 11:13 PM, Jerry DeLisle <jvdeli...@charter.net> wrote:
> On 11/13/2016 11:03 PM, Thomas Koenig wrote:
>>
>> Hi Jerry,
>>
>> I think this
>>
>> +      /* Parameter adjustments */
>> +      c_dim1 = m;
>> +      c_offset = 1 + c_dim1;
>>
>> should be
>>
>> +      /* Parameter adjustments */
>> +      c_dim1 = rystride;
>> +      c_offset = 1 + c_dim1;
>>
>> Regarding options for matmul: It is possible to add the
>> options to the lines in Makefile.in
>>
>> # Turn on vectorization and loop unrolling for matmul.
>> $(patsubst %.c,%.lo,$(notdir $(i_matmul_c))): AM_CFLAGS +=
>> -ftree-vectorize
>> -funroll-loops
>>
>> This is a great step forward. I think we can close most matmul-related
>> PRs once this patch has been applied.
>>
>> Regards
>>
>>     Thomas
>>
>
> With Thomas suggestion, I can remove the #pragma optimize from the source
> code. Doing this: (long lines wrapped as shown)
>
> diff --git a/libgfortran/Makefile.am b/libgfortran/Makefile.am
> index 39d3e11..9ee17f9 100644
> --- a/libgfortran/Makefile.am
> +++ b/libgfortran/Makefile.am
> @@ -850,7 +850,7 @@ intrinsics/dprod_r8.f90 \
>  intrinsics/f2c_specifics.F90
>
>  # Turn on vectorization and loop unrolling for matmul.
> -$(patsubst %.c,%.lo,$(notdir $(i_matmul_c))): AM_CFLAGS += -ftree-vectorize
> -funroll-loops
> +$(patsubst %.c,%.lo,$(notdir $(i_matmul_c))): AM_CFLAGS += -ffast-math
> -fno-protect-parens -fstack-arrays -ftree-vectorize -funroll-loops --param
> max-unroll-times=4 -ftree-loop-vectorize
-ftree-vectorize turns on -ftree-loop-vectorize and -ftree-slp-vectorize already.

>  # Logical matmul doesn't vectorize.
>  $(patsubst %.c,%.lo,$(notdir $(i_matmull_c))): AM_CFLAGS += -funroll-loops
>
>
> Comparing gfortran 6 vs 7: (test program posted in PR51119)
>
> $ gfc6 -static -Ofast -finline-matmul-limit=32 -funroll-loops --param
> max-unroll-times=4 compare.f90
> $ ./a.out
> =========================================================
> ================            MEASURED GIGAFLOPS          =
> =========================================================
>                 Matmul                           Matmul
>                  fixed                Matmul   variable
>  Size Loops   explicit  refMatmul    assumed   explicit
> =========================================================
>     2  2000     11.928      0.047      0.082      0.138
>     4  2000      1.455      0.220      0.371      0.316
>     8  2000      1.476      0.737      0.704      1.574
>    16  2000      4.536      3.755      2.825      3.820
>    32  2000      6.070      5.443      3.124      5.158
>    64  2000      5.423      5.355      5.405      5.413
>   128  2000      5.913      5.841      5.917      5.917
>   256   477      5.865      5.252      5.863      5.862
>   512    59      2.794      2.841      2.794      2.791
>  1024     7      1.662      1.356      1.662      1.661
>  2048     1      1.753      1.724      1.753      1.754
>
> $ gfc -static -Ofast -finline-matmul-limit=32 -funroll-loops --param
> max-unroll-times=4 compare.f90
> $ ./a.out
> =========================================================
> ================            MEASURED GIGAFLOPS          =
> =========================================================
>                 Matmul                           Matmul
>                  fixed                Matmul   variable
>  Size Loops   explicit  refMatmul    assumed   explicit
> =========================================================
>     2  2000     12.146      0.042      0.090      0.146
>     4  2000      1.496      0.232      0.384      0.325
>     8  2000      2.330      0.765      0.763      0.965
>    16  2000      4.611      4.120      2.792      3.830
>    32  2000      6.068      5.265      3.102      4.859
>    64  2000      6.527      5.329      6.425      6.495
>   128  2000      8.207      5.643      8.336      8.441
>   256   477      9.210      4.967      9.367      9.299
>   512    59      8.330      2.772      8.422      8.342
>  1024     7      8.430      1.378      8.511      8.424
>  2048     1      8.339      1.718      8.425      8.322
>
> I do think we need to adjust the default inline limit and should do this
> separately from this patch.
>
> With these changes, OK for trunk?
>
> Regards,
>
> Jerry
>