https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68600
Dominique d'Humieres <dominiq at lps dot ens.fr> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2015-11-30
     Ever confirmed|0                           |1

--- Comment #6 from Dominique d'Humieres <dominiq at lps dot ens.fr> ---
> I think you are seeing the effects of inefficiencies of assumed-shape arrays.
>
> If you want to use matmul on very small matrix sizes, it is best to
> use fixed-size explicit arrays.

Well, the problem is that MATMUL inlining is the default. IMO it should be
restricted to fixed-size explicit arrays (and small matrices?), at least for
the 6.1 version.

> Created attachment 36869
> Thomas program with a modified dgemm.
>
> The dgemm in this example is a stripped out version of an "optimized for
> cache" version from netlib.org. I stripped out a lot of the unused code.

It is probably too late for 6.1, but the results are quite impressive
(~30Gflops/s peak):

[Book15] f90/bug% gfc -Ofast timing/matmul_sys_8jd.f90
[Book15] f90/bug% a.out
 Size  Loops    Matmul     dgemm    Matmul    Matmul
                 fixed             assumed  variable
              explicit                      explicit
====================================================
    2 200000     0.969     0.104     0.360     0.368
    4 200000     5.821     0.774     1.381     1.049
    8 200000     5.415     2.970     2.316     2.342
   16 200000     6.455     4.917     2.738     3.225
   32 200000     7.332     5.964     2.893     4.117
   64  30757     5.565     7.277     2.785     3.830
  128   3829     4.790     7.982     2.981     4.384
  256    477     4.674     8.375     3.077     4.675
  512     59     4.797     8.200     3.156     4.786
 1024      7     3.967     8.370     2.896     4.050
 2048      1     3.693     8.414     2.804     3.650

[Book15] f90/bug% gfc -Ofast -mavx timing/matmul_sys_8jd.f90
[Book15] f90/bug% a.out
 Size  Loops    Matmul     dgemm    Matmul    Matmul
                 fixed             assumed  variable
              explicit                      explicit
====================================================
    2 200000     0.956     0.106     0.372     0.469
    4 200000     7.805     0.715     1.334     1.462
    8 200000     7.520     3.222     2.292     3.482
   16 200000     3.001     6.406     2.671     4.917
   32 200000     8.886     8.530     2.900     6.136
   64  30757    10.203    10.998     2.677     6.770
  128   3829     6.742    13.367     2.831     6.774
  256    477     6.435    13.979     2.906     6.049
  512     59     6.592    15.041     2.991     6.273
 1024      7     5.247    14.639     2.775     4.922
 2048      1     4.309    13.976     2.739     4.176

Note a problem when 16x16 matrices are inlined with -mavx (I'll investigate
and file a PR for it).
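For readers unfamiliar with the "optimized for cache" dgemm mentioned in the quoted comment, the core idea is loop blocking (tiling). The C sketch below is NOT the code from attachment 36869 or from netlib. It is a minimal illustration that assumes square n x n matrices stored in column-major (Fortran) order, and it uses a hypothetical block size BLK = 32 and hypothetical function names. It also includes a helper that converts a timing into Gflop/s under the common convention of 2n^3 floating-point operations per multiplication. The timing program's exact formula is not shown in this report, so that helper is an assumption too.

```c
#include <stddef.h>

/* Hypothetical sketch: names and block size are illustrative only, not taken
   from attachment 36869 or netlib's dgemm. A real code tunes BLK to the cache. */
#define BLK 32

/* Reference triple loop: C = A * B for n x n matrices in column-major
   (Fortran) order, element (i,j) stored at index i + j*n. */
void matmul_naive(size_t n, const double *a, const double *b, double *c)
{
    for (size_t j = 0; j < n; j++)
        for (size_t i = 0; i < n; i++) {
            double s = 0.0;
            for (size_t k = 0; k < n; k++)
                s += a[i + k * n] * b[k + j * n];
            c[i + j * n] = s;
        }
}

/* Cache-blocked variant: work on BLK x BLK tiles so that the tiles of A, B
   and C being combined stay in cache while they are reused. */
void matmul_blocked(size_t n, const double *a, const double *b, double *c)
{
    for (size_t idx = 0; idx < n * n; idx++)
        c[idx] = 0.0;

    for (size_t jj = 0; jj < n; jj += BLK)
        for (size_t kk = 0; kk < n; kk += BLK)
            for (size_t ii = 0; ii < n; ii += BLK) {
                /* clip the tile at the matrix edge */
                size_t jmax = jj + BLK < n ? jj + BLK : n;
                size_t kmax = kk + BLK < n ? kk + BLK : n;
                size_t imax = ii + BLK < n ? ii + BLK : n;
                for (size_t j = jj; j < jmax; j++)
                    for (size_t k = kk; k < kmax; k++) {
                        double bkj = b[k + j * n];
                        /* innermost loop walks down a column: unit stride */
                        for (size_t i = ii; i < imax; i++)
                            c[i + j * n] += a[i + k * n] * bkj;
                    }
            }
}

/* Gflop/s for `loops` multiplications of n x n matrices taking `seconds`,
   assuming 2*n^3 floating-point operations per multiplication. */
double gflops(size_t n, long loops, double seconds)
{
    return 2.0 * (double)n * (double)n * (double)n * (double)loops
           / seconds / 1e9;
}
```

Both routines accumulate each C(i,j) in the same k order, so for exactly representable inputs they give identical results. The blocked loop nest only changes the memory access pattern, which is what matters for the larger sizes in the tables.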