https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68600
            Bug ID: 68600
           Summary: Inlined MATMUL is too slow.
           Product: gcc
           Version: 6.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: fortran
          Assignee: unassigned at gcc dot gnu.org
          Reporter: dominiq at lps dot ens.fr
  Target Milestone: ---

Expected results: the inlined MATMUL should be

(1) at least as fast as the MATMUL from the library when compiled with the
    same options (-O2 -ftree-vectorize -funroll-loops);
(2) at least as fast as dgemm from LAPACK when compiled with the same options.

Options tested:

(a) -O2 -ftree-vectorize -funroll-loops -fno-frontend-optimize, i.e., MATMUL
    from the library;
(b) -O2 -ftree-vectorize -funroll-loops, i.e., inlined MATMUL;
(c) -Ofast -march=native -funroll-loops;
(d) -O2 -ftree-vectorize.

Timings in Gflop/s on a Core i7 at 2.8 GHz (turbo 3.8 GHz) show that neither
(1) nor (2) is true. Comparing columns 4 and 6 of the first table (the two
dgemm columns) gives an idea of the timing accuracy.

                     (a)                (b)
 Size   Loops    Matmul    dgemm    Matmul    dgemm
===================================================
    2  200000     0.360    0.218     0.723    0.221
    4  200000     1.246    0.959     1.379    0.969
    8  200000     2.098    2.396     2.186    2.385
   16  200000     3.748    3.648     2.920    3.645
   32  200000     5.386    5.406     3.096    5.418
   64   30757     6.364    6.385     3.220    6.494
  128    3829     6.362    6.760     3.256    6.702
  256     477     6.515    6.527     3.164    6.444
  512      59     6.313    6.634     3.189    6.675
 1024       7     4.796    4.842     2.935    4.853
 2048       1     4.026    4.032     2.824    3.996
 4096       1     3.355    3.467     2.652    3.475

                     (c)                (d)
 Size   Loops    Matmul    dgemm    Matmul    dgemm
===================================================
    2  200000     0.403    0.172     0.919    0.204
    4  200000     0.956    0.799     1.668    1.104
    8  200000     1.796    2.089     2.060    2.310
   16  200000     2.948    4.297     2.253    3.475
   32  200000     4.119    6.219     2.049    4.229
   64   30757     5.174    7.652     2.268    4.464
  128    3829     5.042    6.985     2.371    4.353
  256     477     5.052    6.492     2.423    4.696
  512      59     5.136    6.738     2.421    4.704
 1024       7     3.978    5.075     2.361    4.012
 2048       1     3.476    4.304     2.372    3.543
 4096       1     2.966    3.307     2.370    3.333
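
The claim that neither expectation holds can be checked mechanically against the reported numbers. The following is a minimal Python sketch, not part of the original report: it uses only the Gflop/s figures from the (a) and (b) columns, and the variable and function names are made up for illustration.

```python
# Gflop/s figures copied from the (a) and (b) columns of the report.
sizes = [2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096]

# (a) MATMUL from the library (-fno-frontend-optimize)
library_matmul = [0.360, 1.246, 2.098, 3.748, 5.386, 6.364,
                  6.362, 6.515, 6.313, 4.796, 4.026, 3.355]
# (b) inlined MATMUL
inlined_matmul = [0.723, 1.379, 2.186, 2.920, 3.096, 3.220,
                  3.256, 3.164, 3.189, 2.935, 2.824, 2.652]
# (b) dgemm compiled with the same options as the inlined MATMUL
dgemm_b = [0.221, 0.969, 2.385, 3.645, 5.418, 6.494,
           6.702, 6.444, 6.675, 4.853, 3.996, 3.475]


def slower_sizes(candidate, reference):
    """Return the matrix sizes at which candidate has lower Gflop/s than reference."""
    return [n for n, c, r in zip(sizes, candidate, reference) if c < r]


# Sizes where expectation (1) fails: inlined slower than the library MATMUL.
slower_than_library = slower_sizes(inlined_matmul, library_matmul)
# Sizes where expectation (2) fails: inlined slower than dgemm.
slower_than_dgemm = slower_sizes(inlined_matmul, dgemm_b)

print("inlined MATMUL slower than library MATMUL at sizes:", slower_than_library)
print("inlined MATMUL slower than dgemm at sizes:", slower_than_dgemm)
```

With these figures, the inlined MATMUL falls behind the library MATMUL from size 16 upward and behind dgemm from size 8 upward; it is ahead only for the smallest matrices.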