https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68600
Jerry DeLisle <jvdelisle at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jvdelisle at gcc dot gnu.org

--- Comment #4 from Jerry DeLisle <jvdelisle at gcc dot gnu.org> ---
Created attachment 36869
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36869&action=edit
Thomas's program with a modified dgemm.

The dgemm in this example is a stripped-down version of an "optimized for
cache" version from netlib.org; I removed a lot of the unused code. The
results show better performance for larger arrays: from size 512 upward,
dgemm posts the highest figure in each row. Maybe we could model the library
routines after this and invoke them for larger arrays.

 Size   Loops  Matmul          dgemm  Matmul   Matmul
               fixed explicit         assumed  variable explicit
================================================================
    2  200000  1.752           0.042  0.124    0.295
    4  200000  2.172           0.314  0.434    0.704
    8  200000  2.293           1.071  0.721    1.127
   16  200000  2.826           1.533  0.972    1.468
   32  200000  2.707           1.666  1.184    2.154
   64   30757  2.726           1.853  1.192    2.299
  128    3829  2.641           1.965  1.379    2.542
  256     477  2.661           2.001  1.384    2.594
  512      59  1.740           2.011  1.147    1.746
 1024       7  1.344           2.024  1.070    1.355
 2048       1  1.305           2.026  1.088    1.312
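
The suggestion of routing larger arrays to a cache-friendly routine could be sketched roughly as follows. This is only an illustration in C under stated assumptions, not the code from the attachment and not the existing library routine: the function names, the tile size BLOCK and the cutoff BLOCKED_MIN_N are all hypothetical, and the cutoff of 512 is simply where dgemm overtakes the fixed explicit Matmul column in the table above. Matrices are square and stored column-major, as Fortran stores them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tuning values, not taken from the attachment. */
enum { BLOCK = 32, BLOCKED_MIN_N = 512 };

/* Column-major element offset, the layout Fortran uses. */
static size_t at(size_t i, size_t j, size_t n) { return i + j * n; }

static size_t min_sz(size_t x, size_t y) { return x < y ? x : y; }

/* Plain j-k-i loop nest: c = a * b for n x n matrices. */
void matmul_simple(size_t n, const double *a, const double *b, double *c)
{
    for (size_t j = 0; j < n; j++) {
        for (size_t i = 0; i < n; i++)
            c[at(i, j, n)] = 0.0;
        for (size_t k = 0; k < n; k++) {
            double bkj = b[at(k, j, n)];
            for (size_t i = 0; i < n; i++)
                c[at(i, j, n)] += a[at(i, k, n)] * bkj;
        }
    }
}

/* Cache-blocked variant: the same arithmetic, done tile by tile so that
   the tiles of a, b and c being combined are reused while still cached. */
void matmul_blocked(size_t n, const double *a, const double *b, double *c)
{
    for (size_t p = 0; p < n * n; p++)
        c[p] = 0.0;

    for (size_t jj = 0; jj < n; jj += BLOCK) {
        size_t jmax = min_sz(jj + BLOCK, n);
        for (size_t kk = 0; kk < n; kk += BLOCK) {
            size_t kmax = min_sz(kk + BLOCK, n);
            for (size_t ii = 0; ii < n; ii += BLOCK) {
                size_t imax = min_sz(ii + BLOCK, n);
                /* Multiply one tile of a by one tile of b. */
                for (size_t j = jj; j < jmax; j++)
                    for (size_t k = kk; k < kmax; k++) {
                        double bkj = b[at(k, j, n)];
                        for (size_t i = ii; i < imax; i++)
                            c[at(i, j, n)] += a[at(i, k, n)] * bkj;
                    }
            }
        }
    }
}

/* Pick the blocked routine only at or above the (assumed) size cutoff. */
void matmul_dispatch(size_t n, const double *a, const double *b, double *c)
{
    assert(a != NULL && b != NULL && c != NULL);
    if (n >= BLOCKED_MIN_N)
        matmul_blocked(n, a, b, c);
    else
        matmul_simple(n, a, b, c);
}
```

A caller would invoke matmul_dispatch(n, a, b, c) with three n*n buffers. Both paths compute the same product, so a real cutoff and tile size would have to be chosen by measurement on the target machine rather than taken from this sketch.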