Hi Steve,

On my old core2 cpu, a quick test with N=1000 and an NxN matrix
suggests a crossover near N=1000 for REAL(4).  This cpu doesn't
have any AVX* instructions, so YMMV.  Program follows .sig

Looking at your data with AVX (which I think we can mostly count
on now),

- The library is always faster for matmul(vector,matrix) for any n >= 100
- For matmul(matrix,vector) there is no appreciable difference

So, putting in the same inline limits for matmul(vector,matrix)
that we have for matmul(matrix,matrix), and leaving
matmul(matrix,vector) alone, seems like a reasonable thing to do.
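
To make the proposed rule concrete, here is a rough sketch in C of the
decision it implies.  This is not gfortran source: the enum, the
function name, and the INLINE_LIMIT value are all hypothetical, and
"leaving matmul(matrix,vector) alone" is assumed to mean that it keeps
being inlined regardless of size.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names throughout; this only illustrates the proposal. */
enum matmul_shape { MATRIX_MATRIX, VECTOR_MATRIX, MATRIX_VECTOR };

/* Placeholder threshold.  The real value would be whatever limit is
   already applied to matmul(matrix,matrix).  */
#define INLINE_LIMIT 30

/* Return true if the product should be expanded inline, false if the
   library routine should be called instead.  n is a single size
   measure for the operands (an assumption made for simplicity).  */
bool use_inline(enum matmul_shape shape, size_t n)
{
    switch (shape) {
    case MATRIX_MATRIX:
    case VECTOR_MATRIX:
        /* Proposed: vector*matrix follows the matrix*matrix limit.  */
        return n <= INLINE_LIMIT;
    case MATRIX_VECTOR:
        /* Left alone: assumed to stay inlined at every size.  */
        return true;
    }
    return false;
}
```

With these placeholder values, use_inline(VECTOR_MATRIX, 1000) selects
the library call, while use_inline(MATRIX_VECTOR, 1000) still inlines.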

I'll work on a patch.

Regards

        Thomas
