On 13.06.2011 19:51, srean wrote:
> If you are on an Intel machine and you have the MKL libraries around, I
> would strongly recommend that you use the matrix multiplication
> routine if possible. MKL will do the parallelization for you. Well,
> any good BLAS implementation would do the same, you don't really need
> MKL. ATLAS and ACML would work too, just that MKL has been set up for
> us and it works well.

Never mind ATLAS. Alternatives to MKL are GotoBLAS2, ACML and ACML-GPU. GotoBLAS2 is generally faster than MKL. The relative performance of ACML and MKL depends on the architecture, but both are now fast on either architecture (Intel or AMD). ACML-GPU will move matrix multiplication (the *GEMM subroutines) to the (AMD/ATI) GPU if it can, provided the problem is large enough. MKL used to run in tortoise mode on AMD chips, but no longer does, due to intervention by the Federal Trade Commission.

IMHO, trying to beat the Intel or AMD performance library developers with Python, NumPy and multiprocessing is just silly. Nothing we do with the array operator * and np.sum is ever going to compare with the BLAS functions from these libraries.

Sometimes we need a little more coarse-grained parallelism. Then it's time to think about Python threads and releasing the GIL, or about using OpenMP with C or Fortran.

multiprocessing is the last tool to think about. It is mostly appropriate for 'embarrassingly parallel' problems, and certainly not the tool for parallel matrix multiplication.

Sturla

_______________________________________________
NumPy-Discussion mailing list
[email protected]
http://mail.scipy.org/mailman/listinfo/numpy-discussion
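To make the point about the array operator * and np.sum versus BLAS concrete, here is a minimal sketch. The function names matmul_broadcast and matmul_blas are made up for illustration, and the sketch assumes NumPy was built against some BLAS library (MKL, GotoBLAS2, ACML or similar); without one, np.dot falls back to a slower built-in routine. Both functions compute the same product; only the second one can end up in a *GEMM call.

```python
import numpy as np


def matmul_broadcast(a, b):
    """Matrix product using only the array operator * and np.sum."""
    # a[:, :, None] has shape (m, k, 1) and b[None, :, :] has shape (1, k, n).
    # Their product is an (m, k, n) temporary, which is then reduced over k.
    return np.sum(a[:, :, None] * b[None, :, :], axis=1)


def matmul_blas(a, b):
    """Matrix product using np.dot.

    Assumption: NumPy is linked against a BLAS library, in which case
    np.dot on 2-D float64 arrays is handed to its matrix multiplication
    routine.
    """
    return np.dot(a, b)


rng = np.random.RandomState(0)
a = rng.rand(60, 50)
b = rng.rand(50, 40)

c_broadcast = matmul_broadcast(a, b)
c_blas = matmul_blas(a, b)

# Same result up to floating-point rounding; the cost is what differs.
same = np.allclose(c_broadcast, c_blas)
```

Timing the two functions on larger matrices (for instance with the timeit module) should show the gap, and np.show_config() reports which BLAS, if any, the NumPy build is using.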
