Greg Lindahl wrote:
On Fri, Jun 15, 2007 at 01:49:49PM +0200, Toon Knapen wrote:

>> AFAICT this is not always the case. E.g. on systems with glibc, this functionality (set_process_affinity and such) is only available starting from libc-2.3.4.

> Nearly every statement about Linux is untrue at some point in the
> past.


Indeed, this is true for every system that is still in development.
But as I responded to Mark Hahn, there are still many Linux distributions deployed that ship libc-2.3.3 or older. I guess your products (I had a quick look but could not find the info directly) also still support Linux distributions with libc-2.3.3 or older.



>> E.g. you can obtain a big boost when running an MPI code where each process performs local DGEMMs, for instance by using an OpenMP'd DGEMM implementation. This is an example where running mixed-mode makes a lot of sense.

> First off, I see people using *threaded* DGEMM, not OpenMP.

I did not differentiate between the two in my previous mail because to me it is an implementation detail: both come down to using multiple threads.


> Second,
> I've never seen anyone show an actual benefit -- can you name an
> example? i.e. "for N=foo, I get a 13% speedup on..."


We have benchmarked our code using multiple BLAS implementations, and so far GotoBLAS has come out as the clear winner. Next we tested GotoBLAS with 1, 2, and 4 threads, and depending on the linear solver (one of which is MUMPS, http://graal.ens-lyon.fr/MUMPS/) we saw speedups between 30% and 70% when using 2 or 4 threads. The scalability of GotoBLAS with respect to the number of threads is actually much better than that, but of course, once it is integrated in a solver, the speedup depends strongly on the size of the matrices being passed to BLAS: the larger the better.

toon

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
