Greg Lindahl wrote:
> On Fri, Jun 15, 2007 at 01:49:49PM +0200, Toon Knapen wrote:
>> AFAICT this is not always the case. E.g. on systems with glibc, this
>> functionality (set_process_affinity and such) is only available starting
>> from libc-2.3.4.
> Nearly every statement about Linux is untrue at some point in the
> past.
Indeed, that holds for every system still in development. But as I
responded to Mark Hahn, there are still many deployed Linux
distributions that ship libc-2.3.3 or older. I suspect your products
(I had a quick look but could not find the information directly) also
still support Linux distributions with libc-2.3.3 or older.
>> E.g. you can obtain a big boost when running an
>> MPI-code where each process performs local dgemm's for instance by using
>> an OpenMP'd dgemm implementation. This is an example where running
>> mixed-mode makes a lot of sense.
> First off, I see people using *threaded* DGEMM, not OpenMP.
I did not differentiate between the two in my previous mail because to
me it is an implementation detail: both come down to using multiple
threads.
> Second,
> I've never seen anyone show an actual benefit -- can you name an
> example? i.e. "for N=foo, I get a 13% speedup on..."
We have benchmarked our code against multiple BLAS implementations, and
so far GotoBLAS has come out as the clear winner. Next we tested
GotoBLAS using 1, 2 and 4 threads, and depending on the linear solver
(one of which is MUMPS, http://graal.ens-lyon.fr/MUMPS/) we saw a
speedup of between 30% and 70% when using 2 or 4 threads.
The scalability of GotoBLAS itself with respect to the number of
threads is actually much better than that. But of course, when it is
integrated in a solver, the speedup depends strongly on the size of the
matrices being passed to BLAS: the larger, the better.
toon
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf