> Indeed, this is true for every system that is still in development.
> But as I responded to Mark Hahn, there are still many linux
> distributions deployed that have libc-2.3.3 or older. I guess your
> products (I had a quick look but could not find the info directly) are
> also still supporting linux distributions with libc-2.3.3 or older.

My memory is that older versions of x86_64 libc have a different set of
affinity functions (different # of args). PathScale supported both.
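For the record, here's a minimal sketch of what supporting both looks
like (my recollection of the mess, not PathScale's actual code): glibc
briefly shipped a two-argument sched_setaffinity() around 2.3.3 before
2.3.4 settled on the current three-argument form, and since the glibc
version macros can't tell 2.3.3 from 2.3.4 apart, the usual trick is a
configure-time compile test that defines a flag (the
HAVE_TWO_ARG_SETAFFINITY below is hypothetical):

#define _GNU_SOURCE
#include <sched.h>

/* Pin the calling process to one CPU; returns 0 on success, -1 on error. */
int bind_to_cpu(int cpu)
{
    cpu_set_t mask;

    CPU_ZERO(&mask);
    CPU_SET(cpu, &mask);
#ifdef HAVE_TWO_ARG_SETAFFINITY
    /* the short-lived prototype: no explicit mask size */
    return sched_setaffinity(0, &mask);
#else
    /* the current prototype: pid, mask size in bytes, mask */
    return sched_setaffinity(0, sizeof(mask), &mask);
#endif
}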
> >First off, I see people using *threaded* DGEMM, not OpenMP.
>
> I did not differentiate between these two in my previous mail because to
> me it's an implementation issue. Both come down to using multiple threads.

It's extremely inconvenient to express an efficient DGEMM in OpenMP, just
like it's pretty inconvenient to express an efficient serial DGEMM. So you
won't find anyone using an OpenMP DGEMM (see the sketch at the end of this
message). You can call everything in the universe an implementation issue
if you like.

> We have benchmarked our code with using multiple BLAS implementations
> and so far GotoBLAS came out as a clear winner. Next we tested GotoBLAS
> using 1,2 and 4 threads and depending on the linear solver (of which one
> is http://graal.ens-lyon.fr/MUMPS/) we had a speedup of between 30% and
> 70% when using 2 or 4 threads.

Sorry, did you compare against a pure MPI implementation? For example the
HPL code can run either way, so it's easy to compare. But if you're
comparing a serial code to a threaded code, it's no surprise that the
threaded code can be faster, especially when solving a problem that is not
memory-intensive. In fact I'd expect an even bigger win than 1.7X; perhaps
you aren't using Opterons ;-)

-- greg
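P.S. To make the DGEMM point concrete, here's a throwaway sketch (mine,
nothing to do with GotoBLAS internals). The obvious OpenMP triple loop is
easy to write and correct, but with no cache blocking and no register
tiling it's memory-bound and runs at a small fraction of peak:

/* Naive OpenMP DGEMM: C = A*B, all n x n, row-major.
 * Compile with something like "gcc -O2 -fopenmp". */
void naive_dgemm(int n, const double *A, const double *B, double *C)
{
    int i, j, k;
    double sum;

#pragma omp parallel for private(j, k, sum)
    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++) {
            sum = 0.0;
            for (k = 0; k < n; k++)
                sum += A[i*n + k] * B[k*n + j];
            C[i*n + j] = sum;
        }
}

And per my point above, the fair benchmark is this sort of thing with N
threads against N MPI ranks each calling a serial DGEMM, not a threaded
run against a single serial run.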