> On Mar 15, 2019, at 4:22 PM, Douglas Eadline <deadl...@eadline.org> wrote:
> 
> 
>> We are definitely going that way, but for every day desktops, MPI is not
>> the way to go. Since most desktops are stand-alone islands,
>> multi-threading makes more sense, since it has less overhead compared to
>> MPI, and most desktop apps don't need the inter-node communications
>> provided by MPI.
> 

I spent quite a lot of time in recent months looking at communications in 
single-node parallel applications.  These are often available in both OpenMP 
and MPI versions.

I cannot recall a single example of an OpenMP version that was actually faster 
than the MPI version.

This is the sort of thing that is bound to cause arguments!  Of course there is 
plenty to argue about.

* OpenMP versions tend to use less RAM because it is all shared.  MPI versions 
have to duplicate read-only datasets, and there are two or three copies of 
messages floating around depending on the MPI runtime.

* OpenMP versions are probably an easier way to get started with parallelism.  
Just slap on some #pragmas and you get some benefit (see the OpenMP sketch 
after this list).

* MPI versions can run multinode or same node with equal ease (see the MPI 
sketch after this list).

* Some punters argue that MPI memory use scales badly with huge numbers of 
ranks, so a hybrid approach is best, with OpenMP on node and MPI between nodes. 
 I am not convinced. You get the complexities of both.

* The performance differences are not huge, in most cases.

* The OpenMP runtime is hardly free.  There is a *lot* of locking and copying 
and broadcasting and reducing and notifying going on down there.

* There is no particular reason to think that MPI copies are slower than 
coherent shared memory.  Fetching a value from another core is typically slower 
than L3 and only slightly faster than DRAM.  Even when MPI does two copies via 
a shared memory page, one of them is likely to stay in the local cache and 
really not cost that much.
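
To make the #pragma point concrete, here is a minimal OpenMP sketch: a serial 
dot product made parallel with a single directive.  The function and variable 
names are mine, not from any particular code.

    #include <stdio.h>
    #include <stdlib.h>

    /* Serial loop made parallel with one directive: OpenMP splits the
       iterations across threads and combines the per-thread partial
       sums via the reduction clause.  Compile with e.g. gcc -fopenmp. */
    double dot(const double *a, const double *b, int n)
    {
        double sum = 0.0;
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < n; i++)
            sum += a[i] * b[i];
        return sum;
    }

    int main(void)
    {
        enum { N = 1000000 };
        double *a = malloc(N * sizeof *a), *b = malloc(N * sizeof *b);
        for (int i = 0; i < N; i++) { a[i] = 1.0; b[i] = 2.0; }
        printf("dot = %g\n", dot(a, b, N));   /* expect 2e+06 */
        free(a); free(b);
        return 0;
    }

The serial code is untouched; that is the whole appeal.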
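
And for MPI on one node versus many: the same sort of reduction as a complete 
MPI program.  The source does not care where the ranks land; the launch line 
decides.  The --hostfile flag below is Open MPI's spelling (other runtimes 
differ), and "hosts" and "./reduce" are hypothetical names of mine.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Each rank contributes one value; rank 0 collects the sum. */
        double local = (double)rank, total = 0.0;
        MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("sum over %d ranks = %g\n", size, total);

        MPI_Finalize();
        return 0;
    }

Same binary, two launches:

    mpirun -np 8 ./reduce                    # 8 ranks on this node
    mpirun -np 8 --hostfile hosts ./reduce   # 8 ranks spread across nodes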

In the above I am comparing OpenMP and MPI as routes to parallelism.  It is 
easy to say “multithreading” but for most programmers it is a real tarpit.  You 
get a lot of programs that run on the development machine and lock up 
elsewhere.  There are a few smart people who understand this stuff, but it is 
hard.  At least MPI and OpenMP and things like Threading Building Blocks and 
Cilk can abstract the situation to a degree.  I wouldn’t advise anyone to roll 
their own multithreaded world.  And use MCS locks for goodness’ sake!
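
Since I brought up MCS locks: the idea is that each waiting thread spins on a 
flag in its own queue node (its own cache line) rather than everyone hammering 
one shared word.  A rough sketch using C11 atomics, with type and function 
names of my own invention; a real implementation would add padding, backoff 
and so on.

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stddef.h>

    /* One queue node per acquiring thread; it must stay alive (and at a
       stable address) until the matching release returns. */
    typedef struct mcs_node {
        _Atomic(struct mcs_node *) next;
        atomic_bool locked;
    } mcs_node;

    typedef struct {
        _Atomic(mcs_node *) tail;   /* last waiter in the queue, or NULL */
    } mcs_lock;

    void mcs_acquire(mcs_lock *lk, mcs_node *me)
    {
        atomic_store_explicit(&me->next, NULL, memory_order_relaxed);
        atomic_store_explicit(&me->locked, true, memory_order_relaxed);

        /* Swap ourselves in as the new tail of the queue. */
        mcs_node *prev = atomic_exchange_explicit(&lk->tail, me,
                                                  memory_order_acq_rel);
        if (prev != NULL) {
            /* Someone is ahead of us: link in behind them and spin on
               OUR OWN flag, i.e. our own cache line -- the whole point. */
            atomic_store_explicit(&prev->next, me, memory_order_release);
            while (atomic_load_explicit(&me->locked, memory_order_acquire))
                ;   /* local spin */
        }
    }

    void mcs_release(mcs_lock *lk, mcs_node *me)
    {
        mcs_node *succ = atomic_load_explicit(&me->next, memory_order_acquire);
        if (succ == NULL) {
            /* No visible successor: try to swing tail back to NULL. */
            mcs_node *expect = me;
            if (atomic_compare_exchange_strong_explicit(
                    &lk->tail, &expect, NULL,
                    memory_order_acq_rel, memory_order_acquire))
                return;     /* queue empty, lock is free */
            /* A successor is mid-enqueue; wait for it to link itself. */
            while ((succ = atomic_load_explicit(&me->next,
                                                memory_order_acquire)) == NULL)
                ;
        }
        /* Hand the lock directly to the next waiter. */
        atomic_store_explicit(&succ->locked, false, memory_order_release);
    }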

I was left after all this thinking that OpenMP was a fine thing for folks 
getting started with parallelism, but MPI was probably a better bet.  There is 
also now 25 years of experience suggesting that you don’t have to be a wizard 
to get an MPI code to work.

-Larry (dons nomex suit)
