Try disabling just shared memory.
Open MPI's shared-memory buffer is limited, and the job can deadlock if you
overflow it.
Because Open MPI busy-waits, the deadlock shows up as a livelock (processes
spinning at 100% CPU).
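
For example (assuming Open MPI's mpirun, with the process count and
application name as placeholders), something like

  mpirun --mca btl ^sm -np 16 ./your_app

excludes the shared-memory BTL so all traffic goes over tcp or openib,
which is a quick way to test whether the sm buffer is the culprit.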


2008/7/9 Ashley Pittman <[EMAIL PROTECTED]>:

> On Tue, 2008-07-08 at 22:01 -0400, Joe Landman wrote:
> >    Short version:  The code starts and runs.  Reads in its data.  Starts
> > its iterations.  And then somewhere after this, it hangs.  But not
> > always at the same place.  It doesn't write state data back out to the
> > disk, just logs.  Rerunning it gets it to a different point, sometimes
> > hanging sooner, sometimes later.  Seems to be the case on multiple
> > different machines, with different OSes.  Working on comparing MPI
> > distributions, and it hangs with IB as well as with shared memory and
> > tcp sockets.
>
> Sounds like you've found a bug; it doesn't sound too difficult to find.
> Comments in-line.
>
> >    Right now we are using OpenMPI 1.2.6, and this code does use
> > allreduce.  When it hangs, an strace of the master process shows lots of
> > polling:
>
> Why do you mention allreduce?  Does it tend to be in allreduce when it
> hangs?  Is it happening at the same place but on a different iteration
> every time, perhaps?  This is quite important: you could either have a
> "random" memory corruption, which can cause the program to stop anywhere
> and is often hard to find, or a race condition, which is easier to deal
> with.  If there are any similarities in the stack then it tends to point
> to the latter.
>
> allreduce is one of the collective functions with an implicit barrier,
> which means that *no* process can return from it until *all* processes
> have called it.  If your program uses allreduce extensively it's entirely
> possible that one process has stopped for whatever reason and the rest
> have continued as far as they can until they too deadlock.  Collectives
> often get accused of causing programs to hang when in reality N-1
> processes are in the collective call and 1 is off somewhere else.
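>
> A minimal sketch of that failure mode (hypothetical toy code, not taken
> from your application): if one rank takes a branch that skips the
> collective, the remaining ranks sit in MPI_Allreduce forever, burning
> CPU in Open MPI's progress loop.
>
>   /* allreduce_hang.c: illustrative only; build with mpicc and run
>      with something like "mpirun -np 4 ./allreduce_hang" */
>   #include <mpi.h>
>   #include <stdio.h>
>
>   int main(int argc, char **argv)
>   {
>       int rank, in, out;
>       MPI_Init(&argc, &argv);
>       MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>       in = rank;
>       if (rank != 0) {
>           /* ranks 1..N-1 enter the collective and wait for rank 0,
>              which never calls it; the job hangs here, busy-polling */
>           MPI_Allreduce(&in, &out, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
>           printf("rank %d got sum %d\n", rank, out);
>       }
>       MPI_Finalize();
>       return 0;
>   }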
>
> > c1-1:~ # strace -p 8548
>
> > [spin forever]
>
> Any chance of a stack trace, preferably a parallel one?  I assume *all*
> processes in the job are in the R state?  Do you have a mechanism
> available to allow you to see the message queues?
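>
> If nothing better is to hand, attaching gdb to a couple of the spinning
> processes and dumping their stacks usually shows where they are stuck,
> e.g. "gdb -p 8548" followed by "thread apply all bt" at the gdb prompt
> (pid taken from your strace example).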
>
> > So it looks like the process is waiting for the appropriate posting on
> > the internal scoreboard, and just hanging in a tight loop until this
> > actually happens.
> >
> > But for a logic error, these hangs would usually happen at the same
> > place each time.
>
> Like in allreduce you mean?
>
> > But the odd thing about this code is that it worked fine 12 - 18 months
> > ago, and we haven't touched it since (nor has it changed).  What has
> > changed is that we are now using OpenMPI 1.2.6.
>
> The other important thing to know here is what you have changed *from*.
>
> > So the code hasn't changed, and the OS on which it runs hasn't changed,
> > but the MPI stack has.  Yeah, that's a clue.
>
> > Turning off openib and tcp doesn't make a great deal of impact.  This is
> > also a clue.
>
> So it's likely algorithmic?  You could turn off shared memory as well
> but it won't make a great deal of impact, so there isn't any point.
>
> > I am looking now to trying mvapich2 and seeing how that goes.  Using
> > Intel and gfortran compilers (Fortran/C mixed code).
> >
> > Anyone see strange things like this with their MPI stacks?
>
> All the time; it's not really strange, just what happens on large
> systems, especially when developing MPI or applications.
>
> > I'll try all the usual things (reduce the optimization level, etc).
> > Sage words of advice (and clue sticks) welcome.
>
> Is it the application which hangs or a combination of the application
> and the dataset you give it?  What's the smallest process count and
> timescale you can reproduce this on?
>
> You could try valgrind, which works well with Open MPI; it will help you
> with memory corruption but won't be of much help if you have a race
> condition.  Going by its reputation, Marmot might be of some use: it'll
> point out if you are doing anything silly with MPI calls.  There is
> enough flexibility in the standard that you can do something completely
> illegal but have it work in 90% of cases; Marmot should pick up on these.
> http://www.hlrs.de/organization/amt/projects/marmot/
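>
> (In its simplest form, running under valgrind is just something like
> "mpirun -np 4 valgrind ./your_app", with your_app standing in for the
> real binary; expect a big slowdown and possibly some noise from Open
> MPI's own internals.)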
>
> We could take this off-line if you prefer; this could potentially get
> quite involved...
>
> Ashley Pittman.
>
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
