Try disabling just the shared memory transport. Open MPI's shared memory buffer is limited, and the job can deadlock if you overflow it; because Open MPI busy-waits, the deadlock shows up as a livelock (processes spinning at 100% CPU).
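If you are launching with Open MPI's mpirun, excluding the sm BTL is enough to test this. Something along these lines should do it (the process count and binary name here are only placeholders):

  mpirun --mca btl ^sm -np 16 ./your_app

or, equivalently, list just the transports you do want, e.g. --mca btl openib,self or --mca btl tcp,self. If the hang disappears with sm excluded, that points at the shared memory path rather than at the application itself.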

2008/7/9 Ashley Pittman <[EMAIL PROTECTED]>:
> On Tue, 2008-07-08 at 22:01 -0400, Joe Landman wrote:
> > Short version: The code starts and runs. Reads in its data. Starts its iterations. And then somewhere after this, it hangs. But not always at the same place. It doesn't write state data back out to the disk, just logs. Rerunning it gets it to a different point, sometimes hanging sooner, sometimes later. Seems to be the case on multiple different machines, with different OSes. Working on comparing MPI distributions, and it hangs with IB as well as with shared memory and tcp sockets.
>
> Sounds like you've found a bug; it doesn't sound too difficult to find. Comments in-line.
>
> > Right now we are using OpenMPI 1.2.6, and this code does use allreduce. When it hangs, an strace of the master process shows lots of polling:
>
> Why do you mention allreduce? Does it tend to be in allreduce when it hangs? Is it happening at the same place but on a different iteration every time, perhaps? This is quite important: you could either have a "random" memory corruption, which can cause the program to stop anywhere and is often hard to find, or a race condition, which is easier to deal with. If there are any similarities in the stack then it tends to point to the latter.
>
> allreduce is one of the collective functions with an implicit barrier, which means that *no* process can return from it until *all* processes have called it. If your program uses allreduce extensively, it's entirely possible that one process has stopped for whatever reason and the rest have continued as far as they can until they too deadlock. Collectives often get accused of causing programs to hang when in reality N-1 processes are in the collective call and 1 is off somewhere else.
>
> > c1-1:~ # strace -p 8548
> > [spin forever]
>
> Any chance of a stack trace, preferably a parallel one? I assume *all* processes in the job are in the R state? Do you have a mechanism available to allow you to see the message queues?
>
> > So it looks like the process is waiting for the appropriate posting on the internal scoreboard, and just hanging in a tight loop until this actually happens.
> >
> > But these hangs usually happen at the same place each time for a logic error.
>
> Like in allreduce, you mean?
>
> > But the odd thing about this code is that it worked fine 12 - 18 months ago, and we haven't touched it since (nor has it changed). What has changed is that we are now using OpenMPI 1.2.6.
>
> The other important thing to know here is what you have changed *from*.
>
> > So the code hasn't changed, and the OS on which it runs hasn't changed, but the MPI stack has. Yeah, that's a clue.
> >
> > Turning off openib and tcp doesn't make a great deal of impact. This is also a clue.
>
> So it's likely algorithmic? You could turn off shared memory as well, but it won't make a great deal of impact so there isn't any point.
>
> > I am now looking at trying mvapich2 and seeing how that goes. Using Intel and gfortran compilers (Fortran/C mixed code).
> >
> > Anyone see strange things like this with their MPI stacks?
>
> All the time; it's not really strange, just what happens on large systems, especially when developing MPI or applications.
>
> > I'll try all the usual things (reduce the optimization level, etc). Sage words of advice (and clue sticks) welcome.
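To make the "N-1 processes in the collective and 1 off somewhere else" point above concrete, here is a contrived C sketch (not taken from the original application; the tag, rank and iteration numbers are invented) of how one rank wandering off leaves every other rank spinning inside MPI_Allreduce, which under Open MPI's busy-wait progress looks exactly like the 100% CPU, R-state hang described:

/* Contrived example: rank 0 leaves the collective pattern after a few
 * iterations; everyone else blocks in MPI_Allreduce forever. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, nprocs, iter;
    double local = 1.0, global = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    for (iter = 0; iter < 100; iter++) {
        if (rank == 0 && iter == 10) {
            /* Rank 0 goes off and waits for a message nobody ever sends,
             * standing in for a logic error, lost message or memory
             * corruption in the real code. */
            MPI_Recv(&local, 1, MPI_DOUBLE, 1, 99, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        }
        /* The other nprocs-1 ranks block here: no rank can return from
         * the allreduce until all ranks have entered it. */
        MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM,
                      MPI_COMM_WORLD);
        if (rank == 0)
            printf("iter %d: sum = %f\n", iter, global);
    }

    MPI_Finalize();
    return 0;
}

Attaching gdb to a couple of the stuck processes (gdb -p <pid>, then bt) and comparing backtraces will tell you quickly whether this is what is happening: N-1 ranks deep inside MPI_Allreduce and one rank somewhere else entirely.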

> Is it the application which hangs, or a combination of the application and the dataset you give it? What's the smallest process count and timescale you can reproduce this on?
>
> You could try valgrind, which works well with openmpi; it will help you with memory corruption but won't be of much help if you have a race condition. Going by reputation, Marmot might be of some use: it'll point out if you are doing anything silly with MPI calls. There is enough flexibility in the standard that you can do something completely illegal but have it work in 90% of cases; Marmot should pick up on these. http://www.hlrs.de/organization/amt/projects/marmot/
>
> We could take this off-line if you prefer, this could potentially get quite involved...
>
> Ashley Pittman.
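On the valgrind suggestion: the usual way to run it with Open MPI is to put valgrind between mpirun and the application, for example (process count, valgrind options and binary name are only illustrative):

  mpirun -np 4 valgrind --leak-check=yes ./your_app

Each rank then runs under its own valgrind instance, so expect a big slowdown and a certain amount of noise from inside the MPI library itself.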