One of the issues I had with using mpd with Sun Grid Engine was that if it
starts the mpd daemons on a per-job basis, you can run into problems when the
scheduler starts another mpd on the same node for a different MPI job.

The solution I used (as described in the SGE integration docs) was to use smpd
with a unique port number. This is similar to using LAM_MPI_SOCKET_SUFFIX for
LAM runs. I am not sure how well smpd startup and shutdown scales -- I have
only used it with small numbers of nodes (< 100).
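The idea, as a rough Python sketch (illustration only -- the smpd and mpiexec
option names shown are assumptions and may differ between MPICH2 and Intel MPI
versions, so check the docs; JOB_ID, NSLOTS and PE_HOSTFILE are the standard
SGE environment variables):

    # Rough sketch: start one process-manager daemon per job on a job-unique
    # port, so two jobs sharing a node cannot collide on the daemon's socket.
    # The smpd/mpiexec flags (-port, -shutdown) are placeholders here.
    import os
    import subprocess

    BASE_PORT = 20000                       # assumed free port range on the cluster
    job_id = int(os.environ["JOB_ID"])      # SGE sets JOB_ID for every job
    port = BASE_PORT + (job_id % 10000)     # unique among concurrently running jobs

    # SGE writes the granted hosts to $PE_HOSTFILE, one host per line
    hosts = []
    with open(os.environ["PE_HOSTFILE"]) as f:
        for line in f:
            if line.strip():
                hosts.append(line.split()[0])

    # bring up a daemon on every host of the job, bound to the job's port
    for host in hosts:
        subprocess.check_call(["ssh", host, "smpd", "-port", str(port)])

    try:
        # point mpiexec at the job-private port (placeholder option name)
        subprocess.check_call(["mpiexec", "-port", str(port),
                               "-n", os.environ["NSLOTS"], "./my_mpi_app"])
    finally:
        # tear the daemons down again when the job finishes (placeholder option)
        for host in hosts:
            subprocess.call(["ssh", host, "smpd", "-shutdown", "-port", str(port)])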
-- Doug

> If you have a batch system that can start the MPDs, you should consider
> starting the MPI processes directly with the batch system and providing a
> separate service to supply the startup information. In MPICH2, the MPI
> implementation is separated from the process management. The mpd system is
> simply an example process manager (albeit one with many useful features).
> We didn't expect users to use one existing parallel process management
> system to start another one; instead, we expected that those existing
> systems would use the PMI interface used in MPICH2 to start the MPI
> processes directly. I know that you don't need MPD for MPICH2; I expect the
> same is true for Intel MPI.
>
> Bill
>
> On Oct 4, 2006, at 11:31 AM, Bill Bryce wrote:
>
>> Hi Matt,
>>
>> You pretty much diagnosed our problem correctly. After discussing with the
>> customer and a few more engineers here, we found that the Python code was
>> very slow at starting the ring. This seems to be a common problem with MPD
>> startup on other MPI implementations as well (I could be wrong, though).
>> We also modified recvTimeout, since the onsite engineers suspected that
>> would help as well. The final fix we are working on is starting the MPD
>> with the batch system and not relying on ssh - the customer does not want
>> a root MPD ring and wants one per job, so the batch system will do this
>> for us.
>>
>> Bill.
>>
>>
>> -----Original Message-----
>> From: M J Harvey [mailto:[EMAIL PROTECTED]
>> Sent: Wednesday, October 04, 2006 12:23 PM
>> To: Bill Bryce
>> Cc: beowulf@beowulf.org
>> Subject: Re: [Beowulf] Intel MPI 2.0 mpdboot and large clusters, slow to
>> start up, sometimes not at all
>>
>> Hello,
>>
>>> We are going through a similar experience at one of our customer sites.
>>> They are trying to run Intel MPI on more than 1,000 nodes. Are you
>>> experiencing problems starting the MPD ring? We noticed it takes a really
>>> long time, especially when the node count is large. It also just doesn't
>>> work sometimes.
>>
>> I've had similar problems with slow and unreliable startup of the Intel
>> mpd ring. I noticed that before spawning the individual mpds, it connects
>> to each node and checks the version of the installed Python (function
>> getversionpython() in mpdboot.py). On my cluster, at least, this check was
>> very slow (not to say pointless). Removing it dramatically improved
>> startup time - now it's merely slow.
>>
>> Also, for jobs with large process counts, it's worth increasing
>> recvTimeout in mpirun from 20 seconds. This value governs the amount of
>> time mpirun waits for the secondary MPI processes to be spawned by the
>> remote mpds, and the default value is much too aggressive for large jobs
>> started via ssh.
>>
>> Kind Regards,
>>
>> Matt
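Matt's recvTimeout point is worth keeping in mind for large jobs. As an
illustration only (this is not the actual Intel mpirun code; the function name
and scaling factors below are made up), the timeout could be scaled with the
process count instead of being fixed at 20 seconds:

    # Illustration only -- not the actual Intel mpirun script.  Instead of a
    # hard-coded 20 second window for the remote mpds to report back, let the
    # timeout grow with the number of processes being started over ssh.
    def pick_recv_timeout(nprocs, base=20, extra_per_256=10, cap=600):
        """Seconds to wait for the remote mpds to spawn their MPI processes."""
        return min(base + extra_per_256 * (nprocs // 256), cap)

    # e.g. 64 procs -> 20 s, 1024 procs -> 60 s, 4096 procs -> 180 s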
-- Doug

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf