Hi all,

I'm doing the tight MPICH2 (not MPICH) integration with SGE on a cluster with dual-processor, dual-core AMD64 Opteron machines.
I followed the Sun howto located at:
http://gridengine.sunsource.net/howto/mpich2-integration/mpich2-integration.html

The document explains the following three kinds of Tight Integration (TI):

  - TI using Process Manager (PM): gforker
  - TI using PM: SMPD - daemonless
  - TI using PM: SMPD - daemon-based

I did the TI with gforker and tested it successfully, but I failed to do the TI with daemonless SMPD. Let me explain what I did.

I installed MPICH2 with the smpd configuration. SGE is installed at /opt/gridengine. I created an MPICH2-SM folder in /opt/gridengine/mpi, referring to the following lines from the document:

    start_proc_args /usr/sge/mpich2_smpd_rsh/startmpich2.sh -catch_rsh $pe_hostfile
    stop_proc_args /usr/sge/mpich2_smpd_rsh/stopmpich2.sh

I copied startmpi.sh and stopmpi.sh from /opt/gridengine/mpi to the /opt/gridengine/mpi/MPICH2-SM directory, because the document does not say what these scripts should contain.

Using qmon, I created the MPICH2-SM PE:

    # qconf -sp MPICH2-SM
    pe_name           MPICH2-SM
    slots             999
    user_lists        rootuserset
    xuser_lists       NONE
    start_proc_args   /opt/gridengine/mpi/MPICH2-SM/startmpich2sm.sh
    stop_proc_args    /opt/gridengine/mpi/MPICH2-SM/stopmpich2sm.sh
    allocation_rule   $round_robin
    control_slaves    FALSE
    job_is_first_task TRUE
    urgency_slots     min

I added this PE to the default queue, all.q.
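To make the role of start_proc_args concrete: whatever follows the script path on that line is what the start script receives as its arguments. Below is a minimal sketch, assuming a start script that accepts an optional -catch_rsh flag and then requires exactly one argument, the path SGE substitutes for $pe_hostfile. The function names (pe_hostfile_to_machines, start_main) and the error text are illustrative and are not taken from the howto's scripts; the pe_hostfile layout assumed here is one line per host of the form "host slots queue processor_range".

```shell
#!/bin/sh
# Sketch only: hypothetical helper names, not the howto's actual scripts.

# Expand a pe_hostfile ("host slots queue processor_range" per line)
# into a machine file body with one hostname line per granted slot.
pe_hostfile_to_machines() {
    while read -r host nslots _rest; do
        [ -n "$host" ] || continue      # skip blank lines
        i=0
        while [ "$i" -lt "$nslots" ]; do
            echo "$host"
            i=$((i + 1))
        done
    done < "$1"
}

# Accept an optional -catch_rsh flag, then require exactly one
# positional argument: the pe_hostfile path.
start_main() {
    catch_rsh=0
    while [ $# -gt 0 ]; do
        case "$1" in
            # A real script would set up an rsh wrapper when this flag
            # is given; that part is omitted in this sketch.
            -catch_rsh) catch_rsh=1; shift ;;
            *) break ;;
        esac
    done
    if [ $# -ne 1 ]; then
        echo "start script: got wrong number of arguments" >&2
        return 1
    fi
    # Write the machine file into the job's temporary directory.
    pe_hostfile_to_machines "$1" > "${TMPDIR:-/tmp}/machines"
}
```

If the copied start script performs a check of this kind, a start_proc_args line that names only the script path, with no $pe_hostfile after it, would not satisfy it, and no machines file would be written to the job's temporary directory.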
Then I submitted the job with the following script:

    # cat sgeSM.sh
    #!/bin/sh
    #$ -cwd
    #$ -pe MPICH2-SM 4
    #$ -e msge2.Err
    #$ -o msge2.out
    #$ -v MPI_HOME=/opt/MPI_LIBS/MPICH2-GNU/MPICH2-SM/bin
    #$ -v MEME_DIRECTORY=/opt/MEME-MAX
    $MPI_HOME/mpiexec -np 4 -machinefile /root/MFM /opt/MEME-MAX/bin/meme_p /opt/MEME-MAX/NCCS/samevivo_sample.txt -dna -mod tcm -nmotifs 10 -nsites 100 -minw 5 -maxw 50 -revcomp -text -maxsize 200500

It gave the following error:

    # cat msge2.Err
    startmpich2sm.sh: got wrong number of arguments
    rm: cannot remove `/tmp/92.1.all.q/machines': No such file or directory
    rm: cannot remove `/tmp/92.1.all.q/rsh': No such file or directory

I guess the problem might be with the scripts startmpich2sm.sh and stopmpich2sm.sh. Can anyone guide me to resolve this issue?

Thanks & Regards,
Sangamesh
HPC Engineer