Hi,

On 22.02.2008, at 09:23, Sangamesh B wrote:

Dear Reuti & members of Beowulf,

I need to execute a parallel job through Grid Engine.

MPICH2 is installed with the mpd process manager.

I added a parallel environment MPICH2 to SGE:

$ qconf -sp MPICH2
pe_name           MPICH2
slots             999
user_lists        NONE
xuser_lists       NONE
start_proc_args   /share/apps/MPICH2/startmpi.sh -catch_rsh $pe_hostfile
stop_proc_args    /share/apps/MPICH2/stopmpi.sh
allocation_rule   $pe_slots
control_slaves    FALSE
job_is_first_task TRUE
urgency_slots     min


I added this PE to the default queue, all.q.

mpdboot has been run, and mpds are running on two nodes.
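
For reference, the ring was brought up roughly like this (mpd.hosts is just a placeholder name for my host list file; the exact invocation may differ):

$ mpdboot -n 2 -f mpd.hosts    # start an mpd on the 2 hosts listed in mpd.hosts
$ mpdtrace                     # verify: lists the hosts that joined the ring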

The script for submitting this job through SGE is:

$ cat subsamplempi.sh
#!/bin/bash

#$ -S /bin/bash

#$ -cwd

#$ -N Samplejob

#$ -q all.q

#$ -pe MPICH2 4

#$ -e ERR_$JOB_NAME.$JOB_ID

#$ -o OUT_$JOB_NAME.$JOB_ID

date

hostname

/opt/MPI_LIBS/MPICH2-GNU/bin/mpirun -np $NSLOTS -machinefile $TMP_DIR/machines ./samplempi

echo "Executed"

exit 0


The job gets submitted, but does not execute. The error and output files contain:

$ cat ERR_Samplejob.192
/usr/bin/env: python2.4: No such file or directory

$ cat OUT_Samplejob.192
-catch_rsh /opt/gridengine/default/spool/compute-0-0/active_jobs/192.1/pe_hostfile
compute-0-0
compute-0-0
compute-0-0
compute-0-0
Fri Feb 22 12:57:18 IST 2008
compute-0-0.local
Executed

So the problem seems to be with python2.4.

$ which python2.4
/opt/rocks/bin/python2.4

I googled this error and then created a symbolic link:

# ln -sf /opt/rocks/bin/python2.4 /bin/python2.4

After this, the same error still occurs.
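
Maybe the link also has to exist on the compute nodes (the error is reported from compute-0-0), but I am not sure. If so, something like this (using Rocks' cluster-fork; untested) might be needed:

# cluster-fork 'ln -sf /opt/rocks/bin/python2.4 /bin/python2.4'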

I guess the problem might be something else, i.e. Grid Engine might not be getting the link to the running mpd.

Or the procedure I followed to configure the PE might be wrong.

So I would appreciate it if you could clear up my doubts and help me resolve this error.

1. Is the PE configuration of MPICH2 + Grid Engine right?

If you want to integrate MPICH2 with MPD, it's similar to a PVM setup: the daemons must be started in start_proc_args on every node, with a dedicated port number per job. You don't say what your startmpi.sh is doing.
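
As a very rough sketch (this is not the actual startmpich2.sh from the MPICH2 howto; the port handling, the MPD_CON_EXT trick and the qrsh usage below are illustrative and would need checking), a start_proc_args for an mpd-based tight integration could look like:

#!/bin/sh
# called as: startmpich2.sh -catch_rsh $pe_hostfile
pe_hostfile=$2
master=`hostname`

# one machine entry per granted slot, for mpirun's -machinefile
awk '{ for (i = 0; i < $2; i++) print $1 }' $pe_hostfile > $TMPDIR/machines

# give this job its own mpd ring / console socket
export MPD_CON_EXT=sge_$JOB_ID.$SGE_TASK_ID

# start the first mpd on the master node of the parallel job
mpd --daemon
port=`mpdtrace -l | head -1 | cut -d_ -f2 | cut -d' ' -f1`

# start an mpd on every slave node and let it join this job's ring; with
# control_slaves TRUE the remote startup can go through "qrsh -inherit"
# instead of plain rsh/ssh
for node in `awk -v m=$master '$1 != m { print $1 }' $pe_hostfile`; do
    qrsh -inherit -V $node mpd --daemon -h $master -p $port
done

Note that this also means control_slaves would have to be TRUE in the PE (it is FALSE in the configuration above), otherwise qrsh -inherit will not work.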

2. Without tight integration, is there a way to run an MPICH2 (mpd) based job using Grid Engine?

Yes.
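
For example, a loose setup can keep the ring completely inside the job script (just a sketch; it assumes python2.4 works on the nodes, passwordless rsh/ssh between them, and that the names in $PE_HOSTFILE match what `hostname` reports):

# build the machine file from the hosts/slots granted by SGE
awk '{ for (i = 0; i < $2; i++) print $1 }' $PE_HOSTFILE > $TMPDIR/machines

# remote hosts only; mpdboot starts the local mpd itself
sort -u $TMPDIR/machines | grep -v `hostname` > $TMPDIR/mpd.hosts
nremote=`wc -l < $TMPDIR/mpd.hosts`

mpdboot -n `expr $nremote + 1` -f $TMPDIR/mpd.hosts
mpirun -np $NSLOTS -machinefile $TMPDIR/machines ./samplempi
mpdallexit

SGE has no control over the mpds in this case, though, so crashed jobs can leave processes behind and the resource usage of the slave tasks is not accounted.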

3. Between smpd daemon-based and daemonless MPICH2 tight integration, which one is better?

It depends: if you have just one mpirun per job which will run for days, I would go for the daemonless startup. But if you issue many mpirun calls in your job script which will each run for just seconds, I would go for the daemon-based startup, as the mpirun will be distributed to the slaves faster.
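
Schematically (job script fragments; ./long_simulation and ./short_step are just placeholder program names):

# favors the daemonless startup: one mpirun that runs for days
mpirun -np $NSLOTS -machinefile $TMPDIR/machines ./long_simulation

# favors the daemon-based startup: many mpirun calls of a few seconds each,
# where the per-call startup time matters
for step in `seq 1 1000`; do
    mpirun -np $NSLOTS -machinefile $TMPDIR/machines ./short_step $step
done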

4. Can we do MVAPICH2 tight integration with SGE? Are there any differences with respect to the process managers for MVAPICH2?

Maybe, if the startup is similar to standard MPICH2.

-- Reuti


Thanks & Best Regards,
Sangamesh B

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
