On 2/21/08, Mark Hahn <[EMAIL PROTECTED]> wrote:
>
> > submit the jobs through a job scheduler (LSF in this case). We used the
> > machinefile option with mpirun to order the nodes on which the processes has
> > to be started.
> >
> > But i am not able to do this with the current setup where LSF is used for
> > scheduling and SLURM for resource management.
> > I have tried a few of the options like using the -m options to bsub for
> > specifying the preference and so on. But of no success.
>
> this sounds like our HP-XC systems. but I'm a bit mystified:
> you can get the node assignment from LSF, and then use srun -m hostfile
> to force slurm to set up the rank-node mappings as you like.
> (note: not -m to LSF.) did you try that?
>
Yes, it is an HP-XC system, and I have tried the -m option to srun as well.

This is what I tried, using a sample MPI program that prints the rank and the node it runs on:

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int rank, size, len;
        char name[MPI_MAX_PROCESSOR_NAME];  /* large enough for any processor name */

        MPI_Init(&argc, &argv);

        MPI_Comm_size(MPI_COMM_WORLD, &size);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Get_processor_name(name, &len);

        /* Report which node this rank landed on. */
        printf("This is %d out of %d: %s \n", rank, size, name);
        MPI_Finalize();

        return 0;
    }

This was submitted to LSF using:

    bsub -n 4 -e errfile -ext "SLURM[nodelist=n2,n1,n4,n3]" /opt/hpmpi/bin/mpirun -srun -m hostfile ./a.out

The environment variable SLURM_HOSTFILE was set to the hostfile, which lists the nodes on which the binary has to run, in the order n2, n1, n4, n3.

I got the following errors in my error file:

    a.out: MPI_Init: node to rank map is not correct myrank :0 mynode:1
    a.out: MPI_Init: node to rank map is not correct myrank :1 mynode:0
    a.out: MPI_Init: MPI_MPIRUN has wrong nodemap format
    a.out: MPI_Init: MPI_MPIRUN has wrong nodemap format
    a.out: MPI_Init: Cannot set srun startup protocol
    a.out: MPI_Init: node to rank map is not correct myrank :3 mynode:2
    a.out: MPI_Init: MPI_MPIRUN has wrong nodemap format
    a.out: MPI_Init: Cannot set srun startup protocol
    a.out: MPI_Init: Cannot set srun startup protocol
    srun: error: n2: task0: Exited with exit code 1
    a.out: MPI_Init: node to rank map is not correct myrank :2 mynode:3
    a.out: MPI_Init: MPI_MPIRUN has wrong nodemap format
    a.out: MPI_Init: Cannot set srun startup protocol
    srun: Terminating job

--
Best Regards,
Balamurugan. R
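For checking the ordering once a job does start, a small helper along the following lines might be useful. It is only a sketch: expected_rank is a hypothetical function (not part of HP-MPI, SLURM, or LSF), and it assumes the hostfile holds one node name per line with one task per node, so that the zero-based line index is the rank that node is expected to get.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of any MPI or SLURM API.
 * Returns the zero-based position of `host` among the non-blank lines of
 * the hostfile at `path`, or -1 if the file cannot be opened or the host
 * is not listed. Assumes one node name per line, each node listed once. */
int expected_rank(const char *path, const char *host)
{
    char line[256];
    int index = 0;
    FILE *fp;

    assert(path != NULL && host != NULL);

    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    while (fgets(line, sizeof line, fp) != NULL) {
        line[strcspn(line, "\r\n")] = '\0';  /* strip the line ending */
        if (line[0] == '\0')
            continue;                         /* skip blank lines */
        if (strcmp(line, host) == 0) {
            fclose(fp);
            return index;
        }
        index++;
    }

    fclose(fp);
    return -1;
}
```

In the sample program above, the value of expected_rank(getenv("SLURM_HOSTFILE"), name) could be compared with rank after the call to MPI_Get_processor_name (getenv needs <stdlib.h>, and the pointer should be checked for NULL first). This assumes the processor name matches the hostfile entries exactly; if one side uses fully qualified names, the comparison would need adjusting.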
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf