Howdy! I apologize for posting this problem here, but I tried the LAM list and didn't hear anything, so I thought I would cast my net a bit wider in search of help. I'm having trouble starting an MPI code (NPB bt) that was built with PGI 6.1 and LAM-7.1.2. I get the following messages when I try to start the code (lamboot):
n-1<24201> ssi:boot:base:linear: booting n0 (n2004)
n-1<24201> ssi:boot:base:linear: booting n1 (n2005)
n-1<24201> ssi:boot:base:linear: booting n2 (n2006)
n-1<24201> ssi:boot:base:linear: booting n3 (n2007)
n-1<24201> ssi:boot:base:linear: booting n4 (n2008)
n-1<24201> ssi:boot:base:linear: booting n5 (n2009)
n-1<24201> ssi:boot:base:linear: booting n6 (n2010)
n-1<24201> ssi:boot:base:linear: booting n7 (n2011)
n-1<24201> ssi:boot:base:linear: finished
-----------------------------------------------------------------------------
It seems that [at least] one of the processes that was started with
mpirun chose a different RPI than its peers. For example, at least
the following two processes mismatched in their RPI selections:

    MPI_COMM_WORLD rank 0: tcp (v7.1.0)
    MPI_COMM_WORLD rank 3: usysv (v7.1.0)

All MPI processes must choose the same RPI module and version when
they start. Check your SSI settings and/or the local environment
variables on each node.

I'm using PBS to start the job and here are the relevant parts of the script:

    NET=tcp
    lamboot -b -v -ssh rpi $NET $PBS_NODEFILE
    mpirun -O -v C ./${EXE} >> ${OUTFILE}
    lamhalt

where $EXE and $OUTFILE are defined appropriately in the script.

Does anyone have any ideas?

TIA!

Jeff

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf