Hi,

On 21.10.2008, at 01:18, Luis Alejandro Del Castillo Riley wrote:

Hi fellows, I have a cluster with 1 master and 10 nodes with Intel Xeon quad-core CPUs.
Fedora Core 6
PGI 7.0-7
mpich 1.2.5.2

The last version of MPICH, from 2005, is 1.2.7p1. For new installations I would suggest looking into Open MPI.

machines.x86_64 with 10 node names

Does that mean it lists only the 10 nodes?

When I try to run:
 mpirun -v -arch x86_64  -keep_pg -nolocal -np 9 mm5.mpp

I get no error, but when I run with
 mpirun -v -arch x86_64  -keep_pg -nolocal -np 10 mm5.mpp

it takes around 40 min before I get an error:
bm_list_4667: (1526.781250) wakeup_slave: unable to interrupt slave 0 pid 4666

Given how long that takes, I would suggest logging in to all nodes and checking with:

$ ps -e f

(f without the leading -) the distribution and startup of the processes. Is it doing nothing for 40 minutes, or running fine until it crashes?
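
To avoid logging in to each node by hand, the same check could be looped over the machines file. This is only a rough sketch: it assumes machines.x86_64 holds one host name per line (optionally in the form host:ncpus), that passwordless ssh to the nodes works, and the function name check_nodes and the RUN variable are made up for illustration.

```shell
# Sketch only: run "ps -e f" on every host listed in a machines file.
# Assumptions: one host per line, optionally "host:ncpus"; passwordless ssh.
# Set RUN=echo for a dry run that prints the commands instead of running them.
check_nodes() {
    machines_file="$1"
    while read -r entry; do
        # Skip blank lines and comment lines.
        case "$entry" in
            ''|'#'*) continue ;;
        esac
        # Drop an optional ":ncpus" suffix to get the bare host name.
        host="${entry%%:*}"
        echo "== $host =="
        # stdin comes from /dev/null so ssh does not swallow the rest of the file.
        ${RUN:-ssh} "$host" ps -e f < /dev/null
    done < "$machines_file"
}
```

Calling it as `check_nodes machines.x86_64` prints the process tree per node; `RUN=echo check_nodes machines.x86_64` first shows which hosts would be contacted.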

-- Reuti
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf