Hi,
On 21.10.2008 at 01:18, Luis Alejandro Del Castillo Riley wrote:
Hi fellows, I have a cluster with 1 master and 10 nodes with Intel Xeon
quad-core CPUs.
Fedora Core 6
PGI 7.0-7
MPICH 1.2.5.2
The last version of MPICH, from 2005, is 1.2.7p1. For newer
installations I would suggest looking into Open MPI.
machines.x86_64 with 10 node names
Does that mean it lists only the 10 nodes?
When I try to run:
mpirun -v -arch x86_64 -keep_pg -nolocal -np 9 mm5.mpp
I get no error, but when I run with
mpirun -v -arch x86_64 -keep_pg -nolocal -np 10 mm5.mpp
it takes around 40 minutes before I get an error:
bm_list_4667: (1526.781250) wakeup_slave: unable to interrupt slave 0 pid 4666
With that much time before the failure, I would suggest logging in to
all nodes and checking the distribution and startup of the processes with:
$ ps -e f
(the f without a leading -). Is it doing nothing for 40 minutes, or
running fine until it crashes?
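If logging in to each node by hand is tedious, the check could be looped over the machines file. This is only a rough sketch under assumptions of mine, not something from your setup: passwordless ssh to every node (set REMOTE_SHELL=rsh if you use rsh), and a machines file with one host per line, optionally followed by ":ncpus", with "#" starting a comment. The function names list_nodes and check_nodes are made up for illustration.

```shell
#!/bin/sh
# Sketch: show the process tree on every node named in an MPICH machines file.
# Assumptions: passwordless ssh (or rsh via REMOTE_SHELL), one host per line,
# optional ":ncpus" suffix, "#" starts a comment.

# Print the bare host names from the machines file given as $1, one per line.
list_nodes() {
    awk '{ sub(/#.*/, ""); if (NF) { split($1, a, ":"); print a[1] } }' "$1"
}

# Run "ps -e f" on each node listed in the machines file given as $1.
check_nodes() {
    for node in $(list_nodes "$1"); do
        echo "=== $node ==="
        # -n keeps the remote shell from reading stdin;
        # "f" without a dash selects the process-tree (forest) view.
        ${REMOTE_SHELL:-ssh} -n "$node" ps -e f
    done
}

# Example call (file name taken from the original post):
#   check_nodes machines.x86_64
```

Running check_nodes once right after mpirun starts and again a few minutes later should show whether mm5.mpp was started on all nodes and whether the processes are still there.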
-- Reuti
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf