Hi,

Are you sure that you replicated your hostfile to all of your nodes?
Could you post the contents of your hosts file?

Thanks,
Andrew
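For reference, an Open MPI hostfile for a layout like the one described below typically looks something like this (the hostnames and slot counts here are assumptions, based only on the "3 nodes with 4 cores each" mentioned in the thread):

  # machines -- one line per host; "slots" = how many MPI ranks to place there
  master  slots=4
  slave1  slots=4
  slave2  slots=4
  slave3  slots=4

Every name in it has to resolve to the same address on every node (worth checking /etc/hosts on each machine), and with a shared directory like /mirror the file is usually kept there so all nodes see the same copy.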
2012/9/20 Antti Korhonen <akorho...@theranos.com>:
> Hi Vincent
>
> Master works with all slaves.
> M0+S1 works, M0+S2 works, M0+S3 works.
> All nodes work fine as single nodes.
>
> Here is my start command (trying to use 3 nodes with 4 cores on each):
>
> Executing:
> /mirror/OpenFOAM/ThirdParty-2.1.x/platforms/linux64Gcc/openmpi-1.5.3/bin/mpirun
> -np 12 -hostfile machines /mirror/OpenFOAM/OpenFOAM-2.1.x/bin/foamExec
> -prefix /mirror/OpenFOAM interFoam -parallel | tee log
>
> I will search for node limitations in configs.
>
> Antti
>
>
> -----Original Message-----
> From: Vincent Diepeveen [mailto:d...@xs4all.nl]
> Sent: Thursday, September 20, 2012 8:01 AM
> To: Antti Korhonen
> Cc: 'Jörg Saßmannshausen'; beowulf@beowulf.org
> Subject: Re: [Beowulf] Cannot use more than two nodes on cluster
>
> Hi Antti,
>
> You describe that just 1 master and 1 slave work together.
> Is it 1 specific slave that works, and not the other slaves?
>
> So you have machines M0, S0, S1, S2.
>
> Is only M0+S0 working, and not M0+S1 nor M0+S2?
>
> What parallel shell are you using to start the jobs?
> Is it the free pdsh?
>
> What command do you issue to start the jobs?
> How many processes do you start at once, and do the 3 slave nodes have the
> same number of cores?
>
> Somewhere there must be a limit set; in most environments it is possible to
> restrict how many processes users are allowed to execute simultaneously.
>
> Maybe the default of the environment you use has this limit set to 2 nodes.
>
> What network is your cluster using?
>
> On Sep 20, 2012, at 4:37 PM, Antti Korhonen wrote:
>
>> I tested ssh with all combinations and that part is working as designed.
>>
>> I can start a job manually on any single node.
>> I can start jobs on any two nodes, as long as the other node is the master.
>> All other combinations hang and the jobs do not start.
>>
>> I read through a few install guides and did not find any steps I missed.
>> I am using Ubuntu 12.04, in case that makes any difference.
>>
>> Antti
>>
>> -----Original Message-----
>> From: beowulf-boun...@beowulf.org [mailto:beowulf-boun...@beowulf.org]
>> On Behalf Of Jörg Saßmannshausen
>> Sent: Thursday, September 20, 2012 1:42 AM
>> To: beowulf@beowulf.org
>> Subject: Re: [Beowulf] Cannot use more than two nodes on cluster
>>
>> Hi all,
>>
>> have you tried the following: ssh master -> node1 -> node2, i.e.
>> ssh from the master to node1 and from there to node2?
>> Could it be that the remote host key is not yet in the local known_hosts
>> database, so that you get asked about adding it?
>>
>> If that works for all permutations, another possibility is that your host
>> list is somehow messed up when you are submitting parallel jobs. Can you
>> start the jobs manually by providing a host list to the MPI program you
>> are using? Does that work, or do you have problems there as well?
>>
>> My two pennies
>>
>> Jörg
>>
>>
>> On Thursday 20 September 2012 07:40:56 Antti Korhonen wrote:
>>> Passwordless SSH works between all nodes.
>>> Firewalls are disabled.
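A minimal sketch of the manual check Jörg suggests, assuming the slaves are reachable as slave1 and slave2 and that the Open MPI 1.5.3 mpirun from the ThirdParty tree is on the PATH (both names are assumptions, not taken from the thread):

  # Launch a trivial non-MPI program on the two slaves only; if this also
  # hangs, the problem is in the MPI/ssh layer rather than in OpenFOAM.
  mpirun -np 2 -host slave1,slave2 hostname

  # Then repeat with the real hostfile and the full rank count:
  mpirun -np 12 -hostfile machines hostname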
>>>
>>>
>>> From: g...@r-hpc.com [mailto:g...@r-hpc.com] On Behalf Of Greg Keller
>>> Sent: Wednesday, September 19, 2012 8:43 PM
>>> To: beowulf@beowulf.org; Antti Korhonen
>>> Subject: Re: [Beowulf] Cannot use more than two nodes on cluster
>>>
>>> I am going to bet $0.25 that SSH or TCP/IP is configured to allow the
>>> master to get to the nodes without a password, but not from one compute
>>> node to another.
>>>
>>> Test by sshing to Compute1, and then from Compute1 to Compute2. Depending
>>> on how you built the cluster, it is also possible that iptables is running
>>> on the compute nodes, but my money is on the ssh keys needing
>>> reconfiguring.
>>> Let us know what you find.
>>>
>>> Cheers!
>>> Greg
>>>
>>> Date: Wed, 19 Sep 2012 16:11:21 +0000
>>> From: Antti Korhonen <akorho...@theranos.com>
>>> Subject: [Beowulf] Cannot use more than two nodes on cluster
>>> To: "beowulf@beowulf.org" <beowulf@beowulf.org>
>>> Message-ID:
>>> <B9D51F953BEE5C42BC2B503D288542992DD935FE@SRW004PA.theranos.local>
>>> Content-Type: text/plain; charset="us-ascii"
>>>
>>> Hello
>>>
>>> I have a small Beowulf cluster (master and 3 slaves).
>>> I can run jobs on any single node.
>>> Running on two nodes sort of works: running jobs on the master and 1
>>> slave works (all combos: master+slave1, master+slave2, or master+slave3).
>>> Running jobs on two slaves hangs.
>>> Running jobs on the master + any two slaves hangs.
>>>
>>> Would anybody have any troubleshooting tips?
>>
>> --
>> *************************************************************
>> Jörg Saßmannshausen
>> University College London
>> Department of Chemistry
>> Gordon Street
>> London
>> WC1H 0AJ
>>
>> email: j.sassmannshau...@ucl.ac.uk
>> web: http://sassy.formativ.net
>>
>> Please avoid sending me Word or PowerPoint attachments.
>> See http://www.gnu.org/philosophy/no-word-attachments.html

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf
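For completeness, a minimal sketch of the all-pairs passwordless-SSH check suggested above; the node names are assumptions, and BatchMode makes ssh fail instead of prompting, so a missing key or an unaccepted host key shows up as FAILED rather than a hang:

  #!/bin/sh
  # Check passwordless ssh in both directions between every pair of nodes.
  NODES="master slave1 slave2 slave3"   # assumed hostnames
  for src in $NODES; do
    for dst in $NODES; do
      [ "$src" = "$dst" ] && continue
      printf '%s -> %s: ' "$src" "$dst"
      ssh -o BatchMode=yes "$src" ssh -o BatchMode=yes "$dst" true \
        && echo OK || echo FAILED
    done
  done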