Hi Vincent

Master works with all slaves. 
M0+S1 works, M0+S2 works, M0+S3 works.
All nodes work fine as single nodes.

Here is my start command (trying to use 3 nodes with 4 cores on each):

Executing: /mirror/OpenFOAM/ThirdParty-2.1.x/platforms/linux64Gcc/openmpi-1.5.3/bin/mpirun \
    -np 12 -hostfile machines \
    /mirror/OpenFOAM/OpenFOAM-2.1.x/bin/foamExec -prefix /mirror/OpenFOAM \
    interFoam -parallel | tee log
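
For reference, here is a minimal sketch of how a hostfile matching "3 nodes with 4 cores on each" could be generated, assuming Open MPI's `hostname slots=N` hostfile syntax. The host names node1 to node3, the file name machines.example and the helper write_hostfile are placeholders for illustration, not the real names on this cluster.

```shell
#!/bin/sh
# Sketch: write an Open MPI hostfile with a fixed slot count per host.
# write_hostfile and the host names below are hypothetical placeholders.
write_hostfile() {
    out=$1      # path of the hostfile to create
    slots=$2    # MPI slots (cores) to offer on each host
    shift 2
    : > "$out"  # create the file, or truncate it if it exists
    for host in "$@"; do
        printf '%s slots=%s\n' "$host" "$slots" >> "$out"
    done
}

# Three nodes with 4 slots each give the 12 slots that -np 12 asks for.
write_hostfile machines.example 4 node1 node2 node3
cat machines.example
```

A quick sanity check before involving OpenFOAM would be to run a trivial program over the same hostfile, for example `mpirun -np 12 -hostfile machines hostname`, and see whether all 12 ranks report back.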

I will search for node limitations in configs.

  Antti


-----Original Message-----
From: Vincent Diepeveen [mailto:d...@xs4all.nl] 
Sent: Thursday, September 20, 2012 8:01 AM
To: Antti Korhonen
Cc: 'Jörg Saßmannshausen'; beowulf@beowulf.org
Subject: Re: [Beowulf] Cannot use more than two nodes on cluster

Hi Antti,

You describe that just 1 master plus 1 slave works.
Is it 1 specific slave that works and not the other slaves?

So you have machines M0, S0, S1, S2.

Is only M0+S0 working, and not M0+S1 nor M0+S2?

What parallel shell are you using to start the jobs?
Is it the free pdsh?

What command do you issue to start the jobs?
How many processes do you start at once, and do the 3 slave nodes all have the
same number of cores?

Somewhere there must be a limit set; in most environments it's possible to
restrict how many processes users are allowed to execute simultaneously.

Maybe the default of the environment you use has this limit set to 2 nodes.
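
One way to check for such a limit is sketched below. It assumes bash, whose `ulimit -u` builtin reports the maximum number of user processes; the helper limit_ok and the figure of 4 ranks per node are illustrative assumptions, not something taken from the cluster's configuration.

```shell
#!/bin/bash
# Sketch: compare a per-user process limit against the ranks planned per node.
# limit_ok is a hypothetical helper, not part of Open MPI or OpenFOAM.
limit_ok() {
    limit=$1    # value reported by `ulimit -u`, or "unlimited"
    needed=$2   # number of MPI ranks to start on this node
    case $limit in
        unlimited)      return 0 ;;
        ''|*[!0-9]*)    return 2 ;;   # not a number: cannot decide
        *)              [ "$limit" -ge "$needed" ] ;;
    esac
}

# `ulimit -u` is the bash builtin for the max number of user processes.
current=$(ulimit -u 2>/dev/null || echo unknown)
if limit_ok "$current" 4; then
    echo "process limit ($current) allows 4 ranks on $(uname -n)"
else
    echo "process limit ($current) is too low or unknown on $(uname -n)"
fi
```

Running the same check on every node (for instance over ssh) would show whether one of them is configured differently from the master.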

What network is your cluster using?

On Sep 20, 2012, at 4:37 PM, Antti Korhonen wrote:

> I tested ssh with all combinations and that part is working as 
> designed.
>
> I can start a job manually on any single node.
> I can start jobs on any two nodes, as long as one of them is the master.
> All other combinations hang and the jobs do not start.
>
> I read through a few install guides and did not find any steps I missed.
> I am using Ubuntu 12.04, in case that makes any difference.
>
>   Antti
>
> -----Original Message-----
> From: beowulf-boun...@beowulf.org [mailto:beowulf-boun...@beowulf.org]
> On Behalf Of Jörg Saßmannshausen
> Sent: Thursday, September 20, 2012 1:42 AM
> To: beowulf@beowulf.org
> Subject: Re: [Beowulf] Cannot use more than two nodes on cluster
>
> Hi all,
>
> have you tried the following: ssh master -> node1 -> node2, i.e.
> ssh from the master to node1 and from there to node2?
> Are you sure you do not have a situation where the remote host key is
> not in the database, so that you get asked about adding that key to the
> local database?
>
> If that is working with all permutations, another possibility is that 
> your host list is somehow messed up when you are submitting parallel 
> jobs. Can you start the jobs manually by providing a host list to the 
> MPI program you are using? Does that work or do you have problems here 
> as well?
>
> My two pennies
>
> Jörg
>
>
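
The "all permutations" check suggested above can be sketched as a small script. It assumes OpenSSH, where `-o BatchMode=yes` makes ssh fail instead of prompting for a password or a host-key confirmation; the host names and the helpers host_pairs and check_hop are hypothetical placeholders.

```shell
#!/bin/sh
# Sketch: list every ordered pair of distinct hosts, then test each hop.
# The host names and both helper names are hypothetical placeholders.
host_pairs() {
    for src in "$@"; do
        for dst in "$@"; do
            if [ "$src" != "$dst" ]; then
                echo "$src $dst"
            fi
        done
    done
}

# Log in to $1 and, from there, try a non-interactive login to $2.
# BatchMode=yes turns any password or host-key prompt into a failure.
check_hop() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" \
        "ssh -o BatchMode=yes -o ConnectTimeout=5 $2 true" </dev/null
}

# Dry run: print the hops that would be tested for one master and 3 slaves.
host_pairs master node1 node2 node3

# To run the real check on a cluster, something like:
#   host_pairs master node1 node2 node3 | while read -r src dst; do
#       check_hop "$src" "$dst" || echo "FAIL: $src -> $dst"
#   done
```

Any hop reported as failing would point at a missing key or an unknown host key between that particular pair of nodes.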
> On Thursday 20 September 2012 07:40:56 Antti Korhonen wrote:
>> Passwordless SSH works between all nodes.
>> Firewalls are disabled.
>>
>>
>> From: g...@r-hpc.com [mailto:g...@r-hpc.com] On Behalf Of Greg Keller
>> Sent: Wednesday, September 19, 2012 8:43 PM
>> To: beowulf@beowulf.org; Antti Korhonen
>> Subject: Re: [Beowulf] Cannot use more than two nodes on cluster
>>
>> I am going to bet $0.25 that SSH or TCP/IP is configured to allow the
>> master to get to the nodes without a password, but not from one
>> compute node to another.
>>
>> Test by sshing to Compute1, then from Compute1 to Compute2. Depending
>> on how you built the cluster, it's also possible that iptables is
>> running on the compute nodes, but my money is on the ssh keys needing
>> to be reconfigured.
>> Let us know what you find.
>>
>> Cheers!
>> Greg
>>
>> Date: Wed, 19 Sep 2012 16:11:21 +0000
>> From: Antti Korhonen <akorho...@theranos.com>
>> Subject: [Beowulf] Cannot use more than two nodes on cluster
>> To: "beowulf@beowulf.org" <beowulf@beowulf.org>
>> Message-ID:
>> <B9D51F953BEE5C42BC2B503D288542992DD935FE@SRW004PA.theranos.local>
>> Content-Type: text/plain; charset="us-ascii"
>>
>> Hello
>>
>> I have a small Beowulf cluster (master and 3 slaves).
>> I can run jobs on any single node.
>> Running on two nodes sort of works: running jobs on the master and 1
>> slave works (all combos: master+slave1, master+slave2 or master+slave3).
>> Running jobs on two slaves hangs.
>> Running jobs on master + any two slaves hangs.
>>
>> Would anybody have any troubleshooting tips?
>
> --
> *************************************************************
> Jörg Saßmannshausen
> University College London
> Department of Chemistry
> Gordon Street
> London
> WC1H 0AJ
>
> email: j.sassmannshau...@ucl.ac.uk
> web: http://sassy.formativ.net
>
> Please avoid sending me Word or PowerPoint attachments.
> See http://www.gnu.org/philosophy/no-word-attachments.html
>
> _______________________________________________
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin 
> Computing To change your subscription (digest mode or unsubscribe) 
> visit http://www.beowulf.org/mailman/listinfo/beowulf
