Re: [Beowulf] Network problem: Why are ARP discovery requests sent to specific addresses instead of a broadcast domain

2010-07-13 Thread Beat Rubischon
Hello! Quoting (13.07.10 22:32): > It is curious that most of my gratuitous ARP is coming > from my IPMI interface and not my main eth stack. Not sure why. Maybe > the Dell IPMI just is more aggressive about it. This behaviour could be controlled by some flags in the BMC: # ipmitool lan set 1 a

Re: [Beowulf] Re: Beowulf Digest, Vol 77, Issue 14

2010-07-13 Thread Christopher Samuel
On 14/07/10 01:57, Douglas Guptill wrote: > On Tue, Jul 13, 2010 at 05:11:46PM +0200, Ivan Rossi wrote: [...] >>> >> + MPI (Intel MPI) - for the application >> > >> > only if you use intel compilers, otherwise go openMPI > > Yes, we will be usi

Re: [Beowulf] first cluster

2010-07-13 Thread Christopher Samuel
On 13/07/10 22:07, Reuti wrote: > Disadvantage is of course, when the system runs out of > memory the oom-killer will look for an eligible process > to be killed to free up some space. That assumes that you are permitting your compute nodes to overco

Re: [Beowulf] first cluster [was [OMPI users] trouble using openmpi under slurm]

2010-07-13 Thread Douglas Guptill
Hello Gus, list: On Fri, Jul 09, 2010 at 07:06:05PM -0400, Gus Correa wrote: > Douglas Guptill wrote: >> On Thu, Jul 08, 2010 at 09:43:48AM -0400, Gus Correa wrote: >>> Douglas Guptill wrote: On Wed, Jul 07, 2010 at 12:37:54PM -0600, Ralph Castain wrote: > No, afraid not. Things wo

Re: [Beowulf] IB problem with openmpi 1.2.8

2010-07-13 Thread Bill Wichser
On 7/13/2010 4:50 PM, Prentice Bisbal wrote: Bill, Have you checked the health of the cables themselves? It could just be dumb luck that a hardware failure coincided with a software change, didn't manifest itself until the reboot of the nodes. Did you reboot the switches, too? Just looked a

Re: [Beowulf] IB problem with openmpi 1.2.8

2010-07-13 Thread Prentice Bisbal
Bill, Have you checked the health of the cables themselves? It could just be dumb luck that a hardware failure coincided with a software change, didn't manifest itself until the reboot of the nodes. Did you reboot the switches, too? I would try dividing your cluster into small sections and see if

Re: [Beowulf] Network problem: Why are ARP discovery requests sent to specific addresses instead of a broadcast domain

2010-07-13 Thread Rahul Nabar
On Tue, Jul 13, 2010 at 12:04 AM, Tom Ammon wrote: > This is called a gratuitous ARP. Used to update the ARP caches of other > nodes. Thanks Tom. It is curious that most of my gratuitous ARP is coming from my IPMI interface and not my main eth stack. Not sure why. Maybe the Dell IPMI just is more

Re: [Beowulf] IB problem with openmpi 1.2.8

2010-07-13 Thread Bill Wichser
Just some more info. Went back to the prior kernel with no luck. Updated the firmware on the Topspin HBA cards to the latest (final) version (fw-25208-4_8_200-MHEL-CF128-T). Nothing changes. Still not sure where to look. Bill Wichser wrote: Machine is an older Intel Woodcrest cluster wi

[Beowulf] Re: Beowulf Digest, Vol 77, Issue 14

2010-07-13 Thread Douglas Guptill
On Tue, Jul 13, 2010 at 05:11:46PM +0200, Ivan Rossi wrote: > On Fri, 9 Jul 2010, beowulf-requ...@beowulf.org wrote: > >> After some lurking and reading, I plan this: >> Debian (lenny) >> + fai - for compute-node operating system install >> + Torque- job schedul

Re: [Beowulf] first cluster

2010-07-13 Thread Glen Beane
On 7/13/10 12:29 AM, "Rahul Nabar" wrote: > On Mon, Jul 12, 2010 at 2:02 PM, Gus Correa wrote: >> Consider disk for: >> >> A) swap space (say, if the user programs are large, >> or you can't buy a lot of RAM, etc); > > Out of curiosity, is there the possibility of running a "swapless" > com

Re: [Beowulf] first cluster

2010-07-13 Thread Reuti
Am 13.07.2010 um 06:29 schrieb Rahul Nabar: > On Mon, Jul 12, 2010 at 2:02 PM, Gus Correa wrote: >> Consider disk for: >> >> A) swap space (say, if the user programs are large, >> or you can't buy a lot of RAM, etc); > > Out of curiosity, is there the possibility of running a "swapless" > compu