Re: [Beowulf] Pretty High Performance Computing

2008-09-23 Thread Ellis Wilson
I guess I don't quite understand why you disagree, Prentice. With the exception that middleware doesn't strive to be a classification per se, just a solution, it still consists of a "style of computing where you sacrifice absolute high performance because of issues relating to any combination o

Re: [Beowulf] What services do you run on your cluster nodes?

2008-09-23 Thread Eric Thibodeau
Ashley Pittman wrote: On Mon, 2008-09-22 at 15:44 -0400, Eric Thibodeau wrote: Ashley Pittman wrote: On Mon, 2008-09-22 at 14:56 -0400, Eric Thibodeau wrote: If it were up to me I'd turn *everything* possible off except sshd and ntp. The problem however is the maintenance cost

Re: [Beowulf] shmem

2008-09-23 Thread Patrick Geoffray
Lawrence Stewart wrote: Is anyone aware of available test suites or API benchmark suites for shmem? I am thinking of the equivalent of the Intel MPI tests or Intel MPI Benchmarks, awful though they are. I don't know any publicly available shmem validation or benchmark suites. Not surprising s

[Beowulf] shmem

2008-09-23 Thread Lawrence Stewart
I'm starting work on a shmem implementation for the SiCortex systems. Is anyone aware of available test suites or API benchmark suites for shmem? I am thinking of the equivalent of the Intel MPI tests or Intel MPI Benchmarks, awful though they are. Separately, does anyone here happen to know wh

Re: [Beowulf] What services do you run on your cluster nodes?

2008-09-23 Thread Patrick Geoffray
Perry E. Metzger wrote: You realize that most big HPC systems are using interconnects that don't generate many or any interrupts, right? Of course. Usually one even uses interrupt pacing/mitigation even in gig ethernet on a modern machine -- otherwise you're not going to get reasonable performa

Re: [Beowulf] What services do you run on your cluster nodes?

2008-09-23 Thread Lawrence Stewart
On Sep 23, 2008, at 9:18 PM, Perry E. Metzger wrote: Greg Lindahl <[EMAIL PROTECTED]> writes: On Tue, Sep 23, 2008 at 07:43:19PM -0400, Perry E. Metzger wrote: As for the daemons, remember that with a proper scheduler, you will switch straight from an incoming network interrupt to a high p

Re: [Beowulf] What services do you run on your cluster nodes?

2008-09-23 Thread Prentice Bisbal
Prentice Bisbal wrote: Prentice Bisbal wrote: The more services you run on your cluster node (gmond, sendmail, etc.) the less performance is available for number crunching, but at the same time, administration difficulty increases. For example, if you turn off postfix/sendmail, you'll no longer

Re: [Beowulf] What services do you run on your cluster nodes?

2008-09-23 Thread Prentice Bisbal
Prentice Bisbal wrote: The more services you run on your cluster node (gmond, sendmail, etc.) the less performance is available for number crunching, but at the same time, administration difficulty increases. For example, if you turn off postfix/sendmail, you'll no longer get automated e-mails fr

Re: [Beowulf] What services do you run on your cluster nodes?

2008-09-23 Thread Perry E. Metzger
Greg Lindahl <[EMAIL PROTECTED]> writes: > On Tue, Sep 23, 2008 at 07:43:19PM -0400, Perry E. Metzger wrote: >> As for the daemons, remember that with a proper scheduler, you will >> switch straight from an incoming network interrupt to a high priority >> process that is expecting the incoming pac

Re: [Beowulf] Pretty High Performance Computing

2008-09-23 Thread Prentice Bisbal
Vincent Diepeveen wrote: I'd argue we might know this already as middleware. That makes absolutely no sense. Best regards from a hotel in Beijing, Vincent On Sep 23, 2008, at 10:32 PM, Jon Forrest wrote: Given the recent discussion of whether running multiple services and other such things

Re: [Beowulf] One network, or two?

2008-09-23 Thread Matt Lawrence
On Tue, 23 Sep 2008, Alan Ward wrote: I have been reading the ongoing discussion on network usage with some interest, mainly because in all (admittedly very small, 4 to 8 node) clusters we have set up so far, we have always gone with doubling the network. Nowadays we mostly run a 100 MBit/s "e

Re: [Beowulf] What services do you run on your cluster nodes?

2008-09-23 Thread Greg Lindahl
On Tue, Sep 23, 2008 at 07:43:19PM -0400, Perry E. Metzger wrote: > As for the daemons, remember that with a proper scheduler, you will > switch straight from an incoming network interrupt to a high priority > process that is expecting the incoming packet, and that even works > correctly on some (

Re: [Beowulf] What services do you run on your cluster nodes?

2008-09-23 Thread Perry E. Metzger
Patrick Geoffray <[EMAIL PROTECTED]> writes: > Perry E. Metzger wrote: >> from processing interrupts, or prevent your OS from properly switching >> to a high priority process following an interrupt, but SMM will and >> you can't get rid of it. > > You can usually disable SMI, either through the BI

Re: [Beowulf] Pretty High Performance Computing

2008-09-23 Thread Vincent Diepeveen
I'd argue we might know this already as middleware. Best regards from a hotel in Beijing, Vincent On Sep 23, 2008, at 10:32 PM, Jon Forrest wrote: Given the recent discussion of whether running multiple services and other such things affects the running of a cluster, I'd like to propose a new

Re: [Beowulf] One network, or two?

2008-09-23 Thread Joe Landman
Prentice Bisbal wrote: Alan Ward wrote: Good day. I have been reading the ongoing discussion on network usage with some interest, mainly because in all (admittedly very small, 4 to 8 node) clusters we have set up so far, we have always gone with doubling the network. Nowadays we mostly run a

Re: [Beowulf] One network, or two?

2008-09-23 Thread Buccaneer for Hire.
--- On Tue, 9/23/08, Greg Lindahl <[EMAIL PROTECTED]> wrote: > From: Greg Lindahl <[EMAIL PROTECTED]> > Subject: Re: [Beowulf] One network, or two? > To: beowulf@beowulf.org > Date: Tuesday, September 23, 2008, 1:20 PM > On Tue, Sep 23, 2008 at 03:21:55PM -0400, Joshua > Baker-LePain wrote: > > >

Re: [Beowulf] One network, or two?

2008-09-23 Thread Joshua Baker-LePain
On Tue, 23 Sep 2008 at 1:20pm, Greg Lindahl wrote On Tue, Sep 23, 2008 at 03:21:55PM -0400, Joshua Baker-LePain wrote: In theory, it's a great idea. In practice, wiring up an entire 2nd network gets to be a major PITA (as well as being uncheap) as your cluster size increases. I've got ~350 n

Re: [Beowulf] What services do you run on your cluster nodes?

2008-09-23 Thread Patrick Geoffray
Perry E. Metzger wrote: from processing interrupts, or prevent your OS from properly switching to a high priority process following an interrupt, but SMM will and you can't get rid of it. You can usually disable SMI, either through the BIOS or directly from the chipset. However, you will lose

[Beowulf] Pretty High Performance Computing

2008-09-23 Thread Jon Forrest
Given the recent discussion of whether running multiple services and other such things affects the running of a cluster, I'd like to propose a new classification of computing. I call this Pretty High Performance Computing (PHPC). This is a style of computing where you sacrifice absolute high perf

Re: [Beowulf] One network, or two?

2008-09-23 Thread Tony Travis
Prentice Bisbal wrote: [...] My new cluster, which is still in labor, will have InfiniBand for MPI, and we have 10 Gb ethernet switches for management/NFS, etc. The nodes only have 1 Gb ethernet, so it will be effectively a 1 Gb network. I'm also curious as to whether the dual networks are overk

Re: [Beowulf] One network, or two?

2008-09-23 Thread Greg Lindahl
On Tue, Sep 23, 2008 at 03:21:55PM -0400, Joshua Baker-LePain wrote: > In theory, it's a great idea. In practice, wiring up an entire 2nd > network gets to be a major PITA (as well as being uncheap) as your > cluster size increases. I've got ~350 nodes here, and a 2nd set of > cabling is rea

Re: [Beowulf] A couple of interesting comments

2008-09-23 Thread Prentice Bisbal
Oops. e-mailed to the wrong address. The cat's out of the bag now! No big deal. I was 50/50 about CC-ing the list, anyway. Just remove the phrase "off-list" in the first sentence, and that last bit about not posting to the list because... Great. I'll never get a job that requires security clearan

[Beowulf] A couple of interesting comments

2008-09-23 Thread Prentice Bisbal
Gerry, I wanted to let you know off-list that I'm going through the same problems right now. I thought you'd like to know you're not alone. We purchased a cluster from the *allegedly* same vendor. The PXE boot and keyboard errors were the least of our problems. First, our cluster was delayed 2 m

Re: [Beowulf] One network, or two?

2008-09-23 Thread Prentice Bisbal
Alan Ward wrote: > > Good day. > > I have been reading the ongoing discussion on network usage with some > interest, mainly because in all (admittedly very small, 4 to 8 node) > clusters we have set up so far, we have always gone with doubling the > network. Nowadays we mostly run a 100 MBit/s "e

Re: [Beowulf] One network, or two?

2008-09-23 Thread Joshua Baker-LePain
On Tue, 23 Sep 2008 at 9:16pm, Alan Ward wrote I have been reading the ongoing discussion on network usage with some interest, mainly because in all (admittedly very small, 4 to 8 node) clusters we have set up so far, we have always gone with doubling the network. Nowadays we mostly run a 100

[Beowulf] One network, or two?

2008-09-23 Thread Alan Ward
Good day. I have been reading the ongoing discussion on network usage with some interest, mainly because in all (admittedly very small, 4 to 8 node) clusters we have set up so far, we have always gone with doubling the network. Nowadays we mostly run a 100 MBit/s "el cheapo" FastEthernet for c

RE: [Beowulf] What services do you run on your cluster nodes?

2008-09-23 Thread Lux, James P
> > XML is evil. Well, evil for this. > > Ganglia does consume a significant portion of resources. > I've heard first-hand reports of 20% CPU. (Admittedly before > they figured out what was happening and turned the reporting > list and frequency way down.) > > When we found that we needed to sup

Re: [Beowulf] What services do you run on your cluster nodes?

2008-09-23 Thread Paul Van Allsburg
Donald Becker wrote: On Mon, 22 Sep 2008, Perry E. Metzger wrote: Prentice Bisbal <[EMAIL PROTECTED]> writes: The more services you run on your cluster node (gmond, sendmail, etc.) the less performance is available for number crunching, but at the same time, administration difficulty i

Re: [Beowulf] What services do you run on your cluster nodes?

2008-09-23 Thread Bernard Li
Hi Matt: On Mon, Sep 22, 2008 at 9:26 PM, Matt Lawrence <[EMAIL PROTECTED]> wrote: > Well, the folks I talked to at TACC were not enthusiastic about the amount > of resources ganglia uses. I will agree that there is a lot of unnecessary > stuff that goes on, like converting everything to and from

Re: [Beowulf] What services do you run on your cluster nodes?

2008-09-23 Thread Bernard Li
Hi Joe: On Tue, Sep 23, 2008 at 4:50 AM, Joe Landman <[EMAIL PROTECTED]> wrote: > some overhead. And occasionally gmond wanders off into a different universe > ... Care to elaborate what this means? Perhaps I can help :-) Cheers, Bernard

Re: [Beowulf] What services do you run on your cluster nodes?

2008-09-23 Thread Bernard Li
Hi John: On Tue, Sep 23, 2008 at 4:54 AM, John Hearns <[EMAIL PROTECTED]> wrote: > That's a reason why I'm no great lover of Ganglia too - it just sprays > multicast packets all over your network. > Which really should be OK - but if you have switches which don't perform > well with multicast you

Re: [Beowulf] What services do you run on your cluster nodes?

2008-09-23 Thread Donald Becker
On Mon, 22 Sep 2008, Perry E. Metzger wrote: > > Prentice Bisbal <[EMAIL PROTECTED]> writes: > > The more services you run on your cluster node (gmond, sendmail, etc.) > > the less performance is available for number crunching, but at the same > > time, administration difficulty increases. For ex

[Beowulf] SCore MPI on quad core

2008-09-23 Thread Dave Love
Has anyone tried -- or, better, succeeded with -- SCore MPI on quad core systems, specifically Barcelonas? It worked fine before they were upgraded from 2×2 Opteron cores to 2×4, but now it's performing much worse than other MPIs (with gig ethernet). I'm not sure exactly what the measurements inv

Re: [Beowulf] What services do you run on your cluster nodes?

2008-09-23 Thread Robert G. Brown
On Tue, 23 Sep 2008, Perry E. Metzger wrote: "Robert G. Brown" <[EMAIL PROTECTED]> writes: You can run xmlsysd as either an xinetd process or forking daemon (the former is more secure, perhaps, the latter makes it stand alone and keeps one from having to run xinetd:-). Arguably, running proc

Re: [Beowulf] What services do you run on your cluster nodes?

2008-09-23 Thread Donald Becker
On Tue, 23 Sep 2008, Robert G. Brown wrote: > On Mon, 22 Sep 2008, Matt Lawrence wrote: > > On Mon, 22 Sep 2008, Bernard Li wrote: > > > >> Ganglia collects metrics from hosts and trends them for the user. > >> Most of these metrics need to be collected from the host itself (CPU, > >> memory, load,

Re: [Beowulf] What services do you run on your cluster nodes?

2008-09-23 Thread Robert G. Brown
On Tue, 23 Sep 2008, John Hearns wrote: 2008/9/23 Robert G. Brown <[EMAIL PROTECTED]> This meant that there could be hundreds or even thousands of machines that saw every packet produced by every other machine on the LAN, possibly after a few ethernet bridge hops. This made conditions ripe f

[Beowulf] looper

2008-09-23 Thread Dan Stromberg
I was asked by my employer to publish this a bit ago, so here it is: http://stromberg.dnsalias.org/~dstromberg/looper/ It's a multithreaded script for running n POSIX shell commands m at a time with good error checking. It allows for things like stashing in ssh $hostname or rsync $hostname in
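The idea in the looper announcement above (run n shell commands, at most m at a time, and check every exit status) can be illustrated with a minimal Python sketch. This is not looper's code and says nothing about how looper is actually implemented; the names run_one, run_commands, failures and max_parallel are hypothetical and chosen only for this illustration.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_one(command):
    """Run one shell command; return (command, exit status, combined output)."""
    proc = subprocess.run(
        command,
        shell=True,                # hand the string to the POSIX shell
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # keep error text together with normal output
        text=True,
    )
    return command, proc.returncode, proc.stdout


def run_commands(commands, max_parallel):
    """Run every command, at most max_parallel at a time, keeping input order."""
    if max_parallel < 1:
        raise ValueError("max_parallel must be at least 1")
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(run_one, commands))


def failures(results):
    """Return (command, exit status) for every command that exited non-zero."""
    return [(cmd, status) for cmd, status, _ in results if status != 0]
```

Calling run_commands(["echo one", "exit 3", "echo two"], max_parallel=2) runs at most two commands concurrently, and failures() on the result picks out the one that exited non-zero. Threads are sufficient here because each worker spends its time waiting on a child process rather than computing.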

Re: [Beowulf] RE: MS Cray

2008-09-23 Thread Naveed Near-Ansari
On Sep 17, 2008, at 11:39 AM, Lux, James P wrote: A big disadvantage of in-house is that it leads to an inventory of ancient gear that becomes hard to maintain, and balkanized ownership (we bought that for Project X, and though Project X is long gone, the former staff of Project X still

Roll your own cluster management system with ClusterVisionOS v4 - was: [Beowulf] What services do you run on your cluster nodes?

2008-09-23 Thread andrew holway
This is a bit of a shameless plug, but as we're on the subject, we are just in the process of releasing our new stack, ClusterVisionOS v4. We stick it on top of your favorite Red Hat clone (mostly SL) to provide a host of handy cluster tools and enhancements. From November/December(ish) you will get

Re: [Beowulf] What services do you run on your cluster nodes?

2008-09-23 Thread Perry E. Metzger
"Robert G. Brown" <[EMAIL PROTECTED]> writes: > You can run xmlsysd as either an xinetd process or forking daemon > (the former is more secure, perhaps, the latter makes it stand alone > and keeps one from having to run xinetd:-). Arguably, running processes under inetd can make them more secure,

Re: [Beowulf] What services do you run on your cluster nodes?

2008-09-23 Thread Robert G. Brown
On Tue, 23 Sep 2008, Joe Landman wrote: Robert G. Brown wrote: One can always run xmlsysd instead, which is a very lightweight on-demand information service. It costs you, basically, a socket, and you can poll the nodes to get their current runstate every five seconds, every thirty seconds, e

Re: [Beowulf] What services do you run on your cluster nodes?

2008-09-23 Thread Perry E. Metzger
Ashley Pittman <[EMAIL PROTECTED]> writes: > Note that it's not just the "OS" fluff which causes problems and turning > things off doesn't get you anything like 100% of the way there, some > daemons have to run (your job scheduler) so all you can do is tune them > or re-code them to use less CPU a

Re: [Beowulf] What services do you run on your cluster nodes?

2008-09-23 Thread Prentice Bisbal
Gerry Creager wrote: > Eric Thibodeau wrote: >> Prentice Bisbal wrote: >>> The more services you run on your cluster node (gmond, sendmail, etc.) >>> the less performance is available for number crunching, but at the same >>> time, administration difficulty increases. For example, if you turn off

Re: [Beowulf] What services do you run on your cluster nodes?

2008-09-23 Thread Joe Landman
John Hearns wrote: That's a reason why I'm no great lover of Ganglia too - it just sprays multicast packets all over your network. Yeah ... try to use a network in the middle of a multicast storm. As I remember, every machine seeing a multicast packet has to at least inspect the packet to s

Re: [Beowulf] What services do you run on your cluster nodes?

2008-09-23 Thread Ashley Pittman
On Mon, 2008-09-22 at 15:52 -0700, Bernard Li wrote: > Or you can re-post with a different topic here, and I'll try my best > to answer them :-) We collect a half dozen or so simple metrics per node and add them to Ganglia using "gmetric". A naive implementation of this with 32 nodes each reporti

Re: [Beowulf] What services do you run on your cluster nodes?

2008-09-23 Thread John Hearns
2008/9/23 Robert G. Brown <[EMAIL PROTECTED]> > > This meant that there could be hundreds or even thousands of machines > that saw every packet produced by every other machine on the LAN, > possibly after a few ethernet bridge hops. This made conditions ripe > for what used to be called a "packet

Re: [Beowulf] What services do you run on your cluster nodes?

2008-09-23 Thread Joe Landman
Robert G. Brown wrote: One can always run xmlsysd instead, which is a very lightweight on-demand information service. It costs you, basically, a socket, and you can poll the nodes to get their current runstate every five seconds, every thirty seconds, every minute, every five minutes. Pick a g

Re: [Beowulf] What services do you run on your cluster nodes?

2008-09-23 Thread Perry E. Metzger
Greg Lindahl <[EMAIL PROTECTED]> writes: >> By the way, if you really can't afford for things to "go away" for >> 1/250th of a second very often, I have horrible news for you: NO >> COMPUTER WILL WORK FOR YOU. > > You haven't done much HPC, have you? Why do you think we build > interconnects with

Re: [Beowulf] What services do you run on your cluster nodes?

2008-09-23 Thread Robert G. Brown
On Tue, 23 Sep 2008, Ashley Pittman wrote: On Mon, 2008-09-22 at 20:54 -0400, Perry E. Metzger wrote: By the way, if you really can't afford for things to "go away" for 1/250th of a second very often, I have horrible news for you: NO COMPUTER WILL WORK FOR YOU. To a large extent you are actu

Re: [Beowulf] What services do you run on your cluster nodes?

2008-09-23 Thread Robert G. Brown
On Mon, 22 Sep 2008, Matt Lawrence wrote: On Mon, 22 Sep 2008, Bernard Li wrote: Ganglia collects metrics from hosts and trends them for the user. Most of these metrics need to be collected from the host itself (CPU, memory, load, etc.). Besides, the footprint of Ganglia is very little. I ha

Re: [Beowulf] What services do you run on your cluster nodes?

2008-09-23 Thread Robert G. Brown
On Mon, 22 Sep 2008, Joe Landman wrote: Prentice Bisbal wrote: The more services you run on your cluster node (gmond, sendmail, etc.) the less performance is available for number crunching, but at the same time, administration difficulty increases. For example, if you turn off postfix/sendmail,

Re: [Beowulf] What services do you run on your cluster nodes?

2008-09-23 Thread andrew holway
We've written our own lightweight monitoring daemon that reports back to a portal facility back on the master node. On Mon, Sep 22, 2008 at 7:32 PM, Prentice Bisbal <[EMAIL PROTECTED]> wrote: > The more services you run on your cluster node (gmond, sendmail, etc.) > the less performance is availab

Re: [Beowulf] What services do you run on your cluster nodes?

2008-09-23 Thread Ashley Pittman
On Mon, 2008-09-22 at 20:54 -0400, Perry E. Metzger wrote: > > By the way, if you really can't afford for things to "go away" for > 1/250th of a second very often, I have horrible news for you: NO > COMPUTER WILL WORK FOR YOU. To a large extent you are actually correct, this is one of the reasons

Re: [Beowulf] What services do you run on your cluster nodes?

2008-09-23 Thread Ashley Pittman
On Mon, 2008-09-22 at 15:44 -0400, Eric Thibodeau wrote: > Ashley Pittman wrote: > > On Mon, 2008-09-22 at 14:56 -0400, Eric Thibodeau wrote: > > If it were up to me I'd turn *everything* possible off except sshd and > > ntp. The problem however is the maintenance cost of doing this, it's > > fin

Re: [Beowulf] What services do you run on your cluster nodes?

2008-09-23 Thread Håkon Bugge
At 02:54 23.09.2008, Perry E. Metzger wrote: By the way, if you really can't afford for things to "go away" for 1/250th of a second very often, I have horrible news for you: NO COMPUTER WILL WORK FOR YOU. Obviously a statement from someone not on the top of the subject. Here, a statement from