I guess I don't quite understand why you disagree, Prentice. With the
exception that middleware doesn't strive to be a classification per se,
just a solution, it still consists of a "style of computing where you
sacrifice absolute high performance because of issues relating to any
combination of
Ashley Pittman wrote:
On Mon, 2008-09-22 at 15:44 -0400, Eric Thibodeau wrote:
Ashley Pittman wrote:
On Mon, 2008-09-22 at 14:56 -0400, Eric Thibodeau wrote:
If it were up to me I'd turn *everything* possible off except sshd and
ntp. The problem however is the maintenance cost
Lawrence Stewart wrote:
Is anyone aware of available test suites or API benchmark suites for
shmem? I am thinking of the equivalent of the Intel MPI tests or
Intel MPI Benchmarks, awful though they are.
I don't know any publicly available shmem validation or benchmark
suites. Not surprising s
I'm starting work on a shmem implementation for the SiCortex systems.
Is anyone aware of available test suites or API benchmark suites for
shmem? I am thinking of the equivalent of the Intel MPI tests or
Intel MPI Benchmarks, awful though they are.
Separately, does anyone here happen to know wh
Perry E. Metzger wrote:
You realize that most big HPC systems are using interconnects that
don't generate many or any interrupts, right?
Of course. Usually one uses interrupt pacing/mitigation even in
gig ethernet on a modern machine -- otherwise you're not going to get
reasonable performance
On Sep 23, 2008, at 9:18 PM, Perry E. Metzger wrote:
Greg Lindahl <[EMAIL PROTECTED]> writes:
On Tue, Sep 23, 2008 at 07:43:19PM -0400, Perry E. Metzger wrote:
As for the daemons, remember that with a proper scheduler, you will
switch straight from an incoming network interrupt to a high
priority
Prentice Bisbal wrote:
Prentice Bisbal wrote:
The more services you run on your cluster node (gmond, sendmail, etc.)
the less performance is available for number crunching, but at the same
time, administration difficulty increases. For example, if you turn off
postfix/sendmail, you'll no longer
Prentice Bisbal wrote:
The more services you run on your cluster node (gmond, sendmail, etc.)
the less performance is available for number crunching, but at the same
time, administration difficulty increases. For example, if you turn off
postfix/sendmail, you'll no longer get automated e-mails from
Greg Lindahl <[EMAIL PROTECTED]> writes:
> On Tue, Sep 23, 2008 at 07:43:19PM -0400, Perry E. Metzger wrote:
>> As for the daemons, remember that with a proper scheduler, you will
>> switch straight from an incoming network interrupt to a high priority
>> process that is expecting the incoming packet
Vincent Diepeveen wrote:
I'd argue we might know this already as middleware.
That makes absolutely no sense.
Best regards from a hotel in Beijing,
Vincent
On Sep 23, 2008, at 10:32 PM, Jon Forrest wrote:
Given the recent discussion of whether running
multiple services and other such things
On Tue, 23 Sep 2008, Alan Ward wrote:
I have been reading the ongoing discussion on network usage with some
interest, mainly because in all (admittedly very small, 4 to 8 node)
clusters we have set up so far, we have always gone with doubling the
network. Nowadays we mostly run a 100 MBit/s "el cheapo"
On Tue, Sep 23, 2008 at 07:43:19PM -0400, Perry E. Metzger wrote:
> As for the daemons, remember that with a proper scheduler, you will
> switch straight from an incoming network interrupt to a high priority
> process that is expecting the incoming packet, and that even works
> correctly on some (
Patrick Geoffray <[EMAIL PROTECTED]> writes:
> Perry E. Metzger wrote:
>> from processing interrupts, or prevent your OS from properly switching
>> to a high priority process following an interrupt, but SMM will and
>> you can't get rid of it.
>
> You can usually disable SMI, either through the BIOS
I'd argue we might know this already as middleware.
Best regards from a hotel in Beijing,
Vincent
On Sep 23, 2008, at 10:32 PM, Jon Forrest wrote:
Given the recent discussion of whether running
multiple services and other such things affects
the running of a cluster, I'd like to propose
a new
Prentice Bisbal wrote:
Alan Ward wrote:
Good day.
I have been reading the ongoing discussion on network usage with some
interest, mainly because in all (admittedly very small, 4 to 8 node)
clusters we have set up so far, we have always gone with doubling the
network. Nowadays we mostly run a
--- On Tue, 9/23/08, Greg Lindahl <[EMAIL PROTECTED]> wrote:
> From: Greg Lindahl <[EMAIL PROTECTED]>
> Subject: Re: [Beowulf] One network, or two?
> To: beowulf@beowulf.org
> Date: Tuesday, September 23, 2008, 1:20 PM
> On Tue, Sep 23, 2008 at 03:21:55PM -0400, Joshua
> Baker-LePain wrote:
>
> >
On Tue, 23 Sep 2008 at 1:20pm, Greg Lindahl wrote
On Tue, Sep 23, 2008 at 03:21:55PM -0400, Joshua Baker-LePain wrote:
In theory, it's a great idea. In practice, wiring up an entire 2nd
network gets to be a major PITA (as well as being uncheap) as your
cluster size increases. I've got ~350 nodes
Perry E. Metzger wrote:
from processing interrupts, or prevent your OS from properly switching
to a high priority process following an interrupt, but SMM will and
you can't get rid of it.
You can usually disable SMI, either through the BIOS or directly from
the chipset. However, you will lose
Given the recent discussion of whether running
multiple services and other such things affects
the running of a cluster, I'd like to propose
a new classification of computing.
I call this Pretty High Performance Computing (PHPC).
This is a style of computing where you sacrifice
absolute high performance
Prentice Bisbal wrote:
[...]
My new cluster, which is still in labor, will have InfiniBand for MPI,
and we have 10 Gb ethernet switches for management/NFS, etc. The nodes
only have 1 Gb ethernet, so it will be effectively a 1 Gb network.
I'm also curious as to whether the dual networks are overkill
On Tue, Sep 23, 2008 at 03:21:55PM -0400, Joshua Baker-LePain wrote:
> In theory, it's a great idea. In practice, wiring up an entire 2nd
> network gets to be a major PITA (as well as being uncheap) as your
> cluster size increases. I've got ~350 nodes here, and a 2nd set of
> cabling is rea
Oops. e-mailed to the wrong address. The cat's out of the bag now! No
big deal. I was 50/50 about CC-ing the list, anyway. Just remove the
phrase "off-list" in the first sentence, and that last bit about not
posting to the list because...
Great. I'll never get a job that requires security clearance
Gerry,
I wanted to let you know off-list that I'm going through the same
problems right now. I thought you'd like to know you're not alone. We
purchased a cluster from the *allegedly* same vendor. The PXE boot and
keyboard errors were the least of our problems.
First, our cluster was delayed 2 months
Alan Ward wrote:
>
> Good day.
>
> I have been reading the ongoing discussion on network usage with some
> interest, mainly because in all (admittedly very small, 4 to 8 node)
> clusters we have set up so far, we have always gone with doubling the
> network. Nowadays we mostly run a 100 MBit/s "el cheapo"
On Tue, 23 Sep 2008 at 9:16pm, Alan Ward wrote
I have been reading the ongoing discussion on network usage with some
interest, mainly because in all (admittedly very small, 4 to 8 node)
clusters we have set up so far, we have always gone with doubling the
network. Nowadays we mostly run a 100
Good day.
I have been reading the ongoing discussion on network usage with some interest,
mainly because in all (admittedly very small, 4 to 8 node) clusters we have set
up so far, we have always gone with doubling the network. Nowadays we mostly
run a 100 MBit/s "el cheapo" FastEthernet for c
>
> XML is evil. Well, evil for this.
>
> Ganglia does consume a significant portion of resources.
> I've heard first-hand reports of 20% CPU. (Admittedly before
> they figured out what was happening and turned the reporting
> list and frequency way down.)
>
> When we found that we needed to sup
Donald Becker wrote:
On Mon, 22 Sep 2008, Perry E. Metzger wrote:
Prentice Bisbal <[EMAIL PROTECTED]> writes:
The more services you run on your cluster node (gmond, sendmail, etc.)
the less performance is available for number crunching, but at the same
time, administration difficulty increases.
Hi Matt:
On Mon, Sep 22, 2008 at 9:26 PM, Matt Lawrence <[EMAIL PROTECTED]> wrote:
> Well, the folks I talked to at TACC were not enthusiastic about the amount
> of resources ganglia uses. I will agree that there is a lot of unnecessary
> stuff that goes on, like converting everything to and from
Hi Joe:
On Tue, Sep 23, 2008 at 4:50 AM, Joe Landman
<[EMAIL PROTECTED]> wrote:
> some overhead. And occasionally gmond wanders off into a different universe
> ...
Care to elaborate what this means? Perhaps I can help :-)
Cheers,
Bernard
Hi John:
On Tue, Sep 23, 2008 at 4:54 AM, John Hearns <[EMAIL PROTECTED]> wrote:
> That's a reason why I'm no great lover of Ganglia too - it just sprays
> multicast packets all over your network.
> Which really should be OK - but if you have switches which don't perform
> well with multicast you
On Mon, 22 Sep 2008, Perry E. Metzger wrote:
>
> Prentice Bisbal <[EMAIL PROTECTED]> writes:
> > The more services you run on your cluster node (gmond, sendmail, etc.)
> > the less performance is available for number crunching, but at the same
> > time, administration difficulty increases. For example
Has anyone tried -- or, better, succeeded with -- SCore MPI on quad core
systems, specifically Barcelonas? It worked fine before they were
upgraded from 2×2 Opteron cores to 2×4, but now it's performing much
worse than other MPIs (with gig ethernet). I'm not sure exactly what
the measurements inv
On Tue, 23 Sep 2008, Perry E. Metzger wrote:
"Robert G. Brown" <[EMAIL PROTECTED]> writes:
You can run xmlsysd as either an xinetd process or forking daemon
(the former is more secure, perhaps, the latter makes it stand alone
and keeps one from having to run xinetd:-).
Arguably, running processes
On Tue, 23 Sep 2008, Robert G. Brown wrote:
> On Mon, 22 Sep 2008, Matt Lawrence wrote:
> > On Mon, 22 Sep 2008, Bernard Li wrote:
> >
> >> Ganglia collects metrics from hosts and trends them for the user.
> >> Most of these metrics need to be collected from the host itself (CPU,
> >> memory, load,
On Tue, 23 Sep 2008, John Hearns wrote:
2008/9/23 Robert G. Brown <[EMAIL PROTECTED]>
This meant that there could be hundreds or even thousands of machines
that saw every packet produced by every other machine on the LAN,
possibly after a few ethernet bridge hops. This made conditions ripe
for
I was asked by my employer to publish this a bit ago, so here it is:
http://stromberg.dnsalias.org/~dstromberg/looper/
It's a multithreaded script for running n POSIX shell commands m at a
time with good error checking. It allows for things like stashing in ssh
$hostname or rsync $hostname in
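For readers who just want the flavour of the approach, a minimal Python sketch of the same idea follows: run a list of shell commands with at most m in flight at once and report the failures. This is an illustration only, not the looper script itself; the host names and commands are made up.

    # Illustrative sketch only -- not the actual "looper" script.
    # Run a list of shell commands, at most MAX_WORKERS at a time,
    # and report which ones failed.
    import subprocess
    from concurrent.futures import ThreadPoolExecutor

    MAX_WORKERS = 4                                   # "m" commands at a time
    hosts = ["node01", "node02", "node03"]            # hypothetical node names
    commands = ["ssh %s uptime" % h for h in hosts]   # e.g. ssh $hostname ...

    def run(cmd):
        # shell=True so entries can be ordinary shell command strings
        return cmd, subprocess.call(cmd, shell=True)

    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        for cmd, status in pool.map(run, commands):
            if status != 0:
                print("FAILED (exit %d): %s" % (status, cmd))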
On Sep 17, 2008, at 11:39 AM, Lux, James P wrote:
A big disadvantage of in-house is that it leads to an inventory of
ancient gear that becomes hard to maintain, and balkanized ownership
(we bought that for Project X, and though Project X is long gone,
the former staff of Project X still
This is a bit of a shameless plug but as we're on the subject.
We are just in the process of releasing our new stack, ClusterVisionOS
v4. We stick it on top of your favorite Red Hat clone (mostly SL) to
provide a host of handy cluster tools and enhancements. From
November/December(ish) you will get
"Robert G. Brown" <[EMAIL PROTECTED]> writes:
> You can run xmlsysd as either an xinetd process or forking daemon
> (the former is more secure, perhaps, the latter makes it stand alone
> and keeps one from having to run xinetd:-).
Arguably, running processes under inetd can make them more secure,
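The practical point about (x)inetd mode is that the service never has to manage sockets at all: xinetd accepts the connection and hands it to a fresh process with the socket attached to stdin/stdout. A toy Python handler in that style is sketched below; the "load" request keyword is invented for the example and has nothing to do with xmlsysd's real wire format.

    # Toy xinetd-style (stream/nowait) handler, illustration only.
    # xinetd accepts the TCP connection and execs this with the socket
    # attached to stdin/stdout, so the "daemon" is just read / reply.
    import sys

    request = sys.stdin.readline().strip()
    if request == "load":                      # hypothetical request keyword
        with open("/proc/loadavg") as f:
            sys.stdout.write(f.read())
    else:
        sys.stdout.write("unknown request\n")
    sys.stdout.flush()

Each poll costs one short-lived process and nothing runs between polls, which is the trade-off being discussed here.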
On Tue, 23 Sep 2008, Joe Landman wrote:
Robert G. Brown wrote:
One can always run xmlsysd instead, which is a very lightweight
on-demand information service. It costs you, basically, a socket, and
you can poll the nodes to get their current runstate every five seconds,
every thirty seconds, every minute
Ashley Pittman <[EMAIL PROTECTED]> writes:
> Note that it's not just the "OS" fluff which causes problems and turning
> things off doesn't get you anything like 100% of the way there, some
> daemons have to run (your job scheduler) so all you can do is tune them
> or re-code them to use less CPU a
Gerry Creager wrote:
> Eric Thibodeau wrote:
>> Prentice Bisbal wrote:
>>> The more services you run on your cluster node (gmond, sendmail, etc.)
>>> the less performance is available for number crunching, but at the same
>>> time, administration difficulty increases. For example, if you turn off
John Hearns wrote:
That's a reason why I'm no great lover of Ganglia too - it just sprays
multicast packets all over your network.
Yeah ... try to use a network in the middle of a multicast storm. As I
remember, every machine seeing a multicast packet has to at least
inspect the packet to s
On Mon, 2008-09-22 at 15:52 -0700, Bernard Li wrote:
> Or you can re-post with a different topic here, and I'll try my best
> to answer them :-)
We collect a half dozen or so simple metrics per node and add them to
Ganglia using "gmetric". A naive implementation of this with 32 nodes
each reporting
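For reference, feeding a site-specific number into Ganglia this way is roughly one gmetric call per metric per interval. A hedged sketch (the metric name and the /scratch filesystem are invented; the long-option spellings follow Ganglia 3.x gmetric and may differ on other versions, so check gmetric --help):

    # Sketch: publish one custom metric through Ganglia's gmetric CLI.
    import os
    import subprocess

    def report_scratch_free_gb(path="/scratch"):   # hypothetical filesystem
        st = os.statvfs(path)
        free_gb = st.f_bavail * st.f_frsize // 2**30
        subprocess.check_call([
            "gmetric",
            "--name=scratch_free",                 # made-up metric name
            "--value=%d" % free_gb,
            "--type=uint32",
            "--units=GB",
        ])

    if __name__ == "__main__":
        report_scratch_free_gb()

Run from cron or a loop on each node, the per-node cost is one gmetric fork per metric per reporting interval, which is what the back-of-the-envelope estimate above is counting.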
2008/9/23 Robert G. Brown <[EMAIL PROTECTED]>
>
> This meant that there could be hundreds or even thousands of machines
> that saw every packet produced by every other machine on the LAN,
> possibly after a few ethernet bridge hops. This made conditions ripe
> for what used to be called a "packet
Robert G. Brown wrote:
One can always run xmlsysd instead, which is a very lightweight
on-demand information service. It costs you, basically, a socket, and
you can poll the nodes to get their current runstate every five seconds,
every thirty seconds, every minute, every five minutes. Pick a
g
Greg Lindahl <[EMAIL PROTECTED]> writes:
>> By the way, if you really can't afford for things to "go away" for
>> 1/250th of a second very often, I have horrible news for you: NO
>> COMPUTER WILL WORK FOR YOU.
>
> You haven't done much HPC, have you? Why do you think we build
> interconnects with
On Tue, 23 Sep 2008, Ashley Pittman wrote:
On Mon, 2008-09-22 at 20:54 -0400, Perry E. Metzger wrote:
By the way, if you really can't afford for things to "go away" for
1/250th of a second very often, I have horrible news for you: NO
COMPUTER WILL WORK FOR YOU.
To a large extent you are actually correct
On Mon, 22 Sep 2008, Matt Lawrence wrote:
On Mon, 22 Sep 2008, Bernard Li wrote:
Ganglia collects metrics from hosts and trends them for the user.
Most of these metrics need to be collected from the host itself (CPU,
memory, load, etc.).
Besides, the footprint of Ganglia is very little. I ha
On Mon, 22 Sep 2008, Joe Landman wrote:
Prentice Bisbal wrote:
The more services you run on your cluster node (gmond, sendmail, etc.)
the less performance is available for number crunching, but at the same
time, administration difficulty increases. For example, if you turn off
postfix/sendmail,
We've written our own lightweight monitoring daemon that reports back
to a portal facility on the master node.
On Mon, Sep 22, 2008 at 7:32 PM, Prentice Bisbal <[EMAIL PROTECTED]> wrote:
> The more services you run on your cluster node (gmond, sendmail, etc.)
> the less performance is available
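As an illustration of how small a push-style reporter like the one mentioned above can be, here is a minimal sketch. This is not that site's daemon; the master host and port, the UDP transport and the one-line text format are all placeholders invented for the example.

    # Minimal push-style node reporter: one UDP datagram per interval
    # carrying hostname and 1-minute load average. Illustration only.
    import socket
    import time

    MASTER = ("master", 9099)    # placeholder portal host and port
    INTERVAL = 60                # seconds between reports

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    host = socket.gethostname()
    while True:
        with open("/proc/loadavg") as f:
            load1 = f.read().split()[0]
        sock.sendto(("%s load1=%s" % (host, load1)).encode(), MASTER)
        time.sleep(INTERVAL)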
On Mon, 2008-09-22 at 20:54 -0400, Perry E. Metzger wrote:
>
> By the way, if you really can't afford for things to "go away" for
> 1/250th of a second very often, I have horrible news for you: NO
> COMPUTER WILL WORK FOR YOU.
To a large extent you are actually correct, this is one of the reasons
On Mon, 2008-09-22 at 15:44 -0400, Eric Thibodeau wrote:
> Ashley Pittman wrote:
> > On Mon, 2008-09-22 at 14:56 -0400, Eric Thibodeau wrote:
> > If it were up to me I'd turn *everything* possible off except sshd and
> > ntp. The problem however is the maintenance cost of doing this, it's
> > fine
At 02:54 23.09.2008, Perry E. Metzger wrote:
By the way, if you really can't afford for things to "go away" for
1/250th of a second very often, I have horrible news for you: NO
COMPUTER WILL WORK FOR YOU.
Obviously a statement from someone not on top
of the subject. Here, a statement from