Greg Lindahl wrote:
Do you have some papers that describe in what way the overall
performance of the application depends on differences in message rate etc?
No, but I can show you a series of slides with performance compared to
Mellanox, and a histogram of the actual message sizes and rates of
On Tue, Sep 19, 2006 at 01:33:34PM +0200, Toon Knapen wrote:
> >So my real applications that show message rate is important don't
> >count for some reason?
>
> Do you have some papers that describe in what way the overall
> performance of the application depends on differences in message rate etc?
Greg Lindahl wrote:
On Fri, Sep 15, 2006 at 02:08:17PM -0400, Patrick Geoffray wrote:
Interconnect people often lose track of this, and using micro-benchmarks
with no computation yields a warped picture of the problem
(message rate).
So my real applications that show message rate is important don't count for some reason?
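The message-rate micro-benchmark being argued about here typically streams windows of small non-blocking sends with no computation in between and reports completions per second. A minimal sketch, assuming exactly two ranks and arbitrary window and message sizes (illustrative only, not code from the thread):

    /* Minimal message-rate micro-benchmark sketch (illustrative only).
     * Rank 0 streams windows of small non-blocking sends to rank 1 and
     * reports messages per second; run with exactly 2 ranks. */
    #include <mpi.h>
    #include <stdio.h>

    #define NMSG   100000   /* total messages (arbitrary) */
    #define WINDOW 64       /* outstanding requests per window (arbitrary) */
    #define LEN    8        /* bytes per message (arbitrary) */

    int main(int argc, char **argv)
    {
        int rank;
        char buf[WINDOW][LEN];
        MPI_Request req[WINDOW];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        double t0 = MPI_Wtime();
        if (rank < 2) {
            for (int i = 0; i < NMSG; i += WINDOW) {
                for (int w = 0; w < WINDOW; w++) {
                    if (rank == 0)
                        MPI_Isend(buf[w], LEN, MPI_BYTE, 1, 0,
                                  MPI_COMM_WORLD, &req[w]);
                    else
                        MPI_Irecv(buf[w], LEN, MPI_BYTE, 0, 0,
                                  MPI_COMM_WORLD, &req[w]);
                }
                MPI_Waitall(WINDOW, req, MPI_STATUSES_IGNORE);
            }
        }
        double t1 = MPI_Wtime();

        if (rank == 0)
            printf("%.0f messages/s\n", NMSG / (t1 - t0));
        MPI_Finalize();
        return 0;
    }

Adding even a little computation between the windows changes the picture considerably, which is the point Patrick is making about benchmarks with no computation.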
At 04:05 PM 9/17/2006, Mark Hahn wrote:
wherein Jim describes a system which scaled poorly even at n=2.
Yes, the software structure was badly designed for the interconnect.
HOWEVER, the whole point of computing resources is that I really
shouldn't have to design the whole software system around the peculiarities of the hardware platform.
wherein Jim describes a system which scaled poorly even at n=2.
Yes, the software structure was badly designed for the interconnect.
HOWEVER, the whole point of computing resources is that I really
shouldn't have to design the whole software system around the
peculiarities of the hardware platform.
At 09:12 AM 9/17/2006, Geoff Jacobs wrote:
Jim Lux wrote:
>
> Yes, the software structure was badly designed for the interconnect.
> HOWEVER, the whole point of computing resources is that I really
> shouldn't have to design the whole software system around the
> peculiarities of the hardware platform.
Jim Lux wrote:
>
> Yes, the software structure was badly designed for the interconnect.
> HOWEVER, the whole point of computing resources is that I really
> shouldn't have to design the whole software system around the
> peculiarities of the hardware platform.
It sure would be nice if software
At 11:08 AM 9/15/2006, Patrick Geoffray wrote:
Toon Knapen wrote:
For instance, I wonder if any real-life application got a 50% boost
by just changing the switch (and the corresponding MPI
implementation). Or, what is exactly the speedup observed by
switching from switch A to switch B on a real-life application?
"'Mark Hahn'" <[EMAIL PROTECTED]>
Subject: Re: [Beowulf] cluster softwares supporting parallel CFD computing
I agree that in general the quality of the parallelism in most codes is
rather low, unfortunately. But it is hard to prove that much can be gained
by improving the quality.
Let me elaborate. When d
On Fri, Sep 15, 2006 at 02:08:17PM -0400, Patrick Geoffray wrote:
> Interconnect people often lose track of this, and using micro-benchmarks
> with no computation yields a warped picture of the problem
> (message rate).
So my real applications that show message rate is important don't
count for some reason?
Toon Knapen wrote:
For instance, I wonder if any real-life application got a 50% boost by
just changing the switch (and the corresponding MPI implementation). Or,
what is exactly the speedup observed by switching from switch A to
switch B on a real-life application?
I could not agree more. We
Patrick Geoffray wrote:
> Alas, people use blocking calls in general because
> they are lazy (50%), they don't know (40%) or they don't care (10%).
We did some tests with non-blocking vs. blocking. Unfortunately in our code
there is only a small window of overlap, i.e. almost immediately after o
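The overlap pattern Toon is describing, sketched here for a 1-D halo exchange (the field layout, tags and update rule are assumptions for illustration, not his code): post the non-blocking sends and receives, update the points that do not need remote data, then wait and finish the boundary.

    /* Sketch of communication/computation overlap for a 1-D halo exchange.
     * The layout and update rule are illustrative; pass MPI_PROC_NULL for
     * a missing neighbour. */
    #include <mpi.h>

    void smooth_step(double *f, double *fnew, int n,
                     int left, int right, MPI_Comm comm)
    {
        MPI_Request req[4];
        /* Defaults in case a neighbour is MPI_PROC_NULL. */
        double halo_l = f[0], halo_r = f[n - 1];

        /* Post the halo exchange first. */
        MPI_Irecv(&halo_l,   1, MPI_DOUBLE, left,  0, comm, &req[0]);
        MPI_Irecv(&halo_r,   1, MPI_DOUBLE, right, 1, comm, &req[1]);
        MPI_Isend(&f[0],     1, MPI_DOUBLE, left,  1, comm, &req[2]);
        MPI_Isend(&f[n - 1], 1, MPI_DOUBLE, right, 0, comm, &req[3]);

        /* Interior points need no remote data: this loop is the whole
         * window in which communication can overlap computation. */
        for (int i = 1; i < n - 1; i++)
            fnew[i] = 0.5 * (f[i - 1] + f[i + 1]);

        /* Boundary points need the freshly received halo values. */
        MPI_Waitall(4, req, MPI_STATUSES_IGNORE);
        fnew[0]     = 0.5 * (halo_l + f[1]);
        fnew[n - 1] = 0.5 * (f[n - 2] + halo_r);
    }

If the interior loop is short relative to the wire time, as in the case Toon describes, the window closes almost immediately and there is little left to overlap, regardless of what the interconnect can do.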
I agree that in general the quality of the parallelism in most codes is
rather low, unfortunately. But it is hard to prove that much can be
gained by improving the quality.
Let me elaborate. When developing an application that needs to run fast, one
first needs to look at using the best algorithm
Hi Mark,
Mark Hahn wrote:
all these points are accurate to some degree, but give a sad impression
of the typical MPI programmer. how many MPI programmers are professionals,
rather than profs or grad students just trying to finish a calculation?
I don't know, since I only see the academic side.
Patrick Geoffray wrote:
> Alas, people use blocking calls in general because
> they are lazy (50%), they don't know (40%) or they don't care (10%).
They're writing around an unpatched race condition in MPICH 1.2.2.
> There is also the chicken and egg problem: nobody really tried to
> overlap, so
> It does apply, however, many parallel algorithms used today are
> naturally blocking. Why? Well, complicating your algorithm to overlap
> communication and computation rarely gives a benefit in practice. So
> anyone who's tried has likely become discouraged, and most people
> haven't even tried
overlapped. Alas, people use blocking calls in general because they are lazy
(50%), they don't know (40%) or they don't care (10%). There is also the
chicken and egg problem: nobody really tried to overlap, so MPI
implementations didn't bother to support it, so you could not really overlap,
so
Hi Eric,
Eric W. Biederman wrote:
On the other hand it is my distinct impression that the reason there is no
opportunity cost from polling is that the applications have not been
tuned as well as they could be. In all other domains of programming
synchronous receives are seriously looked down upon.
Stuart Midgley <[EMAIL PROTECTED]> writes:
>>
>> It does apply, however, many parallel algorithms used today are
>> naturally blocking. Why? Well, complicating your algorithm to overlap
>> communication and computation rarely gives a benefit in practice. So
>> anyone who's tried has likely become
"Vincent Diepeveen" <[EMAIL PROTECTED]> writes:
> You're assuming that you run 1 thread at a 2 to 4 core node or so?
Not at all. I am assuming 1 thread per core, which is typical.
So if you have 4 cores and 4 threads, when one of them is asleep,
most likely when the interrupt arrives your previous c
Greg Lindahl <[EMAIL PROTECTED]> writes:
> On Thu, Sep 07, 2006 at 01:15:01PM -0600, Eric W. Biederman wrote:
>
>> I agree. Taking an interrupt per message is clearly a loss.
>
> Ah. So we're mostly in violent agreement!
That is always nice :)
>> Polling is a reasonable approach for short durations, say
"Daniel Kidger" <[EMAIL PROTECTED]>;
"'Mark Hahn'" <[EMAIL PROTECTED]>
Sent: Saturday, September 09, 2006 3:49 AM
Subject: Re: [Beowulf] cluster softwares supporting parallel CFD computing
"Vincent Diepeveen" <[EMAIL PROTECTED]> writes:
H
"Vincent Diepeveen" <[EMAIL PROTECTED]> writes:
> How about the latency to wake up that thread again. Runqueue latency in
> Linux is 10+ ms?
That assumes you have a 100 Hz timer tick (the default is currently 250 Hz)
and you have something else running. If you have something else running,
yielding is e
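To see the tick granularity being argued about, a small test that asks for a negligible sleep and measures how long the wake-up actually takes can be instructive; on a kernel of that era without high-resolution timers, the answer is bounded below by the jiffy (about 10 ms at HZ=100, 4 ms at HZ=250). A sketch, purely as an illustration and not from the thread:

    /* Ask for a 1 microsecond sleep and measure how long we actually slept.
     * Without high-resolution timers the result is rounded up to the next
     * timer tick (~10 ms at HZ=100, ~4 ms at HZ=250); with hrtimers it is
     * far smaller.  Illustration only; link with -lrt on older glibc. */
    #include <stdio.h>
    #include <time.h>

    int main(void)
    {
        struct timespec want = { 0, 1000 };   /* 1 microsecond */
        struct timespec a, b;

        clock_gettime(CLOCK_MONOTONIC, &a);
        nanosleep(&want, NULL);
        clock_gettime(CLOCK_MONOTONIC, &b);

        double ms = (b.tv_sec - a.tv_sec) * 1e3
                  + (b.tv_nsec - a.tv_nsec) / 1e6;
        printf("asked for 0.001 ms, slept for %.3f ms\n", ms);
        return 0;
    }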
It does apply, however, many parallel algorithms used today are
naturally blocking. Why? Well, complicating your algorithm to overlap
communication and computation rarely gives a benefit in practice. So
anyone who's tried has likely become discouraged, and most people
haven't even tried.
-- g
On Thu, Sep 07, 2006 at 01:15:01PM -0600, Eric W. Biederman wrote:
> I agree. Taking an interrupt per message is clearly a loss.
Ah. So we're mostly in violent agreement!
> Polling is a reasonable approach for short durations, say
> <= 1 millisecond, but it is really weird to explain that you
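One way to picture the trade-off under discussion: spin on MPI_Test for a bounded time, then hand the request to a blocking wait (which may or may not sleep and take the interrupt path, depending on the MPI implementation). The 1 ms budget below simply echoes the figure in the quoted message; the helper name is made up:

    /* Sketch of bounded polling: spin on MPI_Test for up to budget_sec,
     * then fall back to MPI_Wait.  Whether MPI_Wait itself spins, yields
     * or sleeps is implementation-dependent. */
    #include <mpi.h>

    void wait_with_poll_budget(MPI_Request *req, double budget_sec)
    {
        int done = 0;
        double t0 = MPI_Wtime();

        while (!done && (MPI_Wtime() - t0) < budget_sec)
            MPI_Test(req, &done, MPI_STATUS_IGNORE);

        if (!done)
            MPI_Wait(req, MPI_STATUS_IGNORE);  /* give the CPU back if it takes longer */
    }

    /* e.g. wait_with_poll_budget(&request, 0.001);  -- 1 ms of polling */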
<[EMAIL PROTECTED]>; "'Mark Hahn'" <[EMAIL PROTECTED]>
Sent: Thursday, September 07, 2006 7:54 PM
Subject: Re: [Beowulf] cluster softwares supporting parallel CFD computing
Ashley Pittman <[EMAIL PROTECTED]> writes:
I think Daniel was talking about supercomputer
Ashley Pittman <[EMAIL PROTECTED]> writes:
> I think Daniel was talking about supercomputer networks and not
> ethernet, on the first QsNet2 machine I have to hand latency without
> interrupts is 2.72uSec, using interrupts it is 7.20uSec. One
> fundamental difference between these two measurements
Greg Lindahl <[EMAIL PROTECTED]> writes:
> On Wed, Sep 06, 2006 at 11:10:14AM -0600, Eric W. Biederman wrote:
>
>> There is fundamentally more work to do when you take an interrupt because
>> you need to take a context switch. But the cost of a context switch is on
>> the order of microseconds, so while measurable, taking an interrupt should not
On Wed, 2006-09-06 at 11:10 -0600, Eric W. Biederman wrote:
> "Daniel Kidger" <[EMAIL PROTECTED]> writes:
>
> > Bogdan,
> >
> > Parallel applications with lots of MPI traffic should run fine on a cluster
> > with large jiffies - just as long as the interconnect you use doesn't need
> > to take any interrupts. (Interrupts add hugely to the latency figure)
On Wed, Sep 06, 2006 at 11:10:14AM -0600, Eric W. Biederman wrote:
> There is fundamentally more work to do when you take an interrupt because
> you need to take a context switch. But the cost of a context switch is on
> the order of microseconds, so while measurable, taking an interrupt should
> not
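The "order of microseconds" figure can be checked with a classic two-process ping-pong over pipes, which forces at least two switches (or cross-CPU wake-ups) per round trip. A rough sketch, not from the thread, with an arbitrary iteration count:

    /* Rough estimate of context-switch cost: parent and child bounce a
     * byte over a pair of pipes; each round trip forces at least two
     * switches (or wake-ups across CPUs).  Numbers vary with CPU/kernel. */
    #include <stdio.h>
    #include <sys/time.h>
    #include <unistd.h>

    #define ROUNDS 100000

    int main(void)
    {
        int p2c[2], c2p[2];
        char b = 0;
        if (pipe(p2c) || pipe(c2p)) { perror("pipe"); return 1; }

        if (fork() == 0) {                       /* child: echo everything */
            for (int i = 0; i < ROUNDS; i++) {
                read(p2c[0], &b, 1);
                write(c2p[1], &b, 1);
            }
            _exit(0);
        }

        struct timeval t0, t1;
        gettimeofday(&t0, NULL);
        for (int i = 0; i < ROUNDS; i++) {       /* parent: ping-pong */
            write(p2c[1], &b, 1);
            read(c2p[0], &b, 1);
        }
        gettimeofday(&t1, NULL);

        double us = (t1.tv_sec - t0.tv_sec) * 1e6 + (t1.tv_usec - t0.tv_usec);
        printf("~%.2f us per round trip (>= 2 context switches)\n", us / ROUNDS);
        return 0;
    }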
"Daniel Kidger" <[EMAIL PROTECTED]> writes:
> Bogdan,
>
> Parallel applications with lots of MPI traffic should run fine on a cluster
> with large jiffies - just as long as the interconnect you use doesn't need to
> take any interrupts. (Interrupts add hugely to the latency figure)
I know a lot o
We, at LFC, have some specific requests, and our cluster is not large (much less powerful than yours, a bunch of old PIII), so we use Debian stable as OS, MPICH 1.2.7 as the message passing library, Portland Group and Intel compilers, and netlib LAPACK and ScaLAPACK. We also use ACML with
[mailto:[EMAIL PROTECTED]] On
Behalf Of Bogdan Costescu
Sent: 06 September 2006 14:56
To: Mark Hahn
Cc: Beowulf List
Subject: Re: [Beowulf] cluster softwares supporting parallel CFD computing
On Mon, 4 Sep 2006, Mark Hahn wrote:
> 100 Hz scheduler ticks might make sense
Please don't mention this to beginners
On Mon, 4 Sep 2006, Mark Hahn wrote:
> 100 Hz scheduler ticks might make sense
Please don't mention this to beginners (which is the impression that
the original message left me with) - if they care enough to search the
information available on the subject they will either make up their own
minds or com
through 10/100/1000baseT Gigabit Ethernet. Initially we have decided
to setup cluster of 1 master and 3 compute nodes. Thus totaling 16
I think you should consider making the head node separate - in this case,
I'd go with fewer cores, more disk and perhaps less memory. you _can_
get away without
In message from "amjad ali" <[EMAIL PROTECTED]> (Wed, 30 Aug 2006
20:28:37 +0500):
Hello, Hi All.
We are developing a Beowulf cluster for the first time at our
university
department. We will perform numerical simulations in CFD on the
cluster. We have chosen that each node (master and compute) of
Hello, Hi All.
We are developing a Beowulf cluster for the first time at our university
department. We will perform numerical simulations in CFD on the
cluster. We have chosen that each node (master and compute) of the
cluster will consist of 2 AMD Dual Core Opteron Processors on
Tyan Thunder