Re: [Beowulf] How to Diagnose Cause of Cluster Ethernet Errors?

2007-03-30 Thread Bill Broadley
Jon Forrest wrote: I've been pulling out what little hair I have left while trying to figure out a bizarre problem with a Linux cluster I'm running. Here's a short description of the problem. I'm managing a 29-node cluster. All the nodes use the same hardware and boot the same kernel image (Sci

Re: [Beowulf] Performance characterising a HPC application

2007-03-30 Thread Scott Atchley
On Mar 26, 2007, at 1:04 PM, Gilad Shainer wrote: When Mellanox refers to transport offload, it means full transport offload - for all transport semantics. InfiniBand, as you probably know, provides RDMA AND Send/Receive semantics, and in both cases you can do zero-copy operations. This full fle
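
To make the two semantics concrete, here is a small mpi4py sketch (an illustration only, not from the thread; the two-rank layout and buffer sizes are assumptions) contrasting two-sided Send/Receive with a one-sided, RDMA-style Put into an exposed window:

    # Run with: mpirun -np 2 python semantics_demo.py
    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    n = 1024
    buf = np.zeros(n, dtype='d')

    # Two-sided: sender and receiver each post a matching call.
    if rank == 0:
        comm.Send([np.ones(n, dtype='d'), MPI.DOUBLE], dest=1, tag=0)
    elif rank == 1:
        comm.Recv([buf, MPI.DOUBLE], source=0, tag=0)

    # One-sided: rank 0 writes directly into rank 1's exposed window;
    # rank 1 only takes part in the synchronizing fences.
    win = MPI.Win.Create(buf, comm=comm)
    win.Fence()
    if rank == 0:
        win.Put([np.full(n, 2.0), MPI.DOUBLE], target_rank=1)
    win.Fence()
    win.Free()

In the one-sided case only the origin rank names the data movement, which is roughly the Send/Receive-versus-RDMA distinction being discussed.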

Re: [Beowulf] Performance characterising a HPC application

2007-03-30 Thread Greg Lindahl
On Mon, Mar 26, 2007 at 10:04:13AM -0700, Gilad Shainer wrote: > This full flexibility provides the programmer with the ability to choose the best semantics for his use. Some programmers choose Send/Receive and some RDMA. It all depends on their application. HPC customers want a fast MPI.

Re: [Beowulf] Input Sought: "Basic" Lustre FS deployment on GigEther-Fabric Cluster

2007-03-30 Thread Greg Lindahl
On Wed, Mar 28, 2007 at 03:30:30PM -0400, Tim Chipman wrote: > Some digging in this list archive suggested a bit of debate (i.e., Lustre performance would only exceed NFS with lots of large, streaming-intensive I/O; otherwise it would be worse). How many MB/s of I/O does your app do? This i
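
One quick way to put a number on that question (a rough sketch, assuming a Linux node and a single process of interest; the 10-second window is arbitrary) is to sample the kernel's per-process I/O counters:

    # Usage: python io_rate.py <pid>
    import sys
    import time

    def read_io(pid):
        """Parse cumulative I/O counters from /proc/<pid>/io."""
        stats = {}
        with open("/proc/%d/io" % pid) as f:
            for line in f:
                key, val = line.split(":")
                stats[key.strip()] = int(val)
        return stats

    pid = int(sys.argv[1])
    before = read_io(pid)
    time.sleep(10)
    after = read_io(pid)
    for key in ("read_bytes", "write_bytes"):
        rate = (after[key] - before[key]) / 10.0 / 2**20
        print("%s: %.1f MB/s" % (key, rate))

Running iostat on the file server over the same window gives a useful cross-check from the other side.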

RE: [Beowulf] How to Diagnose Cause of Cluster Ethernet Errors?

2007-03-30 Thread Steve Phillips (stevep)
One thing to check is that the switch and NIC are negotiating duplex correctly... Duplex mis-negotiation (i.e. switch full, NIC half) used to be a fairly common cause of FCS errors, although this is rare now as drivers have gotten a lot better. What will happen is the full-duplex station will transmi
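
For what it's worth, a quick way to spot this across a rack (a sketch only, assuming Linux sysfs and eth* interface naming; ethtool per interface reports the same thing) is to dump negotiated speed/duplex next to the error counters that climb on a mismatch:

    import glob
    import os

    def read_attr(iface_dir, name):
        try:
            with open(os.path.join(iface_dir, name)) as f:
                return f.read().strip()
        except (IOError, OSError):
            return "n/a"

    for iface_dir in sorted(glob.glob("/sys/class/net/eth*")):
        iface = os.path.basename(iface_dir)
        print("%s  speed=%s  duplex=%s" % (iface,
              read_attr(iface_dir, "speed"), read_attr(iface_dir, "duplex")))
        for counter in ("rx_crc_errors", "rx_frame_errors",
                        "tx_carrier_errors", "collisions"):
            print("    %-18s %s" % (counter,
                  read_attr(iface_dir, os.path.join("statistics", counter))))

A half/full mismatch typically shows up as CRC/frame errors on the full-duplex side and late collisions on the half-duplex side.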

[Beowulf] OT? GPU accelerators for finite difference time domain

2007-03-30 Thread Peter Wainwright
Slightly off-topic for this list, but I can't think of a more likely forum to find information on HPC topics. A couple of colleagues just returned from the "Progress in Electromagnetics Research Symposium" in Verona. There appears to be a considerable buzz now around FDTD calculations on GPUs. A
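
For anyone who hasn't met FDTD: the kernel is a simple staggered leapfrog stencil, which is why it maps so well onto GPUs. A bare-bones 1D version in NumPy as an illustration only (grid size, Courant number, and source are arbitrary):

    import numpy as np

    nx, nt = 400, 1000
    ez = np.zeros(nx)        # electric field on integer grid points
    hy = np.zeros(nx - 1)    # magnetic field, staggered half a cell
    c = 0.5                  # Courant number in normalized units (<= 1 in 1D)

    for t in range(nt):
        hy += c * (ez[1:] - ez[:-1])        # update H from the curl of E
        ez[1:-1] += c * (hy[1:] - hy[:-1])  # update E from the curl of H
        ez[nx // 2] += np.exp(-((t - 30) / 10.0) ** 2)  # soft Gaussian source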

[Beowulf] Input Sought: "Basic" Lustre FS deployment on GigEther-Fabric Cluster

2007-03-30 Thread Tim Chipman
Hi all, I've trawled the archives and read via Google for a few days now, but have not got a lot of clarity yet - hence a query to the list. If merited/of use I can summarize back replies once done. I'm looking to soon begin deployment of a ~50-node (dual-socket, dual-core Opteron) cluster wi

[Beowulf] Re: Performance characterising a HPC application

2007-03-30 Thread stephen mulcahy
[resend - I think my first attempt was canned due to being too large, I've stripped it down to PingPong, Bcast and Reduce] Hi, As a follow on to my previous mail, I've gone ahead and run the Intel MPI Benchmarks (v3.0) on this cluster and gotten the following results - I'd be curious to know how
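
(Not part of Stephen's post: anyone wanting to sanity-check a single node pair against the IMB PingPong column can use a stripped-down equivalent in mpi4py; the message sizes and repetition count below are assumptions.)

    # Run with: mpirun -np 2 python pingpong.py
    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    reps = 1000

    for nbytes in (0, 1, 1024, 65536, 1048576):
        buf = np.zeros(max(nbytes, 1), dtype='b')
        msg = [buf, nbytes, MPI.BYTE]
        comm.Barrier()
        t0 = MPI.Wtime()
        for _ in range(reps):
            if rank == 0:
                comm.Send(msg, dest=1, tag=0)
                comm.Recv(msg, source=1, tag=0)
            else:
                comm.Recv(msg, source=0, tag=0)
                comm.Send(msg, dest=0, tag=0)
        t_oneway = (MPI.Wtime() - t0) / reps / 2   # one-way time, as IMB reports
        if rank == 0:
            print("%8d bytes  %8.2f usec  %8.1f MB/s"
                  % (nbytes, t_oneway * 1e6, nbytes / t_oneway / 2**20))

The small-message lines approximate latency and the large-message lines approximate bandwidth, which is essentially what IMB's PingPong table reports.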

[Beowulf] How to Diagnose Cause of Cluster Ethernet Errors?

2007-03-30 Thread Jon Forrest
I've been pulling out what little hair I have left while trying to figure out a bizarre problem with a Linux cluster I'm running. Here's a short description of the problem. I'm managing a 29-node cluster. All the nodes use the same hardware and boot the same kernel image (Scientific Linux 4.4, l

[Beowulf] Re: Beowulf Digest, Vol 37, Issue 58

2007-03-30 Thread Håkon Bugge
Hi again Christian, At 16:59 26.03.2007, Christian Bell wrote: Hi Håkon, I'm unsure if I would call significant a submission comparing results between configurations not compared at scale (in appearance large versus small switch, much heavier shared-memory component at small process count

[Beowulf] Re: Performance characterising a HPC application

2007-03-30 Thread stephen mulcahy
Hi, As a follow on to my previous mail, I've gone ahead and run the Intel MPI Benchmarks (v3.0) on this cluster and gotten the following results - I'd be curious to know how they compare to other similar clusters. Also, I'm trying to determine which parts of the IMB results are most importan

RE: [Beowulf] Performance characterising a HPC application

2007-03-30 Thread Gilad Shainer
> Offload, usually implemented by RDMA offload, or the ability for a NIC to autonomously send and/or receive data from/to memory is certainly a nice feature to tout. If one considers RDMA at an interface level (without looking at the registration calls required on some interconnects)

[Beowulf] Server room design consulting

2007-03-30 Thread Daniel Majchrzak
We have a dedicated cluster room, email server room, and networking room that have slowly evolved over the years. Due to budget constraints in the past, no one has ever done an analysis of our electricity and AC. (We've had the facilities people in, but their analysis wasn't any better than our own
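
The arithmetic for a first-pass load estimate is short enough to keep in a script (a sketch with made-up node counts and wattages; substitute measured draw from a clamp meter or metered PDU where available):

    # All figures below are assumptions -- replace with measured values.
    nodes = 64
    watts_per_node = 350          # assumed average draw per node
    overhead = 1.2                # switches, storage, UPS losses (assumption)

    total_watts = nodes * watts_per_node * overhead
    btu_per_hr = total_watts * 3.412       # 1 W = 3.412 BTU/hr
    tons_of_ac = btu_per_hr / 12000.0      # 1 ton of cooling = 12,000 BTU/hr
    amps_at_208v = total_watts / 208.0     # rough, ignores power factor

    print("IT load:  %.1f kW" % (total_watts / 1000.0))
    print("Heat:     %.0f BTU/hr (~%.1f tons of AC)" % (btu_per_hr, tons_of_ac))
    print("Current:  ~%.0f A at 208 V" % amps_at_208v)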

RE: [Beowulf] Performance characterising a HPC application

2007-03-30 Thread Gilad Shainer
> Obviously, there are many applications which have absolutely no use for bandwidth greater than even plain old gigabit. Equally obviously, there are others which are sensitive to small-packet latency, which is not affected by DDR or dual-rail. Yes, there are applications that don't uti
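
The usual first-order model makes the point: transfer time is roughly latency plus size over bandwidth, so below some message size the latency term dominates and extra link bandwidth is invisible. A toy comparison (assumed latency and bandwidth figures, not measured numbers):

    # Assumed figures for illustration only: 2 us latency on both links,
    # ~1 GB/s for the slower link and ~2 GB/s for the faster one.
    def xfer_time(nbytes, latency_us, bw_bytes_per_s):
        return latency_us * 1e-6 + float(nbytes) / bw_bytes_per_s

    for nbytes in (128, 4096, 65536, 1048576):
        slow = xfer_time(nbytes, 2.0, 1.0e9)
        fast = xfer_time(nbytes, 2.0, 2.0e9)
        print("%8d B   slow %7.2f us   fast %7.2f us   speedup %.2fx"
              % (nbytes, slow * 1e6, fast * 1e6, slow / fast))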

RE: [Beowulf] Performance characterising a HPC application

2007-03-30 Thread Gilad Shainer
> The next slide shows a graph of the LS-Dyna results recently submitted to topcrunch.org, showing that InfiniPath SDR beats Mellanox DDR on the neon_refined_revised problem, both running on 3.0 GHz Woodcrest dual/dual nodes. This is yet another example of "fair" comparison. Unlike Q