Re: [Beowulf] Re: RAM ECC errors (Henning Fehrmann)

2010-02-23 Thread Henning Fehrmann
Hi Mark, On Tue, Feb 23, 2010 at 03:05:39PM -0500, Mark Hahn wrote: > >No, but there seems to be a switch in the kernel module that allows one to trigger > >a kernel panic upon discovering uncorrectable errors. > > I suspect you mean /sys/module/edac_mc/panic_on_ue > (ue = uncorrected error). I cons

Re: [Beowulf] Re: RAM ECC errors

2010-02-23 Thread Henning Fehrmann
Hi David, Thank you for the response. > Carsten Aulbert wrote: > > Are you saying that now that you are monitoring you are seeing kernel > > panics which did not appear before? > No, but there seems to be a switch in the kernel module that allows one to trigger > a kernel panic

Re: [Beowulf] which mpi library should I focus on?

2010-02-23 Thread Sangamesh B
Hi, I assume you are developing MPI codes and want to run them in a cluster environment. If so, I suggest you use Open MPI, because Open MPI is well developed and stable, and has a very good FAQ section where you can easily clear up your doubts. It has a built-in tight-integration method wit

Re: [Beowulf] Spanning Tree Protocol and latency: allowing loops in switching networks for minimizing switch hops

2010-02-23 Thread Rahul Nabar
On Tue, Feb 23, 2010 at 11:03 PM, Eric W. Biederman wrote: > For the most trivial of loops there is link aggregation. Yup, that's true. I'm already using link aggregation but never thought of it as a loop before. But it makes sense. > For more interesting loops you can run many ethernet switche

Re: [Beowulf] Spanning Tree Protocol and latency: allowing loops in switching networks for minimizing switch hops

2010-02-23 Thread Eric W. Biederman
Greg Lindahl writes: > On Tue, Feb 23, 2010 at 03:15:28PM -0600, Rahul Nabar wrote: >> On Tue, Feb 23, 2010 at 3:10 PM, Greg Lindahl wrote: >> > On Tue, Feb 23, 2010 at 01:23:59PM -0600, Rahul Nabar wrote: >> >> In the interest of latency, minimum switch hops make sense and for that >> >> lo

Re: [Beowulf] Q: IB message rate & large core counts (per node)?

2010-02-23 Thread Greg Lindahl
On Tue, Feb 23, 2010 at 06:23:59PM -0500, Brian Dobbins wrote: > Actually, it's often *for* performance that we look towards hybrid > methods, albeit in an indirect way - with RAM amounts per node increasing at > the same or a lesser rate than cores, and with each MPI task on *some* of our > codes
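
A minimal sketch of the hybrid layout Brian is describing: one MPI rank per node (or per socket) with OpenMP threads filling the remaining cores, so per-rank replicated data is held once per rank rather than once per core. Illustrative only; it assumes an MPI library built with thread support.

    /* Hybrid MPI+OpenMP sketch: few ranks, many threads per rank. */
    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int provided, rank;

        /* FUNNELED: only the main thread makes MPI calls. */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        #pragma omp parallel
        {
            #pragma omp single
            printf("rank %d: %d OpenMP threads sharing one address space\n",
                   rank, omp_get_num_threads());
        }

        MPI_Finalize();
        return 0;
    }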

[Beowulf] Arima motherboards with SATA2 drives

2010-02-23 Thread David Mathog
Have any of you seen a patched BIOS for the Arima HDAM* motherboards that resolves the issue of the Sil 3114 SATA controller locking up when it sees a SATA II disk? (Even a disk jumpered to SATA I speeds.) Silicon Image released a BIOS fix for this, but since all of these motherboards use a Phoeni

Re: [Beowulf] Q: IB message rate & large core counts (per node)?

2010-02-23 Thread Brian Dobbins
Hi Greg, >> Well, clearly we hope to move more towards hybrid methods -all that's old is new again?- > If you want bad performance, sure. If you want good performance, you want a device which supports talking to a lot of cores, and then multiple devices per node, before you go hybrid.

RE: [Beowulf] Q: IB message rate & large core counts (per node)?

2010-02-23 Thread Gilad Shainer
> The benchmark that we created is not a coalescing benchmark. > > Coalescing produces a meaningless answer from the message rate > benchmark. Real apps don't get much of a benefit from message > coalescing, but (if they send smallish messages) they get a big > benefit from a good non-coalesced me

Re: [Beowulf] Q: IB message rate & large core counts (per node)?

2010-02-23 Thread Greg Lindahl
On Tue, Feb 23, 2010 at 05:35:41PM -0500, Brian Dobbins wrote: > Well, clearly we hope to move more towards hybrid methods -all that's old > is new again?- If you want bad performance, sure. If you want good performance, you want a device which supports talking to a lot of cores, and then multi

RE: [Beowulf] Q: IB message rate & large core counts (per node)?

2010-02-23 Thread Gilad Shainer
PCIe Gen2 at 5 GT/s, with 8b/10b encoding and current chipset efficiencies, gives you around 3.3 GB/s per direction, so one IB QDR port can handle that. For more BW out of the host, you can use more adapters (single-port ones are the cost-effective solution for that). Gilad
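
The arithmetic behind those figures, as a back-of-the-envelope sketch; the ~83% protocol-efficiency factor below is an assumption chosen to land near the 3.3 GB/s quoted above, not a measured number:

    /* PCIe Gen2 x8: 5 GT/s per lane, 8b/10b line code (8 data bits
     * per 10 transferred), times an assumed protocol/chipset
     * efficiency factor. */
    #include <stdio.h>

    int main(void)
    {
        double gt_per_lane = 5.0e9;     /* transfers/s per lane, Gen2 */
        int    lanes       = 8;         /* x8 slot                    */
        double encoding    = 8.0 / 10.0;/* 8b/10b line code           */
        double overhead    = 0.83;      /* assumed efficiency factor  */

        double raw = gt_per_lane * lanes * encoding / 8.0; /* bytes/s */
        printf("raw:       %.1f GB/s per direction\n", raw / 1e9);
        printf("effective: %.1f GB/s per direction\n", raw * overhead / 1e9);
        /* IB QDR 4x signals at 40 Gbit/s with 8b/10b -> 4 GB/s raw,
         * so one QDR port roughly matches a Gen2 x8 slot. */
        return 0;
    }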

Re: [Beowulf] Q: IB message rate & large core counts (per node)?

2010-02-23 Thread Brian Dobbins
> I thought PCIe Gen2 x8 @ 500 MHz gives 8 GB/s? I know there are 250 and 500 MHz variants in addition to the lane sizes, so while a 250 MHz x8 link wouldn't provide enough bandwidth to a dual-port card, the 500 MHz one should. But I'm woefully out of date on my hardware knowledge, it see

Re: [Beowulf] Q: IB message rate & large core counts (per node)?

2010-02-23 Thread Brian Dobbins
Hi Patrick, > I have been quite vocal in the past against the merit of high packet rate, but I have learned to appreciate it. There is a set of applications that can benefit from it, especially at scale. Actually, packet rate is much more important outside of HPC (where application throughput

Re: [Beowulf] Q: IB message rate & large core counts (per node)?

2010-02-23 Thread Greg Lindahl
On Tue, Feb 23, 2010 at 04:57:23PM -0500, Mark Hahn wrote: > in the interests of less personal/posturing/pissing, let me ask: > where does the win from coalescing come from? I would have thought > that coalescing is mainly a way to reduce interrupts, a technique > that's familiar from ethernet in

RE: [Beowulf] Q: IB message rate & large core counts (per node)?

2010-02-23 Thread Gilad Shainer
> On a similar note, does a dual-port card provide an increase in on-card processing, or 'just' another link? (The increased bandwidth is certainly nice, even in a flat switched network, I'm sure!) Today one IB port (assuming QDR) can saturate the supported PCIe Gen2 interface. Us

Re: [Beowulf] Q: IB message rate & large core counts (per node)?

2010-02-23 Thread Patrick Geoffray
Brian, On 2/19/2010 1:25 PM, Brian Dobbins wrote: > ... the IB cards. With a 4-socket node having between 32 and 48 cores, lots of computing can get done fast, possibly stressing the network. I know Qlogic has made a big deal about the InfiniPath adapter's extremely good message rate in the past.

Re: [Beowulf] Q: IB message rate & large core counts (per node)?

2010-02-23 Thread Mark Hahn
> Coalescing produces a meaningless answer from the message rate benchmark. Real apps don't get much of a benefit from message coalescing, but (if they send smallish messages) they get a big benefit from a good non-coalesced message rate. in the interests of less personal/posturing/pissing, let me
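
For concreteness, a sketch of what a non-coalesced message-rate microbenchmark can look like; this is not the vendor benchmark discussed in the thread, just an illustration of posting each small message individually so the MPI layer has nothing to coalesce. Run with at least 2 ranks.

    /* Stream many small, individual MPI messages from rank 0 to
     * rank 1 and report the aggregate message rate. */
    #include <mpi.h>
    #include <stdio.h>

    #define NMSGS  100000   /* total messages      */
    #define WINDOW 64       /* outstanding at once */

    int main(int argc, char **argv)
    {
        int rank, i, j;
        double buf[WINDOW] = {0.0}, t0, t1;
        MPI_Request req[WINDOW];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        MPI_Barrier(MPI_COMM_WORLD);
        t0 = MPI_Wtime();
        for (i = 0; i < NMSGS; i += WINDOW) {
            for (j = 0; j < WINDOW; j++) {
                if (rank == 0)       /* each send is its own message...  */
                    MPI_Isend(&buf[j], 1, MPI_DOUBLE, 1, 0,
                              MPI_COMM_WORLD, &req[j]);
                else if (rank == 1)  /* ...matched by its own receive    */
                    MPI_Irecv(&buf[j], 1, MPI_DOUBLE, 0, 0,
                              MPI_COMM_WORLD, &req[j]);
            }
            if (rank <= 1)
                MPI_Waitall(WINDOW, req, MPI_STATUSES_IGNORE);
        }
        t1 = MPI_Wtime();

        if (rank == 0)
            printf("%.0f messages/s (8-byte payloads)\n", NMSGS / (t1 - t0));

        MPI_Finalize();
        return 0;
    }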

Re: [Beowulf] Spanning Tree Protocol and latency: allowing loops in switching networks for minimizing switch hops

2010-02-23 Thread Greg Lindahl
On Tue, Feb 23, 2010 at 03:15:28PM -0600, Rahul Nabar wrote: > On Tue, Feb 23, 2010 at 3:10 PM, Greg Lindahl wrote: > > On Tue, Feb 23, 2010 at 01:23:59PM -0600, Rahul Nabar wrote: > >> In the interest of latency, minimum switch hops make sense and for that > >> loops might sometimes provide th

Re: [Beowulf] Q: IB message rate & large core counts (per node)?

2010-02-23 Thread Greg Lindahl
On Fri, Feb 19, 2010 at 02:36:34PM -0800, Gilad Shainer wrote: > Nice to hear from you Greg, hope all is well. I hope all is well with you, Gilad. From what I can tell, you're again visiting that alternate Universe that you sometimes visit -- is it nice there? > I don't forget anything, at least

Re: [Beowulf] Spanning Tree Protocol and latency: allowing loops in switching networks for minimizing switch hops

2010-02-23 Thread Rahul Nabar
On Tue, Feb 23, 2010 at 3:10 PM, Greg Lindahl wrote: > On Tue, Feb 23, 2010 at 01:23:59PM -0600, Rahul Nabar wrote: > >> In the interest of latency, minimum switch hops make sense and for that >> loops might sometimes provide the best solution. > > STP disables all loops. All you gain is a bit of r

Re: [Beowulf] Spanning Tree Protocol and latency: allowing loops in switching networks for minimizing switch hops

2010-02-23 Thread Greg Lindahl
On Tue, Feb 23, 2010 at 01:23:59PM -0600, Rahul Nabar wrote: > In the interest of latency, minimum switch hops make sense and for that > loops might sometimes provide the best solution. STP disables all loops. All you gain is a bit of redundancy, but the price is high. -- greg

Re: [Beowulf] Spanning Tree Protocol and latency: allowing loops in switching networks for minimizing switch hops

2010-02-23 Thread Rahul Nabar
On Tue, Feb 23, 2010 at 2:02 PM, Gerry Creager wrote: > > It's my firm opinion that loops and STP are evil for HPC installations. > Period. Thanks Gerry! This seems like one of the rare HPC topics where such a clear answer exists! :) "It depends" is more usual for me to hear. I bet you have

Re: [Beowulf] Re: RAM ECC errors (Henning Fehrmann)

2010-02-23 Thread Mark Hahn
> No, but there seems to be a switch in the kernel module that allows one to trigger a kernel panic upon discovering uncorrectable errors. I suspect you mean /sys/module/edac_mc/panic_on_ue (ue = uncorrected error). I consider this very much the norm: it would be very strange to run with ECC memory, a
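
A minimal sketch of how a node-health check might read that flag, assuming the sysfs path Mark names; on other kernel versions the knob may live under /sys/module/edac_mc/parameters/ instead, so treat the path as an assumption to verify on your own nodes.

    /* Read the EDAC panic-on-UE flag from sysfs and report it. */
    #include <stdio.h>

    int main(void)
    {
        const char *path = "/sys/module/edac_mc/panic_on_ue";
        FILE *f = fopen(path, "r");
        int val;

        if (!f) {
            perror(path);
            return 1;
        }
        if (fscanf(f, "%d", &val) == 1)
            printf("panic_on_ue = %d (%s)\n", val,
                   val ? "panic on uncorrected error" : "log only");
        fclose(f);
        return 0;
    }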

Re: [Beowulf] Spanning Tree Protocol and latency: allowing loops in switching networks for minimizing switch hops

2010-02-23 Thread Gerry Creager
On 2/23/10 1:23 PM, Rahul Nabar wrote: Over the years I have scrupulously adhered to the conventional wisdom that "spanning tree" is turned off on HPC switches, so that protocols don't time out while STP acquires its model of the network topology. But that does assume that there are no

[Beowulf] Spanning Tree Protocol and latency: allowing loops in switching networks for minimizing switch hops

2010-02-23 Thread Rahul Nabar
Over the years I have scrupulously adhered to the conventional wisdom that "spanning tree" is turned off on HPC switches, so that protocols don't time out while STP acquires its model of the network topology. But that does assume that there are no loops in the switch connectivity that can

[Beowulf] Re: RAM ECC errors

2010-02-23 Thread David Mathog
Carsten Aulbert wrote: > > Are you saying that now that you are monitoring you are seeing kernel > > panics which did not appear before? > No, but there seems to be a switch in the kernel module that allows one to trigger > a kernel panic upon discovering uncorrectable errors. By "switch" do y

Re: [Beowulf] which mpi library should I focus on?

2010-02-23 Thread Douglas Guptill
On Tue, Feb 23, 2010 at 09:25:45AM -0500, Brock Palen wrote: > (shameless plug) if you want, listen to our podcast on OpenMPI > http://www.rce-cast.com/index.php/Podcast/rce01-openmpi.html > > The MPICH2 show is recorded (edited it last night, almost done!), and > will be released this Saturday

Re: [Beowulf] which mpi library should I focus on?

2010-02-23 Thread Brock Palen
> why do you think it would make any difference? it's also normally pretty trivial to switch. Very true, we use modules and can swap easily, and rebuild. >> What are the reasons to prefer one or the other? > none - it's a matter of taste, especially since your application will not be sensitive

Re: [Beowulf] which mpi library should I focus on?

2010-02-23 Thread Scott Atchley
On Feb 20, 2010, at 1:49 PM, Paul Johnson wrote: What are the reasons to prefer one or the other? Why choose? You can install both and test with your application to see if there is a performance difference (be sure to keep your runtime environment paths correct - don't mix libraries and MP

Re: [Beowulf] which mpi library should I focus on?

2010-02-23 Thread Mark Hahn
> I've not written MPI programs before. I've written plenty of C and Java, however, and I think I can learn. I'm trying to decide whether to concentrate on OpenMPI or MPICH2 as I get started. On the Internet, I find plenty of people who are fiercely devoted to MPICH2, and I also find plenty of pe

[Beowulf] which mpi library should I focus on?

2010-02-23 Thread Paul Johnson
I've not written MPI programs before. I've written plenty of C and Java, however, and I think I can learn. I'm trying to decide whether to concentrate on OpenMPI or MPICH2 as I get started. On the Internet, I find plenty of people who are fiercely devoted to MPICH2, and I also find plenty of peo
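
For what it's worth, the choice matters less at the source level than it first appears: the minimal program below compiles unchanged with either implementation's wrapper (mpicc from Open MPI or MPICH2) and runs under either's mpirun, since both implement the same MPI standard.

    /* hello_mpi.c - build with: mpicc hello_mpi.c -o hello_mpi
     * run with:   mpirun -np 4 ./hello_mpi                     */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        printf("Hello from rank %d of %d\n", rank, size);
        MPI_Finalize();
        return 0;
    }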