Re: [Beowulf] Re: Purdue Supercomputer

2008-05-12 Thread John Hearns
On Sun, 2008-05-11 at 16:01 -0400, Perry E. Metzger wrote: > Who do you favor for console servers these days? Ditto for > addressable/switchable PDUs? > I hope Joe doesn't mind me answering a question directed at him, but for us if you spec a separate console server it would be a Cyclades Alterpa

Re: [Beowulf] ECC support on motherboards?

2008-05-12 Thread Mark Hahn
In the amd case all socket F cpu's support ecc. Does support for ECC on AMD 64 bit processors bypass BIOS support, for there _is_ no memory without bios support - it's pretty interesting to read the AMD bios writer's guide, and see how the bios has to bootstrap: detecting kinds of memory, me
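Whether the BIOS actually enabled ECC can be checked from a running Linux node: the kernel's EDAC layer only registers a memory controller when ECC reporting is active. A minimal sketch, assuming the standard EDAC sysfs layout (on a box without ECC it simply reports absence):

```shell
# Report whether the kernel's EDAC layer registered a memory controller
# (it only does so when BIOS/chipset ECC reporting is actually enabled).
mc=/sys/devices/system/edac/mc/mc0
if [ -r "$mc/ce_count" ]; then
  status="ECC active, corrected errors so far: $(cat "$mc/ce_count")"
else
  status="no EDAC memory controller registered (ECC off or unsupported)"
fi
echo "$status"
```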

Re: [Beowulf] ECC support on motherboards?

2008-05-12 Thread Jim Lux
Quoting Joe Landman <[EMAIL PROTECTED]>, on Mon 12 May 2008 06:03:56 PM PDT: Perry E. Metzger wrote: Joe Landman <[EMAIL PROTECTED]> writes: I've been reading spec sheets, and they often don't tell you, which is rather annoying. Thus my question. I just randomly selected 2 motherboards from

Re: [Beowulf] ECC support on motherboards?

2008-05-12 Thread Geoff Jacobs
Joel Jaeggli wrote: > if you're operating at the component level it's a question which core > logic chipsets are in use. > > In the amd case all socket F cpu's support ecc. Does support for ECC on AMD 64 bit processors bypass BIOS support, for the most part? Let me rephrase: if I install an Opter

Re: [Beowulf] many cores and ib

2008-05-12 Thread Patrick Geoffray
Gilad, Gilad Shainer wrote: My apologies. I meant the MPI includes an option to collect several MPI messages into one network message. For application cases, sometimes it helps with performance and sometimes it does not. OSU have shown both cases, and every user can decide what works best for

Re: [Beowulf] ECC support on motherboards?

2008-05-12 Thread Perry E. Metzger
Joe Landman <[EMAIL PROTECTED]> writes: >> I've been reading spec sheets, and they often don't tell you, which is >> rather annoying. Thus my question. > > I just randomly selected 2 motherboards from 2 different vendors and > on both spec sheets, they clearly defined which memory they took. Oh,

Re: [Beowulf] ECC support on motherboards?

2008-05-12 Thread Perry E. Metzger
Joe Landman <[EMAIL PROTECTED]> writes: > Server class boards generally do ECC. Desktop class generally do not. > > Spec sheets are your friends. I've been reading spec sheets, and they often don't tell you, which is rather annoying. Thus my question. Perry _

Re: [Beowulf] ECC support on motherboards?

2008-05-12 Thread Joel Jaeggli
Perry E. Metzger wrote: Given that one is doing very long computations, it seems obvious to me that you're likely to get memory errors that alter your computations if you only run long enough. ECC memory thus seems like a near necessity these days. However, I've been finding it increasingly diff

Re: [Beowulf] ECC support on motherboards?

2008-05-12 Thread Joe Landman
Perry E. Metzger wrote: Joe Landman <[EMAIL PROTECTED]> writes: I've been reading spec sheets, and they often don't tell you, which is rather annoying. Thus my question. I just randomly selected 2 motherboards from 2 different vendors and on both spec sheets, they clearly defined which memory t

Re: [Beowulf] ECC support on motherboards?

2008-05-12 Thread Joe Landman
Greg Lindahl wrote: On Mon, May 12, 2008 at 07:39:35PM -0400, Joe Landman wrote: Server class boards generally do ECC. Desktop class generally do not. My local gamer PC shop was easily able to find desktop mobos with ECC for me; they have a summary of the spec sheets from somewhere. They'd n

Re: [Beowulf] ECC support on motherboards?

2008-05-12 Thread Joe Landman
Perry E. Metzger wrote: Joe Landman <[EMAIL PROTECTED]> writes: Server class boards generally do ECC. Desktop class generally do not. Spec sheets are your friends. I've been reading spec sheets, and they often don't tell you, which is rather annoying. Thus my question. I just randomly sele

Re: [Beowulf] ECC support on motherboards?

2008-05-12 Thread Bill Broadley
Perry E. Metzger wrote: However, I've been finding it increasingly difficult to figure out, based on manufacturers blurbs, whether or not given motherboards support ECC memory properly. Anyone have any tips on how to determine it easily? The motherboard manual usually refers to a document that
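For the spec-sheet problem above, `dmidecode` is often more honest than the marketing blurb: the DMI type 16 (Physical Memory Array) record carries an "Error Correction Type" field. A sketch that parses a captured sample record (running `dmidecode -t 16` itself needs root, so the sample output here is illustrative):

```shell
# Extract the "Error Correction Type" field from a captured
# `dmidecode -t 16` record; the sample is inlined so the parsing
# is reproducible without root access or real hardware.
sample='Physical Memory Array
	Location: System Board Or Motherboard
	Use: System Memory
	Error Correction Type: Single-bit ECC
	Maximum Capacity: 32 GB'
ecc=$(printf '%s\n' "$sample" | awk -F': ' '/Error Correction Type/ {print $2}')
echo "Error correction: $ecc"
```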

Re: [Beowulf] ECC support on motherboards?

2008-05-12 Thread Greg Lindahl
On Mon, May 12, 2008 at 07:39:35PM -0400, Joe Landman wrote: > Server class boards generally do ECC. Desktop class generally do not. My local gamer PC shop was easily able to find desktop mobos with ECC for me; they have a summary of the spec sheets from somewhere. They'd never actually sold one

Re: [Beowulf] ECC support on motherboards?

2008-05-12 Thread Joe Landman
Perry E. Metzger wrote: Given that one is doing very long computations, it seems obvious to me that you're likely to get memory errors that alter your computations if you only run long enough. ECC memory thus seems like a near necessity these days. However, I've been finding it increasingly diff

Re: [Beowulf] Re: Purdue Supercomputer

2008-05-12 Thread Joe Landman
Perry E. Metzger wrote: Mark Hahn <[EMAIL PROTECTED]> writes: I find that with IPMI and console redirection, it's very rarely necessary to care about where your nodes are, at least from a sysadmin perspective. Speaking of IPMI, are there reasonable motherboards that incorporate it right into t

[Beowulf] ECC support on motherboards?

2008-05-12 Thread Perry E. Metzger
Given that one is doing very long computations, it seems obvious to me that you're likely to get memory errors that alter your computations if you only run long enough. ECC memory thus seems like a near necessity these days. However, I've been finding it increasingly difficult to figure out, base

Re: [Beowulf] Re: Purdue Supercomputer

2008-05-12 Thread bchapple
At Edward Waters College in Jacksonville Fl., we went to Purdue to see their beowulf cluster and we have a cluster setup now. Bernard Chapple CIO Edward Waters College Sent via BlackBerry from T-Mobile -Original Message- From: Karen Shaeffer <[EMAIL PROTECTED]> Date: Sun, 11 May 2008 13

Re: [Beowulf] Re: Purdue Supercomputer

2008-05-12 Thread Perry E. Metzger
Alan Louis Scheinine <[EMAIL PROTECTED]> writes: [Describing machines that had IPMI installed...] > We installed 1U cases in a rack and plugged-in the machines > but they were turned-off. Were the power supplies physically turned off or were these machines "soft off"? Modern machines have both sw

Re: [Beowulf] Re: Purdue Supercomputer

2008-05-12 Thread Anand Vaidya
On Monday 12 May 2008 07:28:15 Karen Shaeffer wrote: > On Sun, May 11, 2008 at 04:45:46PM -0400, Joe Landman wrote: > > Karen Shaeffer wrote: > > >Hi Joel, > > >Yes, but a separate and distinct power distribution for that +5 volt > > >supply will then need to be implemented on the system board to s

Re: [Beowulf] Re: Purdue Supercomputer

2008-05-12 Thread Joel Jaeggli
Karen Shaeffer wrote: On Sun, May 11, 2008 at 12:42:18PM -0700, Joel Jaeggli wrote: Karen Shaeffer wrote: Hi, To implement full IPMI capability on the system board, then you need two separate and independent power distribution systems on the system board I believe, which is why they implement I

Re: [Beowulf] Re: Purdue Supercomputer

2008-05-12 Thread Perry E. Metzger
Joe Landman <[EMAIL PROTECTED]> writes: > We have found as a safety precaution, that including a console > server/kvm unit and having power control via addressable/switchable > PDU is a great backup, especially when we are hundreds of km (or > simply different timezones) from the units. Who do yo

Re: [Beowulf] Re: Purdue Supercomputer

2008-05-12 Thread Joel Jaeggli
Karen Shaeffer wrote: On Sun, May 11, 2008 at 02:45:05AM -0400, Mark Hahn wrote: I find that with IPMI and console redirection, it's very rarely necessary to care about where your nodes are, at least from a sysadmin perspective. Speaking of IPMI, are there reasonable motherboards that incorpor

Re: [Beowulf] Re: Purdue Supercomputer

2008-05-12 Thread Perry E. Metzger
Mark Hahn <[EMAIL PROTECTED]> writes: > I find that with IPMI and console redirection, it's very rarely necessary to > care about where your nodes are, at least from a sysadmin perspective. Speaking of IPMI, are there reasonable motherboards that incorporate it right into the design at this point
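For boards with onboard IPMI, the usual out-of-band tool is `ipmitool` over the LAN interface, covering both power control and serial-over-LAN console. A sketch that only assembles the commands (the hostname and user are hypothetical placeholders; actually running them needs a reachable, configured BMC):

```shell
# Assemble (not run) typical ipmitool invocations for out-of-band
# power status and serial-over-LAN console. bmc/user are placeholders.
bmc="node01-ipmi"; user="admin"
power_cmd="ipmitool -I lanplus -H $bmc -U $user chassis power status"
sol_cmd="ipmitool -I lanplus -H $bmc -U $user sol activate"
printf '%s\n%s\n' "$power_cmd" "$sol_cmd"
```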

Re: [Beowulf] Re: Purdue Supercomputer

2008-05-12 Thread Karen Shaeffer
On Mon, May 12, 2008 at 08:20:21AM -0500, Geoff Jacobs wrote: > > It's a separate feed from the ATX supply, and I can't see how anyone > would be stupid enough to switch it on the motherboard. It should be > simply railed around the motherboard wherever needed. Perhaps, just to > be sure, I should

Re: [Beowulf] Do these SGE features exist in Torque?

2008-05-12 Thread Glen Beane
On May 12, 2008, at 1:11 PM, Reuti wrote: On 12.05.2008 at 18:01 Craig Tierney wrote: Reuti wrote: Hiho, On 12.05.2008 at 15:14 Prentice Bisbal wrote: It's still an RFE in SGE to get any arbitrary combination of resources, e.g. you need for one job 1 host with big I/O, 2 with huge me

Re: [Beowulf] Do these SGE features exist in Torque?

2008-05-12 Thread Reuti
On 12.05.2008 at 18:01 Craig Tierney wrote: Reuti wrote: Hiho, On 12.05.2008 at 15:14 Prentice Bisbal wrote: It's still an RFE in SGE to get any arbitrary combination of resources, e.g. you need for one job 1 host with big I/O, 2 with huge memory and 3 "standard" type of nodes you could
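The heterogeneous request described above (one big-I/O host, two big-memory hosts, three standard nodes) maps onto Torque's "+"-chained node specification, assuming the administrator has tagged nodes with properties (the names `bigio`/`bigmem` here are hypothetical and must match the server's nodes file):

```shell
# Build a Torque/PBS -l nodes= directive chaining three node classes
# with '+'. The bigio/bigmem property names are hypothetical examples.
nodespec="1:bigio+2:bigmem+3"
directive="#PBS -l nodes=${nodespec}"
echo "$directive"
```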

Re: [Beowulf] Do these SGE features exist in Torque?

2008-05-12 Thread Craig Tierney
Reuti wrote: Hiho, On 12.05.2008 at 15:14 Prentice Bisbal wrote: At a previous job, I installed SGE for our cluster. At my current job Torque is the queuing system of choice. I'm very familiar with SGE, but only have a cursory knowledge of Torque (installed it for evaluation, and that's it).

[Beowulf] Re: Purdue Supercomputer

2008-05-12 Thread David Mathog
Robert G. Brown wrote: > On Sun, 11 May 2008, Joe Landman wrote: > > > Gerry Creager wrote: > >> Student workers are cheap and expendable... > > > > cold ... ;) > > but true. In fact, in as little as three or four months they tend to > expend, and there ya gotta go and train a new one. Sometim

Re: [Beowulf] Re: Purdue Supercomputer

2008-05-12 Thread Jim Lux
Quoting Prentice Bisbal <[EMAIL PROTECTED]>, on Mon 12 May 2008 06:18:49 AM PDT: Jim Lux wrote: Actually, you can order cables already pre numbered and labelled. Why burn expensive cluster assembler time when you can pay someone (potentially offshore) to do it cheaper. Because that would

Re: [Beowulf] Do these SGE features exist in Torque?

2008-05-12 Thread Reuti
Hiho, On 12.05.2008 at 15:14 Prentice Bisbal wrote: At a previous job, I installed SGE for our cluster. At my current job Torque is the queuing system of choice. I'm very familiar with SGE, but only have a cursory knowledge of Torque (installed it for evaluation, and that's it). We're abo

Re: [Beowulf] Re: Purdue Supercomputer

2008-05-12 Thread Geoff Jacobs
Karen Shaeffer wrote: > Hi Geoff, > Following my line of thought, the issue is only whether the PCI rail(s) > powering all the IPMI circuitry are completely isolated from the > rest of the system board circuitry or not. If true, then it would > be equivalent to a daughterboard implementation. If fa

Re: [Beowulf] Re: Purdue Supercomputer

2008-05-12 Thread Prentice Bisbal
Jim Lux wrote: > Actually, you can order cables already pre numbered and labelled. Why > burn expensive cluster assembler time when you can pay someone > (potentially offshore) to do it cheaper. Because that would hurt the US economy, and the labels would probably be made out of lead. ;-) -- Pr

Re: [Beowulf] Do these SGE features exist in Torque?

2008-05-12 Thread Prentice Bisbal
Reuti wrote: > Hi, > > On 09.05.2008 at 20:26 Prentice Bisbal wrote: > >> At a previous job, I installed SGE for our cluster. At my current job >> Torque is the queuing system of choice. I'm very familiar with SGE, but >> only have a cursory knowledge of Torque (installed it for evaluation, >> a

Re: [Beowulf] Do these SGE features exist in Torque?

2008-05-12 Thread Prentice Bisbal
John Hearns wrote: > On Fri, 2008-05-09 at 14:26 -0400, Prentice Bisbal wrote: > >> 1. Interactive shells managed by queuing system >> 2. Counting licenses in use (done using a contributed shell script in SGE) >> 3. Separation of roles between submit hosts, execution hosts, and >> administration h

Re: [Beowulf] Do these SGE features exist in Torque?

2008-05-12 Thread Glen Beane
On May 9, 2008, at 2:26 PM, Prentice Bisbal wrote: At a previous job, I installed SGE for our cluster. At my current job Torque is the queuing system of choice. I'm very familiar with SGE, but only have a cursory knowledge of Torque (installed it for evaluation, and that's it). We're about to pu

Re: [Beowulf] Re: Purdue Supercomputer

2008-05-12 Thread Robert G. Brown
On Sun, 11 May 2008, Joe Landman wrote: Gerry Creager wrote: Student workers are cheap and expendable... cold ... ;) but true. In fact, in as little as three or four months they tend to expend, and there ya gotta go and train a new one. Sometimes they last as long as a couple of years, th

Re: [Beowulf] Re: Purdue Supercomputer

2008-05-12 Thread Alan Louis Scheinine
This thread has also been used to discuss IPMI. We installed 1U cases in a rack and plugged-in the machines but they were turned-off. The air-conditioning installation was not finished, the rack doors were open but the lower part had many cables that basically covered the rear part of the nodes. T

Re: [Beowulf] Re: Purdue Supercomputer

2008-05-12 Thread stephen mulcahy
Greg Lindahl wrote: Last I saw someone doing this, IPMI sharing an ethernet port with the host led to all kinds of weird ARP problems. Whereas a dedicated port is much easier to configure. My favorite vendors all offer a dedicated port... Yes, same here - I've done some experimentation on a n

RE: [Beowulf] Re: Purdue Supercomputer

2008-05-12 Thread Geoff Galitz
I was initially sceptical when I had to do the same thing as I had bad experiences with shared addressing over the same interface... but with the gear over the last five years I've never had a problem with this except with the specific combination of FreeBSD and Dell boxes. The problem lay on t

Re: [Beowulf] Re: Purdue Supercomputer

2008-05-12 Thread John Hearns
On Sun, 2008-05-11 at 19:13 -0700, Greg Lindahl wrote: > Last I saw someone doing this, IPMI sharing an ethernet port with the > host led to all kinds of weird ARP problems. Whereas a dedicated port > is much easier to configure. My favorite vendors all offer a > dedicated port... Also port nu