RE: [Beowulf] Software RAID?

2007-11-26 Thread Mark Hahn
Of course there are a zillion things you didn't mention. How many drives did you want to use? What kind? (SAS? SATA?) If you want 16 drives, you often get RAID hardware even if you don't use it. What config did you want? RAID-0? 1? 5? 6? Filesystem? So let's say it's 16. But in theory

Re: [Beowulf] Software RAID?

2007-11-26 Thread Joe Landman
Ekechi Nwokah wrote: Reposting with (hopefully) more readable formatting. [...] Of course there are a zillion things you didn't mention. How many drives did you want to use? What kind? (SAS? SATA?) If you want 16 drives, you often get RAID hardware even if you don't use it. What

RE: [Beowulf] Software RAID?

2007-11-26 Thread Ekechi Nwokah
Reposting with (hopefully) more readable formatting. * Sorry I was not quite clear. See below. > -Original Message- > From: Bill Broadley [mailto:[EMAIL PROTECTED] > Sent: Wednesday, November 21, 2007 4:35 PM > To: Ekechi Nwokah > Cc: Beowulf Mailing List > Subject: Re:

RE: [Beowulf] Software RAID?

2007-11-26 Thread Ekechi Nwokah
Sorry I was not quite clear. See below. -Original Message- From: Bill Broadley [mailto:[EMAIL PROTECTED] Sent: Wednesday, November 21, 2007 4:35 PM To: Ekechi Nwokah Cc: Beowulf Mailing List Subject: Re: [Beowulf] Software RAID? Ekechi Nwokah wrote: > Hi, > > Does anyone know of any so

Re: [Beowulf] Not quite Walmart, or, living without ECC?

2007-11-26 Thread Jim Lux
At 01:15 PM 11/26/2007, Bruno Coutinho wrote: I heard that the major source of memory corruption in servers is the memory bus. And this becomes worse as you add memory sticks. With 8 memory sticks that have 8 chips on each side, you have 128 chips. So the main purpose of ECC is correcting bus er

Re: [Beowulf] Not quite Walmart, or, living without ECC?

2007-11-26 Thread Tony Travis
David Mathog wrote: > I ran a little test over the Thanksgiving holiday to see how common > random errors in nonECC memory are. I used the memtest86+ bit fade test > mode, which writes all 1s, waits 90 minutes, checks the result, then > does the same thing for all 0s. Anyway, this was the best

Re: [Beowulf] Not quite Walmart, or, living without ECC?

2007-11-26 Thread Greg Lindahl
On Mon, Nov 26, 2007 at 12:27:03PM -0800, David Mathog wrote: > Anyway, this was the best test I could > find for detecting the occasional gamma ray type data loss event. I always thought that the bit fade test was aimed at finding manufacturing defects and the like. > On the web there are refe

Re: [Beowulf] Not quite Walmart, or, living without ECC?

2007-11-26 Thread Bruno Coutinho
I heard that the major source of memory corruption in servers is the memory bus. And this becomes worse as you add memory sticks. With 8 memory sticks that have 8 chips on each side, you have 128 chips. So the main purpose of ECC is correcting bus errors. 2007/11/26, David Mathog <[EMAIL PROTECTE

Re: [Beowulf] Not quite Walmart, or, living without ECC?

2007-11-26 Thread Scott Atchley
On Nov 26, 2007, at 3:27 PM, David Mathog wrote: I ran a little test over the Thanksgiving holiday to see how common random errors in nonECC memory are. I used the memtest86+ bit fade test mode, which writes all 1s, waits 90 minutes, checks the result, then does the same thing for all 0s.

re: [Beowulf] Not quite Walmart, or, living without ECC?

2007-11-26 Thread David Mathog
I ran a little test over the Thanksgiving holiday to see how common random errors in nonECC memory are. I used the memtest86+ bit fade test mode, which writes all 1s, waits 90 minutes, checks the result, then does the same thing for all 0s. Anyway, this was the best test I could find for detecti

RE: [Beowulf] Computational Astronomy?

2007-11-26 Thread Donald Shillady
While I am considering building a microWulf to run GAMESS on small molecules, it seems to me that there are many small molecules in interstellar space and the (parallel) GAMESS program can do pretty accurate calculations of molecular vibrations. Although H2 is probably the most common molecule

Re: [Beowulf] Tips for diagnosing intermittent problems on a small cluster

2007-11-26 Thread Peter St. John
David, clarification understood, thanks. I sometimes have problems with a desktop; I reboot (because of memory leaks) and have to shutdown because the mobo refuses to restart (seemingly because of temp) but a couple minutes cooldown does the trick. Peter On Nov 26, 2007 12:58 PM, David Mathog <[E

Re: [Beowulf] Tips for diagnosing intermittent problems on a small cluster

2007-11-26 Thread David Mathog
"Peter St. John" <[EMAIL PROTECTED]> wrote > I understood that sometimes the voltage from a fatigued (?), > overheated (?) PS may fail the mobo's bootup requirements (which can > be stricter re: voltage variations than running requirements) so > sometimes a PS has to cool down before the PC will r

Re: [Beowulf] Tips for diagnosing intermittent problems on a small cluster

2007-11-26 Thread Peter St. John
I understood that sometimes the voltage from a fatigued (?), overheated (?) PS may fail the mobo's bootup requirements (which can be stricter re: voltage variations than running requirements) so sometimes a PS has to cool down before the PC will reboot. So particularly, sometimes a PC failing to re

[Beowulf] Computational Astronomy?

2007-11-26 Thread Huntress Gary B NPRI
Hi Everyone, I have a small but functional 12 node cluster that unfortunately has outlived its usefulness. It was purchased for a particular project, served me well and now spends its life powered down, which is a shame. Since it is still perfectly functional, I could easily give it another

Re: [Beowulf] Tips for diagnosing intermittent problems on a small cluster

2007-11-26 Thread David Mathog
[EMAIL PROTECTED] (Andrew M.A. Cater) wrote > > There _may_ be some PSU involvement with ours: the machine and fans are > running but not accepting connections. You have to disconnect the power > for a few minutes for it to even boot again properly. Powercycling from > the front panel doesn't

Re: [Beowulf] and the winner of "Best 2007 parallel file system" is...

2007-11-26 Thread Robert Latham
On Fri, Nov 23, 2007 at 06:41:52PM +, andrew holway wrote: > Scores for redundancy, reliability and value. Nice features etc. I'd > like to build a decent comparison of all the options available cos > honestly, I don't have a clue :) and all the data online seems a > little stale. Do you have i

Re: [Beowulf] I/O workload of an application in distributed file system

2007-11-26 Thread Robert Latham
On Thu, Nov 22, 2007 at 10:15:25AM -0500, Mark Hahn wrote: > with that in mind, my opinion is that cluster IO testing should be > a combination of: > - parallel streaming IO to separate files - resembling a checkpoint, > or an IO-intensive app reading, or an app where the user forgot to

Re: [Beowulf] Software RAID?

2007-11-26 Thread stephen mulcahy
Joe Landman wrote: stephen mulcahy wrote: I'd note that battery backed caches on the RAID cards I've looked at in the last few months seem to come at a *significant* premium over the base price of the card (50-100% type significant). Hmmm those must be either really expensive batteries

Re: [Beowulf] Tips for diagnosing intermittent problems on a small cluster

2007-11-26 Thread stephen mulcahy
Andrew M.A. Cater wrote: Same here on a single machine with an earlier model Tyan board - it happened to us either after a very occasional kernel panic/exception or after 25-28 days of continuous running. I've got a 2885 here, if I can just find two Opterons, memory and a case :-) I'll le

Re: [Beowulf] Teaching Scientific Computation (looking for the perfect text)

2007-11-26 Thread Lombard, David N
On Wed, Nov 21, 2007 at 07:23:32PM -0500, Robert G. Brown wrote: > On Wed, 21 Nov 2007, David Mathog wrote: > > >It would have been interesting if C++ had implemented operator > >definition instead of operator overloading (ie, redefinition). > >For instance like: > > > >A = B + C D > > > > (one