Re: [Beowulf] Are disk MTBF ratings at all useful?

2013-04-18 Thread Ellis H. Wilson III
On 04/18/2013 07:01 PM, mathog wrote: > How do they come up with the MTBF values for disks anyway? Clearly it > is not based on watching a large > sample of disks for countless years! I am not intimately familiar with how they come up with the values (else I probably would be at liberty to comme

Re: [Beowulf] Definition of HPC

2013-04-18 Thread Mark Hahn
> Only for benchmarking? We have done this for years on our production > clusters (and SGI provides a tool that does this and more to clean up nodes). We > have this in our epilogue so that we can clean out memory on our diskless > nodes so there is nothing stale sitting around that can impact the next > u

Re: [Beowulf] Are disk MTBF ratings at all useful?

2013-04-18 Thread Lux, Jim (337C)
Jim Lux -Original Message- From: beowulf-boun...@beowulf.org [mailto:beowulf-boun...@beowulf.org] On Behalf Of mathog Sent: Thursday, April 18, 2013 4:21 PM To: Alex Chekholko Cc: Beowulf List Subject: Re: [Beowulf] Are disk MTBF ratings at all useful? On 18-Apr-2013 16:03, Alex Chekh

Re: [Beowulf] Are disk MTBF ratings at all useful?

2013-04-18 Thread Geoffrey Jacobs
On 04/18/2013 06:40 PM, Joe Landman wrote: > Statistical analysis (called a bathtub analysis). MTBF's are WAGs at > best, and not well matched against empirical observation. Is anyone familiar with QC steps taken by hard drive manufacturers to eliminate "infant mortality" problems? Do any third

Re: [Beowulf] Are disk MTBF ratings at all useful?

2013-04-18 Thread Lux, Jim (337C)
You set up 1000 drives, run them at high temperature (using a scaling factor developed by experience) and count how many fail after some length of time, then extrapolate to a failure rate which gets turned into a MTBF. It *is* fairly scientific and based on sound principles, although there are
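The procedure described above comes down to simple arithmetic. Here is a minimal Python sketch, assuming a constant failure rate over the test window and a hypothetical temperature acceleration factor (real scaling factors are vendor-specific and are not given in this thread). The function name and all numbers are illustrative, not measured data.

```python
def estimate_mtbf_hours(drives, test_hours, failures, acceleration_factor):
    """Point estimate of MTBF from an accelerated life test.

    Assumes a constant failure rate (the flat floor of the bathtub curve)
    and that each hour at elevated temperature counts as
    `acceleration_factor` hours at normal operating temperature.
    Device-hours are approximated as drives * test_hours, ignoring the
    hours lost when a failed drive drops out of the test.
    """
    if failures <= 0:
        raise ValueError("need at least one failure for a point estimate")
    equivalent_device_hours = drives * test_hours * acceleration_factor
    return equivalent_device_hours / failures


# Illustrative inputs only: 1000 drives, 1000 hours, 5 failures,
# and a hypothetical 6x acceleration factor.
mtbf = estimate_mtbf_hours(1000, 1000, 5, 6.0)
print(f"{mtbf:,.0f} hours, about {mtbf / 8766:.0f} years")
```

A handful of failures in a short test is enough to produce a rating of over a century, which is why the resulting figure says little about how long any single drive will last.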

Re: [Beowulf] Are disk MTBF ratings at all useful?

2013-04-18 Thread mathog
On 18-Apr-2013 16:03, Alex Chekholko wrote: > On Thu, Apr 18, 2013 at 4:01 PM, mathog wrote: >> How do they come up with the MTBF values for disks anyway? Clearly >> it >> is not based on watching a large >> sample of disks for countless years! > How would you do it? On a brand new design, I h

Re: [Beowulf] Are disk MTBF ratings at all useful?

2013-04-18 Thread Joe Landman
On 4/18/2013 7:01 PM, mathog wrote:
> High end SATA and SAS disks claim MTBF values that work out to over 100
> years, and yet it is a common
Amazing isn't it. Disks that never fail!
> observation that certain models fail at rates entirely inconsistent
> with those values. For instance,
> 75% o

Re: [Beowulf] Are disk MTBF ratings at all useful?

2013-04-18 Thread Alex Chekholko
On Thu, Apr 18, 2013 at 4:01 PM, mathog wrote: > How do they come up with the MTBF values for disks anyway? Clearly it > is not based on watching a large > sample of disks for countless years! Hi David, How would you do it? Regards, Alex

[Beowulf] Are disk MTBF ratings at all useful?

2013-04-18 Thread mathog
High end SATA and SAS disks claim MTBF values that work out to over 100 years, and yet it is a common observation that certain models fail at rates entirely inconsistent with those values. For instance, 75% of all drives of one model dead in < 6 years. (Cited by one poster in this thread: htt
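To put numbers on the mismatch, here is a rough Python sketch. It assumes the simplest possible model, a constant failure rate (exponential lifetimes). Real drives follow a bathtub curve, so this is only a sanity check. The function names are made up for illustration, and the inputs are the round figures quoted in the post.

```python
import math


def fraction_failed(years, mtbf_years):
    """Expected fraction of a population dead after `years`,
    assuming a constant failure rate of 1 / mtbf_years."""
    return 1.0 - math.exp(-years / mtbf_years)


def implied_mtbf_years(years, fraction_dead):
    """MTBF that would explain `fraction_dead` of drives failing
    within `years`, under the same constant-rate assumption."""
    return -years / math.log(1.0 - fraction_dead)


# What a 100-year MTBF predicts after 6 years of service.
print(f"predicted dead after 6 years: {fraction_failed(6, 100):.1%}")

# What MTBF the observed 75% loss in 6 years would correspond to.
print(f"MTBF implied by 75% dead in 6 years: {implied_mtbf_years(6, 0.75):.1f} years")
```

Under this model a 100-year MTBF predicts only a few percent of drives failing in six years, so a 75% loss corresponds to an effective MTBF more than an order of magnitude lower than the rated one.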

Re: [Beowulf] Definition of HPC

2013-04-18 Thread Craig Tierney - NOAA Affiliate
Only for benchmarking? We have done this for years on our production clusters (and SGI provides a tool that does this and more to clean up nodes). We have this in our epilogue so that we can clean out memory on our diskless nodes so there is nothing stale sitting around that can impact the next user's job.

Re: [Beowulf] Definition of HPC

2013-04-18 Thread Max R. Dechantsreiter
On Thu, 18 Apr 2013, Mark Hahn wrote:
>> What problems?
>
> performance, of course. drop_caches is really only sane for benchmarking,
> where you want to control for hot/cold caches.
Indeed. I thought you might know of harmful instances of which I was unaware.
> otherwise, you're almost ce

Re: [Beowulf] Definition of HPC

2013-04-18 Thread Mark Hahn
> What problems?
performance, of course. drop_caches is really only sane for benchmarking, where you want to control for hot/cold caches. otherwise, you're almost certainly better off either letting the kernel optimize global caching, and/or fixing your application to avoid polluting the cache (O_

Re: [Beowulf] Definition of HPC

2013-04-18 Thread Max R. Dechantsreiter
> "sudo sysctl -w vm.drop_caches=3" is the smarter way to do it, > or a fixed executable with sudo. or a fixed executable with suid. > or better yet: have the system do it when appropriate, > since inappropriate drop_caches could cause problems. What problems? http://linux-mm.org/Drop_Ca

[Beowulf] nVidia Kepler GK110 GPU is incompatible w/Intel x86 hardware in PCI-E 3.0 mode ?

2013-04-18 Thread Mikhail Kuzminsky
I have a cluster node (w/Linux, of course) based on a Supermicro X9SCA system board and a Xeon E3-1230v2 with an LGA1155 socket. Now I want to buy an nVidia Kepler GK110 GPU w/PCI-E 3.0 (CK20 Compute Board from PNY ?) and install it into my node. Intel Xeon E3-1230v2 and Supermicro X9SCA both support PCI-

Re: [Beowulf] Definition of HPC

2013-04-18 Thread Ellis H. Wilson III
On 04/18/2013 02:55 PM, Joe Landman wrote: > [I am not BOFH ... I am not BOFH ... I am not BOFH ...] It should be against list policy for some of you somewhat more "experienced" guys to share insanely hilarious tomes of literature I managed to miss in the nineties. This is bound to suck down ma

Re: [Beowulf] Definition of HPC

2013-04-18 Thread Nicholas M Glykos
> I term this article "fun with sudo, or how to drive down I95 at 65mph > while holding scissors transporting your x-ray device" :- -- Nicholas M. Glykos, Department of Molecular Biology and Genetics, Democritus University of Thrace, University Campus, Dragana, 68

Re: [Beowulf] Definition of HPC

2013-04-18 Thread Joe Landman
On 04/18/2013 02:45 PM, Adam DeConinck wrote: > Tying in another recent discussion on the list, "root access" is > actually one of the places I've seen some success using Cloud for HPC. > It costs more, it's virtualized, and you usually can't ge

Re: [Beowulf] Definition of HPC

2013-04-18 Thread Mark Hahn
> If you have looked into it, sudo echo 3 > /proc/sys/vm/drop_caches is well > nigh impossible. > But you can run an suid C program which does effectively the same job. "sudo sysctl -w vm.drop_caches=3" is the smarter way to do it, or a fixed executable with sudo. or a fixed executable with su
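For reference, `sudo echo 3 > /proc/sys/vm/drop_caches` fails because the invoking shell, not sudo, opens the redirect target, so the write is attempted as the unprivileged user. Running `sysctl` under sudo avoids the redirection entirely. Below is a minimal Python sketch of how a job prologue or epilogue might wrap that call. The helper names are hypothetical, and it assumes `sudo`, `sysctl` and `sync` are on the PATH and that the sudoers policy permits the command.

```python
import subprocess

# Values accepted by vm.drop_caches:
# 1 = page cache, 2 = dentries and inodes, 3 = both.
VALID_MODES = {1, 2, 3}


def drop_caches_argv(mode=3):
    """Build the argv for the sysctl form suggested in the thread.

    No shell is involved, so there is no redirection for an
    unprivileged shell to trip over.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}")
    return ["sudo", "sysctl", "-w", f"vm.drop_caches={mode}"]


def drop_caches(mode=3):
    """Flush dirty pages, then ask the kernel to drop clean caches."""
    # Dirty pages cannot be dropped, so write them back first.
    subprocess.run(["sync"], check=True)
    subprocess.run(drop_caches_argv(mode), check=True)
```

As noted in the thread, this is mainly useful for controlling hot/cold cache state between benchmark runs; in normal operation the kernel is usually better left to manage its own caches.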

Re: [Beowulf] Definition of HPC

2013-04-18 Thread Joe Landman
On 04/18/2013 02:40 PM, James Cuff wrote: > > > > On Thu, Apr 18, 2013 at 2:35 PM, Joe > Landman > wrote: > > > > landman@metal:~$ sudo echo 3 > /proc/sys/vm/drop_caches > bash: /proc/sy

Re: [Beowulf] Definition of HPC

2013-04-18 Thread James Cuff
On Thu, Apr 18, 2013 at 2:35 PM, Joe Landman wrote: > > > landman@metal:~$ sudo echo 3 > /proc/sys/vm/drop_caches > bash: /proc/sys/vm/drop_caches: Permission denied > > # ??!? > Cute write up: http://stackoverflow.com/questions/82

Re: [Beowulf] Definition of HPC

2013-04-18 Thread Adam DeConinck
Tying in another recent discussion on the list, "root access" is actually one of the places I've seen some success using Cloud for HPC. It costs more, it's virtualized, and you usually can't get HPC-specialized hardware, so it's obviously not a silve

Re: [Beowulf] Definition of HPC

2013-04-18 Thread Ellis H. Wilson III
On 04/18/2013 02:35 PM, Joe Landman wrote: > landman@metal:~$ sudo ./drop_caches.bash > [sudo] password for landman: > landman@metal:~$ > > # PROFIT!!! This is exactly how I do things. I've got a whole folder of "special scripts" that require su. But getting the script developed usually requir

Re: [Beowulf] Definition of HPC

2013-04-18 Thread Joe Landman
On 04/18/2013 01:07 PM, Hearns, John wrote: > As an aside, a normal user can trigger a drop of the caches before the start > of a job. > If you have looked into it, sudo echo 3 > /proc/sys/vm/drop_caches is well > nigh impossible. > But you can run an suid C program which does effectively the

Re: [Beowulf] Definition of HPC

2013-04-18 Thread Max R. Dechantsreiter
> Let's say ... ah ... Safety is very important. Pretending it isn't,
> or saying "bad things can't happen to me because I is smart" isn't quite
> a safe strategy ... for computing, for x-ray interlocks, for driving on
> a highway ...
So, following your "logic," I take it you don't drive on

Re: [Beowulf] Definition of HPC

2013-04-18 Thread Joe Landman
On 04/18/2013 12:57 PM, Nicholas M Glykos wrote: >> Please describe what the grad student who pioneers new tech for X-ray >> generators should do, particularly if they are trying to develop new safety >> interlocks. > They work on their supervisor's own X-ray generator at the basement of > their de

Re: [Beowulf] Definition of HPC

2013-04-18 Thread Hearns, John
>> I run my X-ray generator with all safety interlocks off -
>> personal responsibility are my watchwords.
> Does your radiation safety officer know that?
> ;-)
First running down stairs with scissors, now taking the interlocks off X-ray sources. Who knew that Beowulfery really was a sport for a

Re: [Beowulf] Definition of HPC

2013-04-18 Thread Hearns, John
> cluster and build my own storage at home, so I can do research without > constantly having day or more delays in trying to flush a cache or do > something similarly simple but requiring of root. As an aside, a normal user can trigger a drop of the caches before the start of a job. If you hav

Re: [Beowulf] Definition of HPC

2013-04-18 Thread Ellis H. Wilson III
On 04/18/2013 12:57 PM, Nicholas M Glykos wrote: > > > >>> In the same way you wouldn't allow a general >>> user to override the safety interlocks of an X-ray generator, you >>> shouldn't allow root access to the general users of a shared computing >>> facility. >> >> Please describe what the grad

Re: [Beowulf] Definition of HPC

2013-04-18 Thread Nicholas M Glykos
> > In the same way you wouldn't allow a general > > user to override the safety interlocks of an X-ray generator, you > > shouldn't allow root access to the general users of a shared computing > > facility. > > Please describe what the grad student who pioneers new tech for X-ray > generators

Re: [Beowulf] Definition of HPC

2013-04-18 Thread Max R. Dechantsreiter
>> It seems that some who decry the "nanny state" feel less
>> libertarian when it comes to cluster management and use.
>
> This is not about clusters. This is about dependable, responsible and
This is a discussion in Beowulf.org - how could it not be about clusters?
> professional use of expensi

Re: [Beowulf] Definition of HPC

2013-04-18 Thread Nicholas M Glykos
> I run my X-ray generator with all safety interlocks off -
> personal responsibility are my watchwords.
Does your radiation safety officer know that? ;-)
> It seems that some who decry the "nanny state" feel less
> libertarian when it comes to cluster management and use.
This is not about

Re: [Beowulf] Definition of HPC

2013-04-18 Thread Ellis H. Wilson III
On 04/18/2013 12:21 PM, Nicholas M Glykos wrote: > In the same way you wouldn't allow a general > user to override the safety interlocks of an X-ray generator, you > shouldn't allow root access to the general users of a shared computing > facility. Please describe what the grad student who pioneer

Re: [Beowulf] Definition of HPC

2013-04-18 Thread Max R. Dechantsreiter
On Thu, 18 Apr 2013, Nicholas M Glykos wrote: > > >>> Running as root? Yeah, it's that bad. Just say no. >> >> Are you setting yourself up as arbiter of who should and >> who should not run as root? Please - respect those of us >> who have the capabilities, experience, and juice to do so >>

Re: [Beowulf] Definition of HPC

2013-04-18 Thread Joe Landman
On 4/17/2013 5:32 PM, Max R. Dechantsreiter wrote: [...] >> Running as root? Yeah, it's that bad. Just say no. > > Are you setting yourself up as arbiter of who should and > who should not run as root? Please - respect those of us > who have the capabilities, experience, and juice to do so > (wh

Re: [Beowulf] Definition of HPC

2013-04-18 Thread Nicholas M Glykos
> > Running as root? Yeah, it's that bad. Just say no. > > Are you setting yourself up as arbiter of who should and > who should not run as root? Please - respect those of us > who have the capabilities, experience, and juice to do so > (when circumstances demand it). I think that what both J

Re: [Beowulf] Register article on Linux State of the Union

2013-04-18 Thread Joshua Mora
Search the web for "PGAS over Ethernet", for instance, to get an idea of where _some_ of those things are headed. Joshua

Re: [Beowulf] Register article on Linux State of the Union

2013-04-18 Thread Hearns, John
> I am puzzled by this paragraph. I don't follow the kernel development list > anymore, but is there some kind of effort to support shared memory > over Ethernet? I think "degrade rapidly" is a bit of an understatement. I took it that the author of the piece hasn't fully understood. Probably is r

Re: [Beowulf] Register article on Linux State of the Union

2013-04-18 Thread Douglas Eadline
> http://www.theregister.co.uk/2013/04/17/state_of_linux_2013/ > > Some interesting points from an HPC point of view: > > > "For example, for some kinds of workloads, the NUMA (non-uniform memory > access) problem is all-important. This is particularly true of distributed > application clusters. A

Re: [Beowulf] Definition of HPC

2013-04-18 Thread Joe Landman
On 04/18/2013 11:09 AM, Ellis H. Wilson III wrote: [...] >> I am guessing your work doesn't involve a great deal of support. > Support in grad school? Nope. Not really. Not to do what I need to > do, at least. If you are referring to if I have to support someone, > well, that's also a big no.

Re: [Beowulf] Definition of HPC

2013-04-18 Thread Ellis H. Wilson III
On 04/18/2013 10:52 AM, Joe Landman wrote: > On 04/18/2013 10:37 AM, Ellis H. Wilson III wrote: > > [...] >> Please note: I NEVER run as root, I just "tinker" as root. I don't >> think there is ever a good reason to run as root. But having and using >> root is not so evil as you claim. In partic

Re: [Beowulf] Definition of HPC

2013-04-18 Thread Joe Landman
On 04/18/2013 10:37 AM, Ellis H. Wilson III wrote: [...] > Please note: I NEVER run as root, I just "tinker" as root. I don't > think there is ever a good reason to run as root. But having and using > root is not so evil as you claim. In particular, I have NO doubt you > require root to build J

Re: [Beowulf] Definition of HPC

2013-04-18 Thread Ellis H. Wilson III
On 04/17/2013 12:56 PM, Joe Landman wrote: > Without naming names ... we had a cluster we had set up several years > ago, with a particular cluster distribution compromised by an errant > graduate student running windows on a compromised laptop. They couldn't > break into the cluster, so they insta

Re: [Beowulf] Is comaring HPC to Formula1 a bad idea?

2013-04-18 Thread Mark Hahn
> http://www.isc-events.com/isc13/isc_blog/items/is-comparing-hpc-to-formula1-a-bad-idea.html I think there probably is an F1-like HPC subculture, though I would argue that it's relatively fringe. the top fringe, of course, but fringe none the less. mostly bespoke hardware - even in cases where

[Beowulf] Is comaring HPC to Formula1 a bad idea?

2013-04-18 Thread Hearns, John
http://www.isc-events.com/isc13/isc_blog/items/is-comparing-hpc-to-formula1-a-bad-idea.html Thank you, Andrew! "It combines the rapidly evolving hardware (the car is aggressively innovated throughout the race season) with a collaborations of many different high end skills (e.g., driver, pit crew