I couldn't have said it better myself. Be wary of suits asking for numbers.
Michael Di Domenico wrote:
> I think measuring a clusters success based on the number of jobs run
> or cpu's used is a bad measure of true success. I would be more
> inclined to consider a cluster a success by speaking with the people
> who use it and find out not only whether they can use it effectively
> and/or what new science having cluster is being enabled by them.
>
> then only thing i find most of the below metrics overly useful is
> figuring out whether or not we need a bigger cluster. which i guess
> is a form of measurable success, but not one in which i would consider
> the "cluster" to be a success. it could just be dopes running
> thousands of "/bin/hostname" jobs trying to figure out how to use the
> cluster
>
> I also think you need to ask the "business" people what measure they
> would consider a cluster as a worthwhile investment, it doesn't sound
> as if you have that from your email.
>
> On Fri, Aug 20, 2010 at 1:34 PM, Stuart Barkley <stua...@4gh.net> wrote:
>> What sort of business management level metrics do people measure on
>> clusters? Upper management is asking for us to define and provide
>> some sort of "numbers" which can be used to gage the success of our
>> cluster project.
>>
>> We currently have both SGE and Torque/Moab in use and need to measure
>> both if possible.
>>
>> I can think of some simple metrics (well sort-of, actual technical
>> definition/measurement may be difficult):
>>
>> - 90/95th percentile wait time for jobs in various queues. Is smaller
>> better meaning the jobs don't wait long and users are happy? Is
>> larger better meaning that we have lots of demand and need more
>> resources?
>>
>> - core-hours of user computation (per queue?) both as raw time and
>> percentage of available time. Again, which is better (management
>> view) higher or lower?
>>
>> - Availability during scheduled hours (ignoring scheduled maintenance
>> times). Common metric, but how do people actually measure/compute
>> this? What about down nodes? Some scheduled percentage (5%?) assumed
>> down?
>>
>> - Number of new science projects performed. Vague, but our
>> applications support people can just count things occasionally.
>> Misses users who just use the system without interaction with us.
>> Misses "production" work that just keeps running.
>>
>> Any comments or ideas are welcome.
>>
>> Thanks,
>> Stuart Barkley
>> --
>> I've never been lost; I was once bewildered for three days, but never lost!
>> -- Daniel Boone
>> _______________________________________________
>> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
>> To change your subscription (digest mode or unsubscribe) visit
>> http://www.beowulf.org/mailman/listinfo/beowulf

--
Prentice Bisbal
Linux Software Support Specialist/System Administrator
School of Natural Sciences
Institute for Advanced Study
Princeton, NJ