What sort of business-management-level metrics do people measure on clusters? Upper management is asking us to define and provide some sort of "numbers" which can be used to gauge the success of our cluster project.
We currently have both SGE and Torque/Moab in use and need to measure both if possible.

I can think of some simple metrics (well, sort of; the actual technical definition/measurement may be difficult):

- 90th/95th percentile wait time for jobs in various queues. Is smaller better, meaning the jobs don't wait long and users are happy? Or is larger better, meaning that we have lots of demand and need more resources?

- Core-hours of user computation (per queue?), both as raw time and as a percentage of available time. Again, which is better from the management view: higher or lower?

- Availability during scheduled hours (ignoring scheduled maintenance times). This is a common metric, but how do people actually measure/compute it? What about down nodes? Is some scheduled percentage (5%?) assumed down?

- Number of new science projects performed. Vague, but our applications support people can just count things occasionally. This misses users who just use the system without interacting with us, and it misses "production" work that just keeps running.

Any comments or ideas are welcome.

Thanks,
Stuart Barkley
-- 
I've never been lost; I was once bewildered for three days, but never lost!
                                        --  Daniel Boone

Beowulf mailing list, Beowulf@beowulf.org
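For concreteness, here is one possible way the first three metrics above could be computed. This is only a rough sketch under stated assumptions: the `Job` record and all function names are hypothetical, and it assumes the submit/start/end times and core counts have already been extracted from the SGE and Torque accounting data into a common form. The nearest-rank percentile and the "node-hours down" treatment of availability are just one choice among several.

```python
import math
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical normalized job record (not an SGE or Torque format)."""
    submit: float  # submission time, epoch seconds
    start: float   # start time, epoch seconds
    end: float     # end time, epoch seconds
    cores: int     # cores allocated to the job


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    if not values:
        raise ValueError("no values to summarize")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def wait_time_percentile(jobs, pct):
    """Queue wait (start - submit) at the given percentile, in seconds."""
    return percentile([j.start - j.submit for j in jobs], pct)


def core_hours(jobs):
    """Total core-hours consumed by the given jobs."""
    return sum((j.end - j.start) * j.cores for j in jobs) / 3600.0


def utilization(jobs, total_cores, period_hours):
    """Core-hours used as a fraction of core-hours available in the period."""
    return core_hours(jobs) / (total_cores * period_hours)


def availability(node_count, scheduled_hours, down_node_hours):
    """Fraction of scheduled node-hours during which nodes were up.

    Assumes scheduled maintenance is already excluded from scheduled_hours
    and that down_node_hours sums unscheduled downtime over all nodes.
    """
    scheduled = node_count * scheduled_hours
    return (scheduled - down_node_hours) / scheduled


if __name__ == "__main__":
    jobs = [
        Job(submit=0, start=60, end=3660, cores=4),
        Job(submit=0, start=600, end=4200, cores=8),
        Job(submit=100, start=200, end=7400, cores=2),
    ]
    print("95th percentile wait (s):", wait_time_percentile(jobs, 95))
    print("core-hours:", core_hours(jobs))
    print("utilization:", utilization(jobs, total_cores=16, period_hours=2))
    print("availability:", availability(10, 100, 50))
```

Filtering the job list by queue before calling these functions would give the per-queue numbers. Counting down nodes as lost node-hours sidesteps the "assume 5% down" question, but it does not say whether a partially degraded cluster counts as "available"; that remains a policy decision.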