computers which we are going to buy. On the one hand there are the
standard tools to monitor a running cluster like ganglia, nagios,
zenoss, ... but these are - to my understanding - just for monitoring
the current status.
one problem with this is that it's naturally integrated with many
other things. for instance, the scheduler may want information on
the layout of nodes (to minimize the number of switches spanned by a
parallel job). similarly, I'd very much like my syslog/eventlog to be
correlatable, through the hardware, to the actual jobs running at the
time.
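
to make the two examples concrete, here is a rough sketch in Python. everything in it is hypothetical: the NODE_SWITCH table, the Job and Event records, and the function names are made up for illustration and do not correspond to any real scheduler or logging API. it only assumes that a node-to-switch inventory and the job start/end times can be read from one place.

```python
from dataclasses import dataclass

# hypothetical inventory table: node name -> switch it is cabled to
NODE_SWITCH = {"n01": "sw1", "n02": "sw1", "n03": "sw2", "n04": "sw2"}


@dataclass
class Job:
    job_id: str
    nodes: tuple
    start: int  # epoch seconds
    end: int    # epoch seconds


@dataclass
class Event:
    node: str
    time: int   # epoch seconds
    message: str


def switches_spanned(nodes):
    """Count the distinct switches touched by a set of nodes."""
    return len({NODE_SWITCH[n] for n in nodes})


def jobs_for_event(event, jobs):
    """Return ids of jobs running on the event's node when it was logged."""
    return [
        j.job_id
        for j in jobs
        if event.node in j.nodes and j.start <= event.time <= j.end
    ]


# made-up sample data, not taken from any real cluster
JOBS = [
    Job("job-a", ("n01", "n02"), 100, 200),
    Job("job-b", ("n02", "n03"), 150, 300),
]
```

a scheduler could use something like switches_spanned() to prefer placements that stay on one switch, and an event system could call jobs_for_event() on each syslog line. both only work if they can read the same inventory, which is the integration problem.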
in a sense, ideally all information would be integrated. but that would
require _everything_ to be standardized (your scheduler must be able to
read the same DB/tables as your event management system, etc). outside
of Redmond, I'm not sure how practical that is.