On Wed, 26 Nov 2008, Thomas Vixel wrote:

> I've been googling for a top-like cli tool to use on our cluster, but
> the closest thing that comes up is Rocks' "cluster top" script. That
> could be tweaked to work via the cli, but due to factors beyond my
> control (management) all functionality has to come from a pre-fab
> program rather than a software stack with local, custom modifications.
>
> I'm sure this has come up more than once in the HPC sector as well --
> could anyone point me to any top-like apps for our cluster?

Most remote job mechanisms only think about starting remote processes,
not about the full create-monitor-control-report functionality.

The Scyld system (currently branded "Clusterware") defaults to using a
built-in unified process space. That presents all of the processes
running over the cluster in a process space on the master machine, with
full POSIX semantics. It neatly solves your need with... the standard
'top' program.

Most scheduling systems also have a way to monitor processes that they
start, but I haven't found one that takes advantage of all the
information available and reports it quickly and efficiently.

There are many advantages to the Scyld implementation:

 -- No new or modified process management tools need to be written.
    Standard utilities such as 'top' and 'ps' work unmodified, as do
    tools we didn't specifically plan for, e.g. GUI versions of 'pstree'.
 -- The 'killall' program works over the cluster, efficiently.
 -- All signals work as expected, including 'kill -9'. (Most remote
    process starting mechanisms will just kill off the local endpoint,
    leaving the remote process running-but-confused.)
 -- Process groups and controlling-TTY groups work properly for job
    control and signals.
 -- Running jobs report their status and statistics accurately: an
    updated 'rusage' structure is sent once per second, and a final
    rusage structure and exit status are sent when the process
    terminates.

The "downside" is that we explicitly use Linux features and details,
relying on kernel-version-specific features. That would be an issue for
a one-off hack, but we've been using this approach continuously for a
decade, since the Linux 2.2 kernel and over multiple architectures.
We've been producing supported commercial releases since 2000, longer
than anyone else in the business.
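
For readers who want to see the semantics in question, here is a rough sketch of what "signals, process groups, and a final rusage plus exit status" look like with the ordinary POSIX calls on a single Linux machine. It is not Scyld code and makes no claim about how the unified process space is implemented; the helper name run_and_reap is made up for illustration. The point is only that tools built on these standard calls are the ones described above as working unmodified.

```python
import os
import signal
import subprocess
import sys


def run_and_reap(argv, sig=None):
    """Hypothetical helper: start argv in its own process group,
    optionally signal the whole group, then collect the result.

    Returns (pid, code, rusage); code is the exit code, or the negated
    signal number if the process was killed by a signal.
    """
    # start_new_session=True makes the child a session and group leader,
    # so its pid is also its process-group id.
    proc = subprocess.Popen(argv, start_new_session=True)

    if sig is not None:
        # Signal every process in the group, like 'kill -<sig> -<pgid>'.
        os.killpg(proc.pid, sig)

    # wait4() hands back the exit status together with the final rusage.
    pid, status, usage = os.wait4(proc.pid, 0)
    if os.WIFSIGNALED(status):
        code = -os.WTERMSIG(status)
    else:
        code = os.WEXITSTATUS(status)
    proc.returncode = code  # tell Popen the child has been reaped
    return pid, code, usage


if __name__ == "__main__":
    sleeper = [sys.executable, "-c", "import time; time.sleep(30)"]
    pid, code, usage = run_and_reap(sleeper, signal.SIGKILL)
    print("pid", pid, "code", code, "user CPU seconds", usage.ru_utime)
```

The sketch only covers the final report at exit; the once-per-second rusage updates mentioned above have no direct equivalent in these calls.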

-- 
Donald Becker [EMAIL PROTECTED]
Penguin Computing / Scyld Software
www.penguincomputing.com www.scyld.com
Annapolis MD and San Francisco CA

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf