Re: [Beowulf] using Nagios to monitor compute nodes: NPRE vs check_by_ssh

2008-12-25 Thread Alex Younts
At my employer, we use a variety of monitoring tools for our various clusters. Our Nagios box is a VM with a single processor and 512 MB of memory. Currently, we monitor 1700 hosts, each with three or four service checks apiece (two of which SSH to nodes to run scripts). We check services about eve

Re: [Beowulf] using Nagios to monitor compute nodes: NPRE vs check_by_ssh

2008-12-25 Thread Alex Younts
We have quite a few different PBS servers running PBSPro 9.x. Our Nagios box has a bare install of PBSPro, and we wrote a check script that runs "pbsnodes -s $cluster-head-node $nodehostname" and checks to see if PBS thinks the node is happy. (We determine which PBS server to hit up based on the
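
A check script of the kind described above might look roughly like the following Python sketch. It is not the poster's actual script. The function names, the set of states treated as unhealthy, and the assumption that pbsnodes prints a "state = ..." line with comma-separated values are all illustrative guesses; only the "pbsnodes -s <server> <node>" invocation comes from the message, and the exit codes follow the usual Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
import re
import subprocess

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Assumption: these node states mean PBS does not consider the node healthy.
BAD_STATES = {"down", "offline", "state-unknown"}


def classify(pbsnodes_output):
    """Map pbsnodes text output to a (Nagios exit code, message) pair."""
    # Assumption: the output contains a line such as "     state = free".
    match = re.search(r"^\s*state\s*=\s*(.+)$", pbsnodes_output, re.MULTILINE)
    if match is None:
        return UNKNOWN, "UNKNOWN - no state line in pbsnodes output"
    states = {s.strip() for s in match.group(1).split(",")}
    bad = sorted(states & BAD_STATES)
    if bad:
        return CRITICAL, "CRITICAL - node state: " + ",".join(bad)
    return OK, "OK - node state: " + ",".join(sorted(states))


def check_node(server, node, timeout=30):
    """Run pbsnodes against one PBS server for one node and classify it."""
    try:
        result = subprocess.run(
            ["pbsnodes", "-s", server, node],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return UNKNOWN, "UNKNOWN - could not run pbsnodes: %s" % exc
    if result.returncode != 0:
        return UNKNOWN, "UNKNOWN - pbsnodes exited with %d" % result.returncode
    return classify(result.stdout)
```

To use this as a Nagios plugin, a small wrapper would call check_node() with the server and node names from the command line, print the message, and exit with the returned code. Keeping classify() separate from the subprocess call makes the state mapping easy to adjust and test without a PBS installation.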

Re: [Beowulf] using Nagios to monitor compute nodes: NPRE vs check_by_ssh

2008-12-23 Thread Rahul Nabar
On Mon, Dec 22, 2008 at 10:23 PM, Alex Younts wrote:
> At my employer, we use a variety of monitoring tools for our various
> clusters. Our nagios box is a VM with a single processor and 512MB of
> memory. Currently, we monitor 1700 hosts, each with three or four
> service checks a piece (two of w

Re: [Beowulf] using Nagios to monitor compute nodes: NPRE vs check_by_ssh

2008-12-23 Thread John Hearns
2008/12/23 Rahul Nabar
>
> I'd like (2) because it is more secure and seems easier to deploy but
> I'm a bit afraid if this will overtax my central server.
>
> Any suggestions? Are other users using Nagios here?
>
Rahul, I'm not a Nagios expert, but I have used NRPE for monitoring quite some time

[Beowulf] using Nagios to monitor compute nodes: NPRE vs check_by_ssh

2008-12-22 Thread Rahul Nabar
I just installed Nagios to try to monitor my 256 compute nodes centrally. It seems to work like a charm for all the public services (ping, ssh, etc.), but now I was getting more ambitious and wanted to try to monitor the private services too (disk usage, process loads, Torque, PBS, etc.). I was jus