I recently started messing with SNMP, and I found that attempting to
get it to do active monitoring via snmpd results in a segfault. Could
be my box, could be that it's a new implementation, could be bad mojo.
In any event, I decided I'd reached the point of diminishing returns
on troubleshooting, so I went ahead and wrote up a script to do what
I wanted snmpd to do: namely, monitor the processes in the snmpd.conf
file and send me trap notifications by email if there are any issues.
<pre>
#!/usr/local/bin/bash
# procmon - given an snmpd.conf file, this script monitors
# given "proc" directives for process status since
# snmpd segfaults like a whiny bitch when you ask it
# to do active process monitoring. Iggdawg dislikes
# whiny bitches. Notifies via snmptrap.
COUNT=1
CONFIG="/etc/snmp/snmpd.conf"
# Walk the "proc" directives in file order. This assumes snmpd numbers
# its process table rows 1..N in that same order, so COUNT is the row index.
for name in $(awk '$1 == "proc" { print $2 }' "$CONFIG")
do
    LOCK="/tmp/proc.$COUNT"
    # prErrorFlag for this row: 0 means the process count is within limits.
    # An empty answer (snmpd not responding) is treated as a problem.
    FLAG=$(/usr/local/bin/snmpwalk -v 1 -c localpass localhost \
        1.3.6.1.4.1.2021.2.1.100."$COUNT" | awk '{ print $4 }')
    if [ -e "$LOCK" ] ; then
        # Already reported: send an all-clear once the flag is back to 0
        if [ "$FLAG" = 0 ] ; then
            /usr/local/bin/snmptrap -v 2c -c localpass localhost '' \
                1.3.6.1.4.1.2021.2.1.101.$COUNT \
                UCD-SNMP-MIB::prErrMessage.$COUNT s \
                "Trouble with $name has cleared"
            rm "$LOCK"
        fi
    elif [ "$FLAG" != 0 ] ; then
        # New problem: save prErrMessage in the lockfile, then send it as a trap
        echo "$name has issues. Here's the skinny:
$(/usr/local/bin/snmpwalk -v 1 -c localpass localhost \
    1.3.6.1.4.1.2021.2.1.101.$COUNT)" > "$LOCK"
        /usr/local/bin/snmptrap -v 2c -c localpass localhost '' \
            1.3.6.1.4.1.2021.2.1.101.$COUNT \
            UCD-SNMP-MIB::prErrMessage.$COUNT s \
            "$(cat "$LOCK")"
    fi
    # Always advance to the next row, whichever branch ran
    COUNT=$(( COUNT + 1 ))
done
</pre>
Change to the shell of your choice. I chose bash, that's how I roll.
I'll explain a little for those less experienced with scripts. It
reads the proc directives from your existing snmpd.conf file and
reports when a process goes down. It makes a lockfile so it won't
report more than once. Every time the script runs, it checks whether
any previously reported problems have been resolved. If so, it sends
a notification and removes the lockfile. I run this script from cron
with the schedule */5 * * * * (every five minutes).
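For anyone who wants to see the lockfile logic on its own, here is a
minimal sketch with the SNMP calls stripped out. The function name
check_state and its two arguments are made up for illustration and are
not part of the script above. It just prints which action the monitor
would take for one process.

```shell
#!/usr/bin/env bash
# Sketch of the notify-once logic only (no SNMP involved).
# check_state is a hypothetical helper, not part of procmon.
# Usage: check_state LOCKFILE FLAG
#   LOCKFILE  path of the per-process lockfile
#   FLAG      error flag for that process (0 = healthy)
# Prints one of: alert, clear, nothing
check_state() {
    if [ -e "$1" ]; then
        if [ "$2" = 0 ]; then
            rm -f "$1"         # problem resolved: drop the lockfile
            echo "clear"
        else
            echo "nothing"     # still broken, but already reported
        fi
    elif [ "$2" != 0 ]; then
        : > "$1"               # first failure: create the lockfile
        echo "alert"
    else
        echo "nothing"         # healthy and nothing outstanding
    fi
}
```

Calling it repeatedly with the same lockfile and a nonzero flag gives
one "alert" followed by "nothing" until the flag drops back to 0, at
which point it prints "clear" once and removes the lockfile.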
So far as I know, this supports MIN/MAX flags on the proc directive in
the config file. I tested it by killing my internal ftp-proxy process
bound to localhost:8021, triggering the "proc ftp-proxy 2 2" line. The
trap showed up in my email, and was reported to snmptrapd via its
existing config file. Basically you can drop this in and have it "just
work" if you have snmpd and snmptrapd set up and running properly
(emailing restart events at least, for instance); the only changes you
should need are to paths, depending on your setup. Presumably this
script would expand well to cover any of the other types of monitor
OIDs with little hassle.
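If you want to check which table row each proc line should land on,
something like the following may help. It is only a sketch: list_procs
is a made-up helper name, and it assumes (as the script does) that
snmpd numbers the rows in the order the proc directives appear in the
file. The directive syntax is "proc NAME [MAX [MIN]]", so missing
limits are printed as "-".

```shell
#!/usr/bin/env bash
# list_procs is a hypothetical helper, not part of procmon.
# Usage: list_procs CONFIGFILE
# Prints "index name max min" for every proc directive, skipping
# comments and any other directive.
list_procs() {
    awk '$1 == "proc" {
        printf "%d %s %s %s\n", ++n, $2,
            ($3 == "" ? "-" : $3), ($4 == "" ? "-" : $4)
    }' "$1"
}
```

Run it as list_procs /etc/snmp/snmpd.conf and compare the index column
with the number on the end of the OIDs the script queries.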
PS - writing an email in a normal editor after a couple hours in vi
produces more "x"s than Las Vegas.