First, a disclaimer: my test is not by the book, and I know it, so feel free to yell at me. The production machine I'd like to run Nagios on is still on 3.6-stable. What I did was not the nicest thing in the world: I created the _nagios user and group using the info from the PLIST, ripped those two lines out of the PLIST, and then the port built OK on 3.6.
What I noticed is that sometimes (in approx. 15% of attempts) the plugins do not return any output, which causes alerts:

[02-08-2005 13:20:53] HOST ALERT: Lili;DOWN;SOFT;1;(No output!)
[02-08-2005 12:17:58] SERVICE ALERT: Lili;Ping;CRITICAL;SOFT;1;(Return code of 139 is out of bounds)

Here Lili is localhost, and it is obviously not down.

I searched around and found this (http://nagios.sourceforge.net/docs/2_0/whatsnew.html):

> "There are a few known issues with the Nagios 2.0 code at the moment. Hopefully some of these will be fixed before 2.0 is released as stable...
>
> 1. FreeBSD and threads. On FreeBSD there's a native user-level implementation of threads called 'pthread' and there's also an optional ports collection 'linuxthreads' that uses kernel hooks. Some folks from Yahoo! have reported that using the pthread library causes Nagios to pause under heavy I/O load, causing some service check results to be lost. Switching to linuxthreads seems to help this problem, but not fix it. The lock happens in liblthread's __pthread_acquire() - it can't ever acquire the spinlock. It happens when the main thread forks to execute an active check. On the second fork to create the grandchild, the grandchild is created by fork, but never returns from liblthread's fork wrapper, because it's stuck in __pthread_acquire(). Maybe some FreeBSD users can help out with this problem."

My questions are:

- Is anybody else experiencing similar problems?
- Is this caused by running Nagios on a non-current box?
- Could this be the same pthread issue as with FreeBSD?

I'm setting up a -current box at the moment to replicate the test on it.

Regards,
Mitja