I need to think a bit more about how to handle this one, but the current behavior doesn't seem right.
On Mon, Feb 7, 2011 at 10:30 PM, Stuart Henderson <s...@spacehopper.org> wrote:
> I'm trying to get relayd to send pings with a different source
> address so I can combine them with "route-to" pf rules so I can
> test a remote host via several network paths to see if it's alive.
> As my attempts to modify "check icmp" to let me set the source
> address have so far failed, I'm trying to use "check script"
> to let me do this instead.
>
> check_script.c:script_exec() causes relayd to fork, set an
> interval timer to fire after a timeout, and exec a program to
> check host liveliness. fwiw, I'm using a shell script like this:
>
> myif=$(route -q -n get $1 | awk '/if addr/ {print $3}')
> ping -c1 -w1 -I $myif somehost.example.com > /dev/null && exit 1
>
> If the program exit()s with a non-zero return code, this
> indicates success. If it exit()s with a zero return code,
> this is taken to indicate failure. (Other way round than
> I'd expect, but I can live with that).
>
> If the script does not exit() (which is checked with WIFEXITED),
> this is taken to indicate a problem with the checking script only;
> the host status is marked as "unknown" and the host is not treated
> as if it was down.
>
> If the interval timer fires, the script is sent SIGKILL, so of
> course does not get to exit(), so the host status is marked
> unknown.
>
> I can always increase relayd's timeout to give my script chance to
> exit normally and indicate failure, but it seems counter-intuitive
> a timeout should be taken as "unknown" in "check script" rather
> than the "host down" it would cause with the other check methods.
>
> So two questions.
>
> - Is there any point in treating a check script that doesn't
> exit() as 'unknown'? If it's ok to change this so that it indicates
> 'down' instead, the diff below helps.
>
> - If we can't change this, can anyone suggest a (simple|safe)
> way to indicate from the timer (sigalrm handler) that the check
> script was killed due to a timeout rather than some other reason,
> so we can differentiate between them?
>
> Index: check_script.c
> ===================================================================
> RCS file: /cvs/src/usr.sbin/relayd/check_script.c,v
> retrieving revision 1.10
> diff -u -p -r1.10 check_script.c
> --- check_script.c	5 Jun 2009 23:39:51 -0000	1.10
> +++ check_script.c	7 Feb 2011 21:20:49 -0000
> @@ -156,7 +156,7 @@ script_exec(struct relayd *env, struct c
>  		if (WIFEXITED(status))
>  			ret = WEXITSTATUS(status);
>  		else
> -			ret = -1;
> +			ret = 0;
>  	}
>
>  done: