I need to think a bit more about how to handle this one, but the
current behavior doesn't seem right.

On Mon, Feb 7, 2011 at 10:30 PM, Stuart Henderson <s...@spacehopper.org>
wrote:
> I'm trying to get relayd to send pings with a different source
> address so I can combine them with "route-to" pf rules so I can
> test a remote host via several network paths to see if it's alive.
> As my attempts to modify "check icmp" to let me set the source
> address have so far failed, I'm trying to use "check script"
> to let me do this instead.
>
> check_script.c:script_exec() causes relayd to fork, set an
> interval timer to fire after a timeout, and exec a program to
> check host liveness. fwiw, I'm using a shell script like this:
>
> myif=$(route -q -n get "$1" | awk '/if addr/ {print $3}')
> ping -c1 -w1 -I "$myif" somehost.example.com > /dev/null && exit 1
> exit 0
>
> If the program exit()s with a non-zero return code, this
> indicates success. If it exit()s with a zero return code,
> this is taken to indicate failure. (Other way round than
> I'd expect, but I can live with that).
>
> If the script does not exit() (which is checked with WIFEXITED),
> this is taken to indicate a problem with the checking script only;
> the host status is marked as "unknown" and the host is not treated
> as if it was down.
>
> If the interval timer fires, the script is sent SIGKILL, so of
> course does not get to exit(), so the host status is marked
> unknown.
>
> I can always increase relayd's timeout to give my script a chance to
> exit normally and indicate failure, but it seems counter-intuitive
> that a timeout should be taken as "unknown" in "check script" rather
> than the "host down" it would cause with the other check methods.
>
> So two questions.
>
> - Is there any point in treating a check script that doesn't
> exit() as 'unknown'? If it's ok to change this so that it indicates
> 'down' instead, the diff below helps.
>
> - If we can't change this, can anyone suggest a (simple|safe)
> way to indicate from the timer (sigalrm handler) that the check
> script was killed due to a timeout rather than some other reason,
> so we can differentiate between them?
>
> Index: check_script.c
> ===================================================================
> RCS file: /cvs/src/usr.sbin/relayd/check_script.c,v
> retrieving revision 1.10
> diff -u -p -r1.10 check_script.c
> --- check_script.c      5 Jun 2009 23:39:51 -0000       1.10
> +++ check_script.c      7 Feb 2011 21:20:49 -0000
> @@ -156,7 +156,7 @@ script_exec(struct relayd *env, struct c
>                if (WIFEXITED(status))
>                        ret = WEXITSTATUS(status);
>                else
> -                       ret = -1;
> +                       ret = 0;
>        }
>
>  done:
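
On the second question, one possible approach is a flag that the SIGALRM
handler sets before it kills the child, so the parent can tell "killed by
our own timeout" apart from "died some other way". What follows is only a
rough standalone sketch, not relayd code: run_check(), on_alarm(), the
check_result enum and the use of alarm() instead of an interval timer are
all made up for illustration. It keeps the convention described above
(non-zero exit means up, zero means down).

```c
#define _POSIX_C_SOURCE 200809L

#include <sys/types.h>
#include <sys/wait.h>

#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch, not relayd's own. */
enum check_result { CHECK_UNKNOWN = -1, CHECK_DOWN = 0, CHECK_UP = 1 };

static volatile sig_atomic_t	timed_out;	/* written only by on_alarm() */
static pid_t			child = -1;

static void
on_alarm(int sig)
{
	(void)sig;
	timed_out = 1;			/* remember why the child died */
	if (child > 0)
		kill(child, SIGKILL);	/* kill() is async-signal-safe */
}

enum check_result
run_check(char *const argv[], unsigned int timeout_sec)
{
	struct sigaction	 sa, old_sa;
	int			 status;

	timed_out = 0;
	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_alarm;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGALRM, &sa, &old_sa) == -1)
		return (CHECK_UNKNOWN);

	switch (child = fork()) {
	case -1:
		sigaction(SIGALRM, &old_sa, NULL);
		return (CHECK_UNKNOWN);
	case 0:
		execv(argv[0], argv);
		_exit(0);	/* exec failed; 0 reads as "down" here */
	}

	alarm(timeout_sec);
	/* The handler interrupts waitpid(); retry until the child is reaped. */
	while (waitpid(child, &status, 0) == -1) {
		if (errno != EINTR) {
			alarm(0);
			sigaction(SIGALRM, &old_sa, NULL);
			child = -1;
			return (CHECK_UNKNOWN);
		}
	}
	alarm(0);
	sigaction(SIGALRM, &old_sa, NULL);
	child = -1;

	if (WIFEXITED(status))
		return (WEXITSTATUS(status) != 0 ? CHECK_UP : CHECK_DOWN);
	if (timed_out)
		return (CHECK_DOWN);	/* killed by our own timer */
	return (CHECK_UNKNOWN);		/* killed by something else */
}
```

With something like this, a script that hangs past the timeout comes back
as "down", while one that is killed by an unrelated signal still comes back
as "unknown". A timer that fires just after the child has already exited
still sets the flag, which is why the exit status is looked at first.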
