It's fairly easy to accidentally configure relayd to try to run check scripts
faster than they finish, for example if you have a check interval of one
second and the check script makes a tcp connection to a host that doesn't
exist any more.

In this situation, the hce process will keep writing messages to its imsg
buffer to the parent process asking it to run checks, which causes its memory
usage to grow without bounds.  If the check script starts working again
(or if you change it to just 'exit 0') the parent works its way through the
backlog and memory usage goes back to normal, but ideally relayd would avoid
doing this to itself.

If we don't clear the F_CHECK_SENT and F_CHECK_DONE flags in
hce_launch_checks(), check_script() can use them to figure out if the
last check request it sent for the host has finished yet, so it can avoid
building up a backlog of work for the parent.  The ICMP and script check 
implementations clear these flags as they start checks, and the TCP check
code doesn't use them at all, so this shouldn't affect anything else.

ok?


Index: check_script.c
===================================================================
RCS file: /cvs/src/usr.sbin/relayd/check_script.c,v
retrieving revision 1.21
diff -u -p -u -p -r1.21 check_script.c
--- check_script.c      28 May 2017 10:39:15 -0000      1.21
+++ check_script.c      15 Feb 2021 01:28:54 -0000
@@ -38,6 +38,9 @@ check_script(struct relayd *env, struct 
        struct ctl_script        scr;
        struct table            *table;
 
+       if ((host->flags & (F_CHECK_SENT|F_CHECK_DONE)) == F_CHECK_SENT)
+               return;
+
        if ((table = table_find(env, host->conf.tableid)) == NULL)
                fatalx("%s: invalid table id", __func__);
 
@@ -52,7 +55,9 @@ check_script(struct relayd *env, struct 
                fatalx("invalid script path");
        memcpy(&scr.timeout, &table->conf.timeout, sizeof(scr.timeout));
 
-       proc_compose(env->sc_ps, PROC_PARENT, IMSG_SCRIPT, &scr, sizeof(scr));
+       if (proc_compose(env->sc_ps, PROC_PARENT, IMSG_SCRIPT, &scr,
+           sizeof(scr)) == 0)
+               host->flags |= F_CHECK_SENT;
 }
 
 void
Index: hce.c
===================================================================
RCS file: /cvs/src/usr.sbin/relayd/hce.c,v
retrieving revision 1.79
diff -u -p -u -p -r1.79 hce.c
--- hce.c       6 Aug 2018 17:31:31 -0000       1.79
+++ hce.c       15 Feb 2021 01:28:54 -0000
@@ -139,7 +139,6 @@ hce_launch_checks(int fd, short event, v
                TAILQ_FOREACH(host, &table->hosts, entry) {
                        if ((host->flags & F_CHECK_DONE) == 0)
                                host->he = HCE_INTERVAL_TIMEOUT;
-                       host->flags &= ~(F_CHECK_SENT|F_CHECK_DONE);
                        if (event_initialized(&host->cte.ev)) {
                                event_del(&host->cte.ev);
                                close(host->cte.s);

Reply via email to