Kebert Martin wrote:
> Hi,
> I tried the patch "p1_and_2.diff" from #973052.
> I'm not saying it was extensive test, but 7 minutes after start I got first
> crash:
> Oct 28 17:35:26 debian systemd[1]: Started Unbound DNS server.
> Oct 28 17:35:26 debian unbound[450]: [450:0] info: start of service (unbound
> 1.9.0).
> ...
> Oct 28 17:42:26 debian systemd[1]: unbound.service: Main process exited, code=
> killed, status=11/SEGV
> Oct 28 17:42:26 debian systemd[1]: unbound.service: Failed with result
> 'signal'.
> Oct 28 17:42:26 debian systemd[1]: unbound.service: Service RestartSec=100ms
> expired, scheduling restart.
> Oct 28 17:42:26 debian systemd[1]: unbound.service: Scheduled restart job,
> restart counter is at 1.
> ...
> and 10 minutes later flood (about 30/sec) of these messages:
> ...
> Oct 28 17:52:49 debian unbound[1885]: [warn] Epoll ADD(1) on fd 52 failed. Old
> events were 0; read change was 1 (add); w
> rite change was 0 (none); close change was 0 (none): Bad file descriptor
> Oct 28 17:52:49 debian unbound[1885]: [1885:3] error: read (in tcp s): Bad 
> file
> descriptor for <Client_IP> port <Client_Port>
> ...
> 
> and "unbound" stopped responding to "unbound-control" (even simple
> "unbound-control status" hangs).
> I can't decide whether it was caused by this patch or whether it is someting
> different.
> Anyway I installed version 1.10 back which works.

Hi, Kebert:

Instead of the "p1_and_2.diff" patch, can you try the attached patch
which includes additional fixes recommended by upstream? If this works
for you we can propose updating the version of unbound in buster with
these fixes.

Thanks!

-- 
Robert Edmonds
edmo...@debian.org
>From 0bf0258a54b9e7fd7d596bed3412bbf12ba532b6 Mon Sep 17 00:00:00 2001
From: Robert Edmonds <edmo...@debian.org>
Date: Wed, 28 Oct 2020 13:36:17 -0400
Subject: [PATCH] Apply a series of fixes for Unbound 1.9.0 suggested by
 upstream

Per https://www.nlnetlabs.nl/bugs-script/show_bug.cgi?id=4227#c8,
upstream recommends applying the following commits against 1.9.0:

348cbab016f824a336b65d0091310fe5cd58e762
2b47ca080eb91e209fb86cd1dc90a6aff32e2a1f
0b77c9d6763686264d44dfd926c8cb4f2f03a43a
6067ce6d2b82ce2e80055e578fdfd7ba3e67c523
af6c5dea43fc63452d49b2339e607365b6652987
a08fe8ca609b651c8d8c8379780aad508d492421

However, commit 0b77c9d6763686264d44dfd926c8cb4f2f03a43a contains a
complete revert of the code changes in
cae8361dcd2809c8e266d259370c9ab8660c2c0e (added post-1.9.0), so I
applied that patch as well in order to avoid needing to manually resolve
the textual conflict when attempting to apply
0b77c9d6763686264d44dfd926c8cb4f2f03a43a to 1.9.0.

Most hunks applied cleanly or with a small offset, excluding the
changelog entries. The git-apply session was as follows:

    $ git describe
    debian/1.9.0-2+deb10u2

    $ git apply --verbose --exclude=doc/Changelog \
        /tmp/up/348cbab016f824a336b65d0091310fe5cd58e762.diff \
        /tmp/up/2b47ca080eb91e209fb86cd1dc90a6aff32e2a1f.diff \
        /tmp/up/cae8361dcd2809c8e266d259370c9ab8660c2c0e.diff \
        /tmp/up/0b77c9d6763686264d44dfd926c8cb4f2f03a43a.diff \
        /tmp/up/6067ce6d2b82ce2e80055e578fdfd7ba3e67c523.diff \
        /tmp/up/af6c5dea43fc63452d49b2339e607365b6652987.diff \
        /tmp/up/a08fe8ca609b651c8d8c8379780aad508d492421.diff
    Skipped patch 'doc/Changelog'.
    Checking patch util/netevent.c...
    Applied patch util/netevent.c cleanly.
    Skipped patch 'doc/Changelog'.
    Checking patch config.h.in...
    Hunk #1 succeeded at 83 (offset -3 lines).
    Hunk #2 succeeded at 167 (offset -3 lines).
    Checking patch configure...
    Hunk #1 succeeded at 19010 (offset -3 lines).
    Checking patch configure.ac...
    Hunk #1 succeeded at 1197 (offset -3 lines).
    Checking patch util/ub_event.c...
    Applied patch config.h.in cleanly.
    Applied patch configure cleanly.
    Applied patch configure.ac cleanly.
    Applied patch util/ub_event.c cleanly.
    Skipped patch 'doc/Changelog'.
    Checking patch services/listen_dnsport.c...
    Applied patch services/listen_dnsport.c cleanly.
    Skipped patch 'doc/Changelog'.
    Checking patch services/listen_dnsport.c...
    Hunk #1 succeeded at 1779 (offset -7 lines).
    Hunk #2 succeeded at 1857 (offset -7 lines).
    Applied patch services/listen_dnsport.c cleanly.
    Skipped patch 'doc/Changelog'.
    Checking patch services/listen_dnsport.c...
    Hunk #1 succeeded at 1746 (offset -6 lines).
    Checking patch services/mesh.c...
    Applied patch services/listen_dnsport.c cleanly.
    Applied patch services/mesh.c cleanly.
    Skipped patch 'doc/Changelog'.
    Checking patch daemon/worker.c...
    Hunk #1 succeeded at 770 (offset -2 lines).
    Checking patch util/netevent.c...
    Hunk #1 succeeded at 1551 (offset -16 lines).
    Hunk #2 succeeded at 1617 (offset -16 lines).
    Applied patch daemon/worker.c cleanly.
    Applied patch util/netevent.c cleanly.
    Skipped patch 'doc/Changelog'.
    Checking patch services/mesh.c...
    Hunk #1 succeeded at 1196 (offset 4 lines).
    Applied patch services/mesh.c cleanly.

The changelog entries corresponding to these commits are as follows:

[348cbab016f824a336b65d0091310fe5cd58e762]
   - Fix to reinit event structure for accepted TCP (and TLS) sockets.

[2b47ca080eb91e209fb86cd1dc90a6aff32e2a1f]
   - Fix to use event_assign with libevent for thread-safety.

[cae8361dcd2809c8e266d259370c9ab8660c2c0e]
   - Fix #4225: clients seem to erroneously receive no answer with
     DNS-over-TLS and qname-minimisation.

[0b77c9d6763686264d44dfd926c8cb4f2f03a43a]
   - Fix that spoolbuf is not used to store tcp pipelined response
     between mesh send and callback end.

[6067ce6d2b82ce2e80055e578fdfd7ba3e67c523]
   - Fix that fixes the Fix that spoolbuf is not used to store tcp
     pipelined response between mesh send and callback end, this fixes
     error cases that did not use the correct spoolbuf.

[af6c5dea43fc63452d49b2339e607365b6652987]
   - Fix another spoolbuf storage code point, in prefetch.

[a08fe8ca609b651c8d8c8379780aad508d492421]
   - Attempt to fix malformed tcp response.
---
 config.h.in               |  7 +++++++
 configure                 | 29 +++++++++++++++++++++++++++++
 configure.ac              |  8 ++++++++
 daemon/worker.c           |  8 +++++++-
 services/listen_dnsport.c | 10 +++++++++-
 services/mesh.c           | 18 +++++++++++++-----
 util/netevent.c           | 10 ++++++++--
 util/ub_event.c           | 14 ++++++++++++++
 8 files changed, 95 insertions(+), 9 deletions(-)

diff --git a/config.h.in b/config.h.in
index 545536d..c835aaf 100644
--- a/config.h.in
+++ b/config.h.in
@@ -83,6 +83,10 @@
    if you don't. */
 #undef HAVE_DECL_ARC4RANDOM_UNIFORM
 
+/* Define to 1 if you have the declaration of `evsignal_assign', and to 0 if
+   you don't. */
+#undef HAVE_DECL_EVSIGNAL_ASSIGN
+
 /* Define to 1 if you have the declaration of `inet_ntop', and to 0 if you
    don't. */
 #undef HAVE_DECL_INET_NTOP
@@ -163,6 +167,9 @@
 /* Define to 1 if you have the `ERR_load_crypto_strings' function. */
 #undef HAVE_ERR_LOAD_CRYPTO_STRINGS
 
+/* Define to 1 if you have the `event_assign' function. */
+#undef HAVE_EVENT_ASSIGN
+
 /* Define to 1 if you have the `event_base_free' function. */
 #undef HAVE_EVENT_BASE_FREE
 
diff --git a/configure b/configure
index 6d86111..c14c64f 100755
--- a/configure
+++ b/configure
@@ -19010,6 +19010,35 @@ _ACEOF
 fi
 done
  # only in libev. (tested on 4.00)
+	for ac_func in event_assign
+do :
+  ac_fn_c_check_func "$LINENO" "event_assign" "ac_cv_func_event_assign"
+if test "x$ac_cv_func_event_assign" = xyes; then :
+  cat >>confdefs.h <<_ACEOF
+#define HAVE_EVENT_ASSIGN 1
+_ACEOF
+
+fi
+done
+ # in libevent, for thread-safety
+	ac_fn_c_check_decl "$LINENO" "evsignal_assign" "ac_cv_have_decl_evsignal_assign" "$ac_includes_default
+#ifdef HAVE_EVENT_H
+#  include <event.h>
+#else
+#  include \"event2/event.h\"
+#endif
+
+"
+if test "x$ac_cv_have_decl_evsignal_assign" = xyes; then :
+  ac_have_decl=1
+else
+  ac_have_decl=0
+fi
+
+cat >>confdefs.h <<_ACEOF
+#define HAVE_DECL_EVSIGNAL_ASSIGN $ac_have_decl
+_ACEOF
+
         PC_LIBEVENT_DEPENDENCY="libevent"
 
 	if test -n "$BAK_LDFLAGS_SET"; then
diff --git a/configure.ac b/configure.ac
index e071dc3..5a47ce9 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1197,6 +1197,14 @@ large outgoing port ranges.  ])
 	AC_CHECK_FUNCS([event_base_get_method]) # only in libevent 1.4.3 and later
 	AC_CHECK_FUNCS([ev_loop]) # only in libev. (tested on 3.51)
 	AC_CHECK_FUNCS([ev_default_loop]) # only in libev. (tested on 4.00)
+	AC_CHECK_FUNCS([event_assign]) # in libevent, for thread-safety
+	AC_CHECK_DECLS([evsignal_assign], [], [], [AC_INCLUDES_DEFAULT
+#ifdef HAVE_EVENT_H
+#  include <event.h>
+#else
+#  include "event2/event.h"
+#endif
+	])
         PC_LIBEVENT_DEPENDENCY="libevent"
         AC_SUBST(PC_LIBEVENT_DEPENDENCY)
 	if test -n "$BAK_LDFLAGS_SET"; then
diff --git a/daemon/worker.c b/daemon/worker.c
index c9504dd..e0e8448 100644
--- a/daemon/worker.c
+++ b/daemon/worker.c
@@ -770,8 +770,14 @@ reply_and_prefetch(struct worker* worker, struct query_info* qinfo,
 {
 	/* first send answer to client to keep its latency 
 	 * as small as a cachereply */
-	if(sldns_buffer_limit(repinfo->c->buffer) != 0)
+	if(sldns_buffer_limit(repinfo->c->buffer) != 0) {
+		if(repinfo->c->tcp_req_info) {
+			sldns_buffer_copy(
+				repinfo->c->tcp_req_info->spool_buffer,
+				repinfo->c->buffer);
+		}
 		comm_point_send_reply(repinfo);
+	}
 	server_stats_prefetch(&worker->stats, worker);
 	
 	/* create the prefetch in the mesh as a normal lookup without
diff --git a/services/listen_dnsport.c b/services/listen_dnsport.c
index f86a83d..4e6be35 100644
--- a/services/listen_dnsport.c
+++ b/services/listen_dnsport.c
@@ -1746,6 +1746,7 @@ tcp_req_info_handle_readdone(struct tcp_req_info* req)
 	req->is_drop = 0;
 	req->is_reply = 0;
 	req->in_worker_handle = 1;
+	sldns_buffer_set_limit(req->spool_buffer, 0);
 	/* handle the current request */
 	/* this calls the worker handle request routine that could give
 	 * a cache response, or localdata response, or drop the reply,
@@ -1857,7 +1858,14 @@ void
 tcp_req_info_send_reply(struct tcp_req_info* req)
 {
 	if(req->in_worker_handle) {
-		/* It is in the right buffer to answer straight away */
+		/* reply from mesh is in the spool_buffer */
+		/* copy now, so that the spool buffer is free for other tasks
+		 * before the callback is done */
+		sldns_buffer_clear(req->cp->buffer);
+		sldns_buffer_write(req->cp->buffer,
+			sldns_buffer_begin(req->spool_buffer),
+			sldns_buffer_limit(req->spool_buffer));
+		sldns_buffer_flip(req->cp->buffer);
 		req->is_reply = 1;
 		return;
 	}
diff --git a/services/mesh.c b/services/mesh.c
index bee0f76..1b7e104 100644
--- a/services/mesh.c
+++ b/services/mesh.c
@@ -354,6 +354,10 @@ void mesh_new_client(struct mesh_area* mesh, struct query_info* qinfo,
 	int was_detached = 0;
 	int was_noreply = 0;
 	int added = 0;
+	struct sldns_buffer* r_buffer = rep->c->buffer;
+	if(rep->c->tcp_req_info) {
+		r_buffer = rep->c->tcp_req_info->spool_buffer;
+	}
 	if(!unique)
 		s = mesh_area_find(mesh, cinfo, qinfo, qflags&(BIT_RD|BIT_CD), 0, 0);
 	/* does this create a new reply state? */
@@ -389,7 +393,7 @@ void mesh_new_client(struct mesh_area* mesh, struct query_info* qinfo,
 			if(!inplace_cb_reply_servfail_call(mesh->env, qinfo, NULL, NULL,
 				LDNS_RCODE_SERVFAIL, edns, rep, mesh->env->scratch))
 					edns->opt_list = NULL;
-			error_encode(rep->c->buffer, LDNS_RCODE_SERVFAIL,
+			error_encode(r_buffer, LDNS_RCODE_SERVFAIL,
 				qinfo, qid, qflags, edns);
 			comm_point_send_reply(rep);
 			return;
@@ -405,7 +409,7 @@ void mesh_new_client(struct mesh_area* mesh, struct query_info* qinfo,
 				if(!inplace_cb_reply_servfail_call(mesh->env, qinfo, NULL,
 					NULL, LDNS_RCODE_SERVFAIL, edns, rep, mesh->env->scratch))
 						edns->opt_list = NULL;
-				error_encode(rep->c->buffer, LDNS_RCODE_SERVFAIL,
+				error_encode(r_buffer, LDNS_RCODE_SERVFAIL,
 					qinfo, qid, qflags, edns);
 				comm_point_send_reply(rep);
 				return;
@@ -434,7 +438,7 @@ void mesh_new_client(struct mesh_area* mesh, struct query_info* qinfo,
 			if(!inplace_cb_reply_servfail_call(mesh->env, qinfo, &s->s,
 				NULL, LDNS_RCODE_SERVFAIL, edns, rep, mesh->env->scratch))
 					edns->opt_list = NULL;
-			error_encode(rep->c->buffer, LDNS_RCODE_SERVFAIL,
+			error_encode(r_buffer, LDNS_RCODE_SERVFAIL,
 				qinfo, qid, qflags, edns);
 			comm_point_send_reply(rep);
 			if(added)
@@ -1192,12 +1196,16 @@ void mesh_query_done(struct mesh_state* mstate)
 			comm_point_drop_reply(&r->query_reply);
 		else {
 			struct sldns_buffer* r_buffer = r->query_reply.c->buffer;
-			if(r->query_reply.c->tcp_req_info)
+			if(r->query_reply.c->tcp_req_info) {
 				r_buffer = r->query_reply.c->tcp_req_info->spool_buffer;
+				prev_buffer = NULL;
+			}
 			mesh_send_reply(mstate, mstate->s.return_rcode, rep,
 				r, r_buffer, prev, prev_buffer);
-			if(r->query_reply.c->tcp_req_info)
+			if(r->query_reply.c->tcp_req_info) {
 				tcp_req_info_remove_mesh_state(r->query_reply.c->tcp_req_info, mstate);
+				r_buffer = NULL;
+			}
 			prev = r;
 			prev_buffer = r_buffer;
 		}
diff --git a/util/netevent.c b/util/netevent.c
index a507faf..97742f4 100644
--- a/util/netevent.c
+++ b/util/netevent.c
@@ -926,6 +926,14 @@ comm_point_tcp_accept_callback(int fd, short event, void* arg)
 	}
 	/* accept incoming connection. */
 	c_hdl = c->tcp_free;
+	/* clear leftover flags from previous use, and then set the
+	 * correct event base for the event structure for libevent */
+	ub_event_free(c_hdl->ev->ev);
+	c_hdl->ev->ev = ub_event_new(c_hdl->ev->base->eb->base, -1, UB_EV_PERSIST | UB_EV_READ | UB_EV_TIMEOUT, comm_point_tcp_handle_callback, c_hdl);
+	if(!c_hdl->ev->ev) {
+		log_warn("could not ub_event_new, dropped tcp");
+		return;
+	}
 	log_assert(fd != -1);
 	(void)fd;
 	new_fd = comm_point_perform_accept(c, &c_hdl->repinfo.addr,
@@ -1543,7 +1551,6 @@ comm_point_tcp_handle_write(int fd, struct comm_point* c)
 		iov[1].iov_base = sldns_buffer_begin(buffer);
 		iov[1].iov_len = sldns_buffer_limit(buffer);
 		log_assert(iov[0].iov_len > 0);
-		log_assert(iov[1].iov_len > 0);
 		msg.msg_name = &c->repinfo.addr;
 		msg.msg_namelen = c->repinfo.addrlen;
 		msg.msg_iov = iov;
@@ -1610,7 +1617,6 @@ comm_point_tcp_handle_write(int fd, struct comm_point* c)
 		iov[1].iov_base = sldns_buffer_begin(buffer);
 		iov[1].iov_len = sldns_buffer_limit(buffer);
 		log_assert(iov[0].iov_len > 0);
-		log_assert(iov[1].iov_len > 0);
 		r = writev(fd, iov, 2);
 #else /* HAVE_WRITEV */
 		r = send(fd, (void*)(((uint8_t*)&len)+c->tcp_byte_count),
diff --git a/util/ub_event.c b/util/ub_event.c
index 78481a9..e097fbc 100644
--- a/util/ub_event.c
+++ b/util/ub_event.c
@@ -295,11 +295,18 @@ ub_event_new(struct ub_event_base* base, int fd, short bits,
 	if (!ev)
 		return NULL;
 
+#ifndef HAVE_EVENT_ASSIGN
 	event_set(ev, fd, NATIVE_BITS(bits), NATIVE_BITS_CB(cb), arg);
 	if (event_base_set(AS_EVENT_BASE(base), ev) != 0) {
 		free(ev);
 		return NULL;
 	}
+#else
+	if (event_assign(ev, AS_EVENT_BASE(base), fd, bits, cb, arg) != 0) {
+		free(ev);
+		return NULL;
+	}
+#endif
 	return AS_UB_EVENT(ev);
 }
 
@@ -312,11 +319,18 @@ ub_signal_new(struct ub_event_base* base, int fd,
 	if (!ev)
 		return NULL;
 
+#if !HAVE_DECL_EVSIGNAL_ASSIGN
 	signal_set(ev, fd, NATIVE_BITS_CB(cb), arg);
 	if (event_base_set(AS_EVENT_BASE(base), ev) != 0) {
 		free(ev);
 		return NULL;
 	}
+#else
+	if (evsignal_assign(ev, AS_EVENT_BASE(base), fd, cb, arg) != 0) {
+		free(ev);
+		return NULL;
+	}
+#endif
 	return AS_UB_EVENT(ev);
 }
 
-- 
2.28.0

Reply via email to