I was given a reliable reproducer for the sentinel segfault that seems
to be present since at least Redis 4. I can only reproduce on amd64 and
only when compiling with -O1 or -O2, but not with -O0.

>From what I can tell, it is an out-of-bounds access trying to read from
a page without read permissions, hence the process is killed. It's
always the same line 2216 in sentinel.c:

#0  0x000003dc47ed6750 in sentinelRefreshInstanceInfo (ri=0x3de5f0ac008, 
info=<optimized out>)
    at sentinel.c:2216
#1  0x000003dc47f17a1f in __redisRunCallback (ac=<optimized out>, cb=<optimized 
out>,
    reply=<optimized out>) at async.c:255
#2  redisProcessCallbacks (ac=0x3df2e288e00) at async.c:478
#3  0x000003dc47f18097 in redisAsyncHandleRead (ac=0x3df2e288e00) at async.c:549
#4  0x000003dc47e52c68 in aeProcessEvents (eventLoop=<optimized out>, flags=27) 
at ae.c:479
#5  0x000003dc47e5300d in aeMain (eventLoop=0x3deb34c6c88) at ae.c:539
#6  0x000003dc47e601ec in main (argc=<optimized out>, argv=0x7f7ffffc6c28) at 
server.c:5173

2212         /* master_link_down_since_seconds:<seconds> */
2213         if (sdslen(l) >= 32 &&
2214             !memcmp(l,"master_link_down_since_seconds",30))
2215         {
2216             ri->master_link_down_time = strtoll(l+31,NULL,10)*1000;
2217         }

Inspecting the pointer l at the crash shows:

(gdb) x /32xb l-3
0x3df3123aff8:  0x00    0x00    0x01    0x00    0xdf    0xdf    0xdf    0xdf
0x3df3123b000:  Cannot access memory at address 0x3df3123b000

since l[-1] is 1, sdslen() returns the byte stored at l-3 - see sds.h.
You can see that l itself is an empty string followed by df's, then
memory that isn't readable on the next page. The four df's are db when
using MALLOC_OPTIONS=J and 00 with j, so it's unitialized memory that
was previously freed. As sdslen(l) works out to 0, line 2216 should not
be hit...

I get similar results using printf debugging, so I don't think the
debugger lies to me.

As a last data point, when running the sentinel in the debugger, I
always saw three threads starting right before the crash.

Here's the reproducer:

$ cat > sentinel.conf
logfile ""
maxclients 100
port 26379
daemonize no
pidfile "/tmp/redis-sentinel.pid"
dir "/tmp"
sentinel monitor mymaster 127.0.0.1 6379 1
sentinel myid 8ae1a55e5cd1228bb4f29951d799e59937357542
EOF
$ redis-server --daemonize yes
$ redis-sentinel sentinel.conf

This will lead to a segfault of the sentinel after a few minutes (it
happens sometimes after a few seconds and usually well before 10 minutes
on my laptop).

With the patch below that special cases the compilation of sentinel.c
to use -O0, I can no longer reproduce the crash. The sentinel remained
up for many hours several times.

If someone with stronger make or debugging skills than mine has
improvements or even a fix to offer, I'd be keen.  The build starts with
a 'make clean', so a pre-build target in the port's Makefile probably
won't work without additional patches in the Makefile either.

Index: Makefile
===================================================================
RCS file: /var/cvs/ports/databases/redis/Makefile,v
retrieving revision 1.113
diff -u -p -r1.113 Makefile
--- Makefile    14 Jun 2020 07:35:36 -0000      1.113
+++ Makefile    23 Jun 2020 13:36:54 -0000
@@ -4,6 +4,7 @@ COMMENT =               persistent key-value database
 DISTNAME =             redis-6.0.5
 CATEGORIES =           databases
 HOMEPAGE =             https://redis.io/
+REVISION =             0
 
 # BSD
 PERMIT_PACKAGE =       Yes
Index: patches/patch-src_Makefile
===================================================================
RCS file: /var/cvs/ports/databases/redis/patches/patch-src_Makefile,v
retrieving revision 1.32
diff -u -p -r1.32 patch-src_Makefile
--- patches/patch-src_Makefile  8 Jun 2020 07:32:47 -0000       1.32
+++ patches/patch-src_Makefile  24 Jun 2020 03:26:11 -0000
@@ -1,5 +1,12 @@
 $OpenBSD: patch-src_Makefile,v 1.32 2020/06/08 07:32:47 tb Exp $
 
+Changes in this file:
+- install redis-server and redis-sentinel into ${PREFIX}/sbin
+- don't use jemalloc and bundled lua; use lua from ports.
+- do not use -funwind-tables and -latomic on armv7
+- compile sentinel.c with -O0 to avoid a redis-sentinel segfault on amd64
+- run tests with datasize, fds, stacksize and processes at the hard limit
+
 Index: src/Makefile
 --- src/Makefile.orig
 +++ src/Makefile
@@ -65,7 +72,17 @@ Index: src/Makefile
  
  # redis-sentinel
  $(REDIS_SENTINEL_NAME): $(REDIS_SERVER_NAME)
-@@ -315,7 +318,7 @@ distclean: clean
+@@ -302,6 +305,9 @@ DEP = $(REDIS_SERVER_OBJ:%.o=%.d) $(REDIS_CLI_OBJ:%.o=
+ %.o: %.c .make-prerequisites
+       $(REDIS_CC) -MMD -o $@ -c $<
+ 
++sentinel.o: sentinel.c .make-prerequisites
++      $(REDIS_CC) -O0 -MMD -o $@ -c $<
++
+ clean:
+       rm -rf $(REDIS_SERVER_NAME) $(REDIS_SENTINEL_NAME) $(REDIS_CLI_NAME) 
$(REDIS_BENCHMARK_NAME) $(REDIS_CHECK_RDB_NAME) $(REDIS_CHECK_AOF_NAME) *.o 
*.gcda *.gcno *.gcov redis.info lcov-html Makefile.dep dict-benchmark
+       rm -f $(DEP)
+@@ -315,7 +321,7 @@ distclean: clean
  .PHONY: distclean
  
  test: $(REDIS_SERVER_NAME) $(REDIS_CHECK_AOF_NAME)
@@ -74,7 +91,7 @@ Index: src/Makefile
  
  test-sentinel: $(REDIS_SENTINEL_NAME)
        @(cd ..; ./runtest-sentinel)
-@@ -359,13 +362,14 @@ src/help.h:
+@@ -359,13 +365,14 @@ src/help.h:
        @../utils/generate-command-help.rb > help.h
  
  install: all

Reply via email to