Theo Buehler writes:

> I was given a reliable reproducer for the sentinel segfault that seems
> to be present since at least Redis 4. I can only reproduce on amd64 and
> only when compiling with -O1 or -O2, but not with -O0.
>
>>From what I can tell, it is an out-of-bounds access trying to read from
> a page without read permissions, hence the process is killed. It's
> always the same line 2216 in sentinel.c:

Here is a diff resolving the out-of-bounds memory access.

Thank you for the reproducer and the debug information. I was able to
also trigger a crash within 2-10 minutes.

> I get similar results using printf debugging, so I don't think the
> debugger lies to me.

As I have discovered, gdb can be deceptive with the line number.

I sprinkled printf debugging. I thought maybe a memory barrier could
help, so I added membar_sync(), but this turns out to be a red herring.

I added BEGIN and END statements to the beginning/end of the loop along
with the index (e.g., 145).

   2221         printf("NAM MASTER_LINK_DOWN BEFORE %s len: %d\n", l, 
sdslen(l));
   2222         /* master_link_down_since_seconds:<seconds> */
   2223         cond = sdslen(l) >= 32 && \
   2224             !memcmp(l,"master_link_down_since_seconds",30);
   2225         printf("NAM cond: %d\n", cond);
   2226         membar_sync();
   2227         if (cond)
   2228         {
   2229             printf("NAM MASTER_LINK_DOWN %s\n", l);
   2230             ri->master_link_down_time = strtoll(l+31,NULL,10)*1000;
   2231         }
   2232
   2233         printf("NAM ROLE BEFORE\n");
   2234         /* role:<role> */
   2235         if (!memcmp(l,"role:master",11)) role = SRI_MASTER;
   2236         else if (!memcmp(l,"role:slave",10)) role = SRI_SLAVE;
   2237
   2238         printf("NAM SRI_SLAVE BEFORE\n");
   2239         if (role == SRI_SLAVE) {
   2240             printf("NAM SRI_SLAVE AFTER\n");


NAM BEGIN 145
NAM RUN_ID BEFORE # Cluster
NAM SRI BEFORE # Cluster
NAM MASTER_LINK_DOWN BEFORE # Cluster len: 9
NAM cond: 0
NAM ROLE BEFORE
NAM SRI_SLAVE BEFORE
NAM END
NAM BEGIN 146
NAM RUN_ID BEFORE cluster_enabled:0
NAM SRI BEFORE cluster_enabled:0
NAM MASTER_LINK_DOWN BEFORE cluster_enabled:0 len: 17
NAM cond: 0
NAM ROLE BEFORE
NAM SRI_SLAVE BEFORE
NAM END
NAM BEGIN 147
NAM RUN_ID BEFORE 
NAM SRI BEFORE 
NAM MASTER_LINK_DOWN BEFORE  len: 0
NAM cond: 0
NAM ROLE BEFORE
[New thread 279273]
[New thread 593859]
[New thread 463858]

Thread 1 received signal SIGSEGV, Segmentation fault.
0x00001ba2086d67ef in sentinelRefreshInstanceInfo (ri=0x1ba4f37d4608, 
    info=<optimized out>) at sentinel.c:2233
2233    sentinel.c: No such file or directory.

What is line 2233?

   2233         printf("NAM ROLE BEFORE\n");

Before it would consistently crash "at" strtoll, as you had
reported. After adding so many printf()'s I got a lucky break and gdb
somehow reported 2233. I read the following lines and saw that the
memcmp() was accessing invalid memory for l, similarly to the gdb print
command you had with vm.malloc_conf=J.

> If someone with stronger make or debugging skills than mine has
> improvements or even a fix to offer, I'd be keen.  The build starts with
> a 'make clean', so a pre-build target in the port's Makefile probably
> won't work without additional patches in the Makefile either.

To resolve this issue, add sdslen() to check for string length before
memcmp() like everywhere else.

I got to learn about flexible array members in C in verifying your debug
report and that the debugger sometimes lies.

Feedback and tests are welcome. With the reproducer, it has been stable
for 45 minutes on an older version of this diff (with membar_sync()).

Index: Makefile
===================================================================
RCS file: /cvs/ports/databases/redis/Makefile,v
retrieving revision 1.113
diff -u -p -u -p -r1.113 Makefile
--- Makefile    14 Jun 2020 07:35:36 -0000      1.113
+++ Makefile    27 Jun 2020 04:35:55 -0000
@@ -4,6 +4,7 @@ COMMENT =               persistent key-value database
 DISTNAME =             redis-6.0.5
 CATEGORIES =           databases
 HOMEPAGE =             https://redis.io/
+REVISION =             0
 
 # BSD
 PERMIT_PACKAGE =       Yes
Index: patches/patch-src_sentinel_c
===================================================================
RCS file: patches/patch-src_sentinel_c
diff -N patches/patch-src_sentinel_c
--- /dev/null   1 Jan 1970 00:00:00 -0000
+++ patches/patch-src_sentinel_c        27 Jun 2020 04:35:55 -0000
@@ -0,0 +1,18 @@
+$OpenBSD$
+
+redis-sentinel out of bounds memory access from memcmp
+
+Index: src/sentinel.c
+--- src/sentinel.c.orig
++++ src/sentinel.c
+@@ -2217,8 +2217,8 @@ void sentinelRefreshInstanceInfo(sentinelRedisInstance
+         }
+ 
+         /* role:<role> */
+-        if (!memcmp(l,"role:master",11)) role = SRI_MASTER;
+-        else if (!memcmp(l,"role:slave",10)) role = SRI_SLAVE;
++        if (sdslen(l) >= 11 && !memcmp(l,"role:master",11)) role = SRI_MASTER;
++        else if (sdslen(l) >= 10 && !memcmp(l,"role:slave",10)) role = 
SRI_SLAVE;
+ 
+         if (role == SRI_SLAVE) {
+             /* master_host:<host> */

Reply via email to