Theo Buehler writes: > I was given a reliable reproducer for the sentinel segfault that seems > to be present since at least Redis 4. I can only reproduce on amd64 and > only when compiling with -O1 or -O2, but not with -O0. > >>From what I can tell, it is an out-of-bounds access trying to read from > a page without read permissions, hence the process is killed. It's > always the same line 2216 in sentinel.c:
Here is a diff resolving the out-of-bounds memory access. Thank you for the reproducer and the debug information. I was able to also trigger a crash within 2-10 minutes. > I get similar results using printf debugging, so I don't think the > debugger lies to me. As I have discovered, gdb can be deceptive with the line number. I sprinkled printf debugging. I thought maybe a memory barrier could help, so I added membar_sync(), but this turns out to be a red herring. I added BEGIN and END statements to the beginning/end of the loop along with the index (e.g., 145). 2221 printf("NAM MASTER_LINK_DOWN BEFORE %s len: %d\n", l, sdslen(l)); 2222 /* master_link_down_since_seconds:<seconds> */ 2223 cond = sdslen(l) >= 32 && \ 2224 !memcmp(l,"master_link_down_since_seconds",30); 2225 printf("NAM cond: %d\n", cond); 2226 membar_sync(); 2227 if (cond) 2228 { 2229 printf("NAM MASTER_LINK_DOWN %s\n", l); 2230 ri->master_link_down_time = strtoll(l+31,NULL,10)*1000; 2231 } 2232 2233 printf("NAM ROLE BEFORE\n"); 2234 /* role:<role> */ 2235 if (!memcmp(l,"role:master",11)) role = SRI_MASTER; 2236 else if (!memcmp(l,"role:slave",10)) role = SRI_SLAVE; 2237 2238 printf("NAM SRI_SLAVE BEFORE\n"); 2239 if (role == SRI_SLAVE) { 2240 printf("NAM SRI_SLAVE AFTER\n"); NAM BEGIN 145 NAM RUN_ID BEFORE # Cluster NAM SRI BEFORE # Cluster NAM MASTER_LINK_DOWN BEFORE # Cluster len: 9 NAM cond: 0 NAM ROLE BEFORE NAM SRI_SLAVE BEFORE NAM END NAM BEGIN 146 NAM RUN_ID BEFORE cluster_enabled:0 NAM SRI BEFORE cluster_enabled:0 NAM MASTER_LINK_DOWN BEFORE cluster_enabled:0 len: 17 NAM cond: 0 NAM ROLE BEFORE NAM SRI_SLAVE BEFORE NAM END NAM BEGIN 147 NAM RUN_ID BEFORE NAM SRI BEFORE NAM MASTER_LINK_DOWN BEFORE len: 0 NAM cond: 0 NAM ROLE BEFORE [New thread 279273] [New thread 593859] [New thread 463858] Thread 1 received signal SIGSEGV, Segmentation fault. 0x00001ba2086d67ef in sentinelRefreshInstanceInfo (ri=0x1ba4f37d4608, info=<optimized out>) at sentinel.c:2233 2233 sentinel.c: No such file or directory. What is line 2233? 2233 printf("NAM ROLE BEFORE\n"); Before it would consistently crash "at" strtoll, as you had reported. After adding so many printf()'s I got a lucky break and gdb somehow reported 2233. I read the following lines and saw that the memcmp() was accessing invalid memory for l, similarly to the gdb print command you had with vm.malloc_conf=J. > If someone with stronger make or debugging skills than mine has > improvements or even a fix to offer, I'd be keen. The build starts with > a 'make clean', so a pre-build target in the port's Makefile probably > won't work without additional patches in the Makefile either. To resolve this issue, add sdslen() to check for string length before memcmp() like everywhere else. I got to learn about flexible array members in C in verifying your debug report and that the debugger sometimes lies. Feedback and tests are welcome. With the reproducer, it has been stable for 45 minutes on an older version of this diff (with membar_sync()). Index: Makefile =================================================================== RCS file: /cvs/ports/databases/redis/Makefile,v retrieving revision 1.113 diff -u -p -u -p -r1.113 Makefile --- Makefile 14 Jun 2020 07:35:36 -0000 1.113 +++ Makefile 27 Jun 2020 04:35:55 -0000 @@ -4,6 +4,7 @@ COMMENT = persistent key-value database DISTNAME = redis-6.0.5 CATEGORIES = databases HOMEPAGE = https://redis.io/ +REVISION = 0 # BSD PERMIT_PACKAGE = Yes Index: patches/patch-src_sentinel_c =================================================================== RCS file: patches/patch-src_sentinel_c diff -N patches/patch-src_sentinel_c --- /dev/null 1 Jan 1970 00:00:00 -0000 +++ patches/patch-src_sentinel_c 27 Jun 2020 04:35:55 -0000 @@ -0,0 +1,18 @@ +$OpenBSD$ + +redis-sentinel out of bounds memory access from memcmp + +Index: src/sentinel.c +--- src/sentinel.c.orig ++++ src/sentinel.c +@@ -2217,8 +2217,8 @@ void sentinelRefreshInstanceInfo(sentinelRedisInstance + } + + /* role:<role> */ +- if (!memcmp(l,"role:master",11)) role = SRI_MASTER; +- else if (!memcmp(l,"role:slave",10)) role = SRI_SLAVE; ++ if (sdslen(l) >= 11 && !memcmp(l,"role:master",11)) role = SRI_MASTER; ++ else if (sdslen(l) >= 10 && !memcmp(l,"role:slave",10)) role = SRI_SLAVE; + + if (role == SRI_SLAVE) { + /* master_host:<host> */