[Bug 2023263] Re: nrpe crash in free() upon connection tear down

Matthew Ruffell Thu, 08 May 2025 20:55:37 -0700

** Description changed:

- [ Impact ]
+ [Impact]
  
- A customer faced nrpe crash with Focal.
- 
- Also they provided crashdump which has below.
- 
- #3  0x00007fbc5e7d82fc in malloc_printerr (str=str@entry=0x7fbc5e8f844d
- "corrupted size vs. prev_size") at malloc.c:5347
- 
- #3  0x00007fa2a22492fc in malloc_printerr (str=str@entry=0x7fa2a23694c1
- "free(): invalid pointer") at malloc.c:5347
- 
- And I found out there are commits addressed this issue.
- 
- From 09c5d40ad50c56ac62a07ff1987fdf456d144756 Mon Sep 17 00:00:00 2001
- From: Andreas Baumann <[email protected]>
- Date: Sat, 8 Feb 2020 10:14:24 +0100
- Subject: [PATCH 2/2] read_packages( SSL ): - buff_ptr[bytes_read] = 0 results
-  in Invalid write of size 1 - tot_bytes was calculated wrongly (as rc=0 in
-  last call of SSL_read) (this created all kind of errors like "malloc():
-  invalid size (unsorted)", "corrupted size vs. prev_size" or segfaults in
-  printf of the message buffer)
- 
- Signed-off-by: Andreas Baumann <[email protected]>
- 
- ubuntu@node-17-nrpe-crash:~/src-nrpe$ cat fix_buffer_length_calc.patch
- From 6d2a1cbfc01b6fb4a32ee0151816dca91dfdec79 Mon Sep 17 00:00:00 2001
- From: madlohe <[email protected]>
- Date: Mon, 2 Mar 2020 14:46:49 -0600
- Subject: [PATCH] Fix buffer_length calculation in nrpe.c
- 
- I made test pkg and let the customer use it. then I confirmed those
- commits fix the issue.
- 
- [ Test Plan ]
- 
- No reproducer but the customer helped to test it and provided crashdump.
- 
- [ Where problems could occur ]
- 
- as it is related to memory handling, crash by handling memory would
- still be there.
- 
- [ Other Info ]
- 
- [ Original Description ]
- On a few servers running Ubuntu 20.04 (nagios-nrpe-server 4.0.0-2ubuntu1), nrpe server child processes handling incoming connections crash regularly after having answered:
- 
- Jun  7 06:19:08 mail systemd-coredump[1691494]: Process 1691482 (nrpe) of user 124 dumped core.
- Jun  7 06:24:08 mail systemd-coredump[1692283]: Process 1692270 (nrpe) of user 124 dumped core.
- Jun  7 06:29:08 mail systemd-coredump[1693515]: Process 1693508 (nrpe) of user 124 dumped core.
- Jun  7 06:34:07 mail systemd-coredump[1695835]: Process 1695827 (nrpe) of user 124 dumped core.
- Jun  7 06:39:08 mail systemd-coredump[1696613]: Process 1696598 (nrpe) of user 124 dumped core.
- Jun  7 06:44:07 mail systemd-coredump[1697147]: Process 1697142 (nrpe) of user 124 dumped core.
- Jun  7 06:49:08 mail systemd-coredump[1697698]: Process 1697693 (nrpe) of user 124 dumped core.
- Jun  7 06:54:08 mail systemd-coredump[1698230]: Process 1698225 (nrpe) of user 124 dumped core.
- Jun  7 06:59:07 mail systemd-coredump[1698693]: Process 1698688 (nrpe) of user 124 dumped core.
- Jun  7 07:04:07 mail systemd-coredump[1699270]: Process 1699265 (nrpe) of user 124 dumped core.
- Jun  7 07:09:07 mail systemd-coredump[1699828]: Process 1699823 (nrpe) of user 124 dumped core.
- Jun  7 07:14:07 mail systemd-coredump[1700349]: Process 1700344 (nrpe) of user 124 dumped core.
- 
- Here every 5 minutes any time a client queries a particular check (not
- all checks cause the error, I guess it's about the size of the request
- packet and having to do with memory alignment / allocation granularity)
- 
- Crash is in:
+ nrpe crashes while handling incoming check requests because buffer sizes are
+ calculated incorrectly. The resulting out-of-bounds writes corrupt the heap,
+ and glibc then aborts in free() during connection teardown.
  
  (gdb) bt
  #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
- #1  0x00007f1048170859 in __GI_abort () at abort.c:79
- #2  0x00007f10481db26e in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7f1048305298 "%s\n") at ../sysdeps/posix/libc_fatal.c:155
- #3  0x00007f10481e32fc in malloc_printerr (str=str@entry=0x7f1048307600 "free(): invalid next size (fast)") at malloc.c:5347
- #4  0x00007f10481e4bac in _int_free (av=0x7f104833ab80 <main_arena>, p=0x561342c9c750, have_lock=0) at malloc.c:4249
- #5  0x0000561342bb6262 in handle_connection (sock=6) at ./nrpe.c:1952
- #6  0x0000561342bb6a4c in wait_for_connections () at ./nrpe.c:1441
- #7  0x0000561342bb6b33 in run_src () at ./nrpe.c:642
- #8  0x0000561342bb16e5 in main (argc=<optimized out>, argv=<optimized out>) at ./nrpe.c:224
+ #1  0x00007fa2a21d6859 in __GI_abort () at abort.c:79
+ #2  0x00007fa2a224126e in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7fa2a236b298 "%s\n") at ../sysdeps/posix/libc_fatal.c:155
+ #3  0x00007fa2a22492fc in malloc_printerr (str=str@entry=0x7fa2a23694c1 "free(): invalid pointer") at malloc.c:5347
+ #4  0x00007fa2a224ab2c in _int_free (av=<optimized out>, p=<optimized out>, have_lock=0) at malloc.c:4173
+ #5  0x00007fa2a26c4130 in ssl_clear_hash_ctx (hash=hash@entry=0x5576d4304ad8) at ../ssl/ssl_lib.c:4513
+ #6  0x00007fa2a26c4162 in clear_ciphers (s=s@entry=0x5576d4304650) at ../ssl/ssl_lib.c:574
+ #7  0x00007fa2a26c44d1 in SSL_free (s=0x5576d4304650) at ../ssl/ssl_lib.c:1184
+ #8  SSL_free (s=0x5576d4304650) at ../ssl/ssl_lib.c:1146
+ #9  0x00005576d2e1e255 in handle_connection (sock=6) at ./nrpe.c:1947
+ #10 0x00005576d2e1ea4c in wait_for_connections () at ./nrpe.c:1441
+ #11 0x00005576d2e1eb33 in run_src () at ./nrpe.c:642
+ #12 0x00005576d2e196e5 in main (argc=<optimized out>, argv=<optimized out>) at ./nrpe.c:224
+ 
+ We have seen a few different abort messages from glibc, all raised from malloc_printerr:
+ 
+ #3 0x00007fbc5e7d82fc in malloc_printerr (str=str@entry=0x7fbc5e8f844d "corrupted size vs. prev_size") at malloc.c:5347
+ or
+ #3 0x00007fa2a22492fc in malloc_printerr (str=str@entry=0x7fa2a23694c1 "free(): invalid pointer") at malloc.c:5347
+ or
+ #3 0x00007f10481e32fc in malloc_printerr (str=str@entry=0x7f1048307600 "free(): invalid next size (fast)") at malloc.c:5347
+ 
+ with the same backtrace. The original reporter ran it through valgrind:
+ 
  (gdb) frame 5
- #5  0x0000561342bb6262 in handle_connection (sock=6) at ./nrpe.c:1952
- 1952                    free(v3_send_packet);
+ #5 0x0000561342bb6262 in handle_connection (sock=6) at ./nrpe.c:1952
+ 1952 free(v3_send_packet);
  (gdb) list
- 1947                    SSL_free(ssl);
- 1948            }
- 1949    #endif
+ 1947 SSL_free(ssl);
+ 1948 }
+ 1949 #endif
  1950
- 1951            if (v3_send_packet)
- 1952                    free(v3_send_packet);
+ 1951 if (v3_send_packet)
+ 1952 free(v3_send_packet);
  1953
- 1954            /* log info */
- 1955            if (debug == TRUE)
- 1956                    logit(LOG_DEBUG, "Return Code: %d, Output: %s", result, send_buff);
- 
- valgrind says:
+ 1954 /* log info */
+ 1955 if (debug == TRUE)
+ 1956 logit(LOG_DEBUG, "Return Code: %d, Output: %s", result, send_buff);
  
  ==1762444== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
  ==1762443== Invalid write of size 1
- ==1762443==    at 0x483F0BE: strcpy (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
- ==1762443==    by 0x112205: strcpy (string_fortified.h:90)
- ==1762443==    by 0x112205: handle_connection (nrpe.c:1927)
- ==1762443==    by 0x112A4B: wait_for_connections (nrpe.c:1441)
- ==1762443==    by 0x112B32: run_src (nrpe.c:642)
- ==1762443==    by 0x10D6E4: main (nrpe.c:224)
- ==1762443==  Address 0x4ee47a8 is 0 bytes after a block of size 88 alloc'd
- ==1762443==    at 0x483DD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
- ==1762443==    by 0x1121C2: handle_connection (nrpe.c:1919)
- ==1762443==    by 0x112A4B: wait_for_connections (nrpe.c:1441)
- ==1762443==    by 0x112B32: run_src (nrpe.c:642)
- ==1762443==    by 0x10D6E4: main (nrpe.c:224)
+ ==1762443== at 0x483F0BE: strcpy (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
+ ==1762443== by 0x112205: strcpy (string_fortified.h:90)
+ ==1762443== by 0x112205: handle_connection (nrpe.c:1927)
+ ==1762443== by 0x112A4B: wait_for_connections (nrpe.c:1441)
+ ==1762443== by 0x112B32: run_src (nrpe.c:642)
+ ==1762443== by 0x10D6E4: main (nrpe.c:224)
+ ==1762443== Address 0x4ee47a8 is 0 bytes after a block of size 88 alloc'd
+ ==1762443== at 0x483DD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
+ ==1762443== by 0x1121C2: handle_connection (nrpe.c:1919)
+ ==1762443== by 0x112A4B: wait_for_connections (nrpe.c:1441)
+ ==1762443== by 0x112B32: run_src (nrpe.c:642)
+ ==1762443== by 0x10D6E4: main (nrpe.c:224)
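
The valgrind report above shows a one-byte write directly past an 88-byte calloc() block, which is the usual signature of a buffer sized without room for the terminating NUL. A minimal sketch of the correct sizing follows; copy_with_terminator is a hypothetical helper written for illustration and is not taken from nrpe.c or from the upstream patch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from nrpe.c): copy a string into a freshly
 * allocated buffer that has room for the terminating NUL byte. */
static char *copy_with_terminator(const char *src)
{
    size_t len;
    char *dst;

    assert(src != NULL);
    len = strlen(src);          /* strlen() does not count the NUL */

    /* Allocating only len bytes would make a later strcpy() write one
     * byte past the end of the block; len + 1 leaves room for the NUL. */
    dst = calloc(len + 1, 1);
    if (dst == NULL)
        return NULL;

    memcpy(dst, src, len);      /* dst[len] is already 0 from calloc() */
    return dst;
}
```

The caller owns the returned buffer and releases it with free().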
+  
+ [Test case]
  
- And indeed the code doesn't account for a NUL delimiter when calloc()ing
- some buffer.
+ So far we have not been able to create a synthetic reproducer for this; it
+ likely involves workload-specific nagios plugins being executed to gather
+ metrics.
  
- The bug has been fixed upstreams:
+ We have tested the changes in production, with good results. nrpe no longer
+ crashes, whereas it previously crashed every 5 minutes or so.
  
- https://github.com/NagiosEnterprises/nrpe/commit/6d2a1cbfc01b6fb4a32ee0151816dca91dfdec79
+ Test packages are available in the following ppa:
  
- And that applies cleanly on nagios-nrpe_4.0.0-2ubuntu1 source package
- and appears to make the problem go away.
+ https://launchpad.net/~mruffell/+archive/ubuntu/sf401685-updates
+ 
+ If you install the test package, crashes no longer occur.
+ 
+ For verification, we will run the -proposed packages in production to confirm
+ that they fix the issue.
+ 
+ [Where problems could occur]
+ 
+ We are changing the length of buffers to fix off-by-one errors. If any other
+ part of the code still requires the faulty length, we may end up just making it
+ crash elsewhere, moving the problem to a different place.
+ 
+ If a regression were to occur, it would likely involve further crashes, such as
+ invalid frees or segmentation faults. There would likely be no workaround other
+ than downgrading the package.
+ 
+ [Other info]
+ 
+ Upstream issue:
+ https://github.com/NagiosEnterprises/nrpe/issues/227
+ https://github.com/NagiosEnterprises/nrpe/pull/228
+ 
+ This has been fixed in 4.0.2 by:
+ 
+ commit 6d2a1cbfc01b6fb4a32ee0151816dca91dfdec79
+ From: madlohe <[email protected]>
+ Date: Mon, 2 Mar 2020 14:46:49 -0600
+ Subject: Fix buffer_length calculation in nrpe.c
+ Link: https://github.com/NagiosEnterprises/nrpe/commit/6d2a1cbfc01b6fb4a32ee0151816dca91dfdec79
+ 
+ commit 09c5d40ad50c56ac62a07ff1987fdf456d144756
+ From: Andreas Baumann <[email protected]>
+ Date: Sat, 8 Feb 2020 10:14:24 +0100
+ Subject: read_packages( SSL ): - buff_ptr[bytes_read] = 0 results in
+  Invalid write of size 1 - tot_bytes was calculated wrongly (as rc=0 in last
+  call of SSL_read) (this created all kind of errors like "malloc(): invalid
+  size (unsorted)", "corrupted size vs. prev_size" or segfaults in printf of
+  the message buffer)
+ Link: https://github.com/NagiosEnterprises/nrpe/commit/09c5d40ad50c56ac62a07ff1987fdf456d144756
+ 
+ Only focal needs the fix.
+ 
+ It seems we need both commits: "read_packages( SSL ): - buff_ptr[bytes_read] = 0
+ results in" improves the situation but does not quite fix the issue, which
+ "Fix buffer_length calculation in nrpe.c" solves completely. This is discussed
+ in https://github.com/NagiosEnterprises/nrpe/issues/227#issuecomment-593876303
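
The read-side problem named in the first commit subject (a terminator written out of bounds, and a byte total miscounted when the last read returns 0) can be illustrated with a generic read loop. This is only a sketch of the pattern, not the upstream patch: read_fn, read_all, struct mem_src and mem_read are hypothetical names, and the callback merely stands in for a call such as SSL_read() that returns a positive byte count on success and a value <= 0 otherwise.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical reader callback: returns the number of bytes read, or a
 * value <= 0 on end of stream or error. */
typedef int (*read_fn)(void *ctx, char *buf, int len);

/* Read at most cap - 1 bytes into buf and NUL-terminate the result.
 * Returns the number of payload bytes stored. */
static size_t read_all(read_fn rd, void *ctx, char *buf, size_t cap)
{
    size_t total = 0;

    assert(buf != NULL && cap > 0);
    while (total < cap - 1) {
        int rc = rd(ctx, buf + total, (int)(cap - 1 - total));
        if (rc <= 0)
            break;              /* never add a zero or negative rc */
        total += (size_t)rc;
    }
    buf[total] = '\0';          /* total <= cap - 1, so this is in bounds */
    return total;
}

/* Hypothetical in-memory source used to exercise read_all(). */
struct mem_src { const char *data; size_t len; size_t pos; };

static int mem_read(void *ctx, char *buf, int len)
{
    struct mem_src *s = ctx;
    size_t left = s->len - s->pos;
    size_t n = left < (size_t)len ? left : (size_t)len;

    memcpy(buf, s->data + s->pos, n);
    s->pos += n;
    return (int)n;              /* 0 once the source is drained */
}
```

Reserving one byte of capacity for the terminator inside the loop bound keeps the final write within the allocation regardless of how the last read ends.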


-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2023263

Title:
  nrpe crash in free() upon connection tear down

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/nagios-nrpe/+bug/2023263/+subscriptions


-- 
ubuntu-bugs mailing list
[email protected]
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
