** Description changed:

- [ Impact ]
+ [Impact]

- A customer faced nrpe crash with Focal.
-
- Also they provided crashdump which has below.
-
- #3 0x00007fbc5e7d82fc in malloc_printerr (str=str@entry=0x7fbc5e8f844d
- "corrupted size vs. prev_size") at malloc.c:5347
-
- #3 0x00007fa2a22492fc in malloc_printerr (str=str@entry=0x7fa2a23694c1
- "free(): invalid pointer") at malloc.c:5347
-
- And I found out there are commits addressed this issue.
-
- From 09c5d40ad50c56ac62a07ff1987fdf456d144756 Mon Sep 17 00:00:00 2001
- From: Andreas Baumann <[email protected]>
- Date: Sat, 8 Feb 2020 10:14:24 +0100
- Subject: [PATCH 2/2] read_packages( SSL ): - buff_ptr[bytes_read] = 0 results
- in Invalid write of size 1 - tot_bytes was calculated wrongly (as rc=0 in
- last call of SSL_read) (this created all kind of errors like "malloc():
- invalid size (unsorted)", "corrupted size vs. prev_size" or segfaults in
- printf of the message buffer)
-
- Signed-off-by: Andreas Baumann <[email protected]>
-
- ubuntu@node-17-nrpe-crash:~/src-nrpe$ cat fix_buffer_length_calc.patch
- From 6d2a1cbfc01b6fb4a32ee0151816dca91dfdec79 Mon Sep 17 00:00:00 2001
- From: madlohe <[email protected]>
- Date: Mon, 2 Mar 2020 14:46:49 -0600
- Subject: [PATCH] Fix buffer_length calculation in nrpe.c
-
- I made test pkg and let the customer use it. then I confirmed those
- commits fix the issue.
-
- [ Test Plan ]
-
- No reproducer but the customer helped to test it and provided crashdump.
-
- [ Where problems could occur ]
-
- as it is related to memory handling, crash by handling memory would
- still be there.
-
- [ Other Info ]
-
- [ Original Description ]
- On a few servers running Ubuntu 20.04 (nagios-nrpe-server 4.0.0-2ubuntu1), nrpe server child processes handling incoming connections crash regularly after having answered:
-
- Jun 7 06:19:08 mail systemd-coredump[1691494]: Process 1691482 (nrpe) of user 124 dumped core.
- Jun 7 06:24:08 mail systemd-coredump[1692283]: Process 1692270 (nrpe) of user 124 dumped core.
- Jun 7 06:29:08 mail systemd-coredump[1693515]: Process 1693508 (nrpe) of user 124 dumped core.
- Jun 7 06:34:07 mail systemd-coredump[1695835]: Process 1695827 (nrpe) of user 124 dumped core.
- Jun 7 06:39:08 mail systemd-coredump[1696613]: Process 1696598 (nrpe) of user 124 dumped core.
- Jun 7 06:44:07 mail systemd-coredump[1697147]: Process 1697142 (nrpe) of user 124 dumped core.
- Jun 7 06:49:08 mail systemd-coredump[1697698]: Process 1697693 (nrpe) of user 124 dumped core.
- Jun 7 06:54:08 mail systemd-coredump[1698230]: Process 1698225 (nrpe) of user 124 dumped core.
- Jun 7 06:59:07 mail systemd-coredump[1698693]: Process 1698688 (nrpe) of user 124 dumped core.
- Jun 7 07:04:07 mail systemd-coredump[1699270]: Process 1699265 (nrpe) of user 124 dumped core.
- Jun 7 07:09:07 mail systemd-coredump[1699828]: Process 1699823 (nrpe) of user 124 dumped core.
- Jun 7 07:14:07 mail systemd-coredump[1700349]: Process 1700344 (nrpe) of user 124 dumped core.
-
- Here every 5 minutes any time a client queries a particular check (not
- all checks cause the error, I guess it's about the size of the request
- packet and having to do with memory alignment / allocation granularity)
-
- Crash is in:
+ nrpe crashes when checks are run over incoming packets, because buffer sizes
+ are calculated incorrectly (off by one). The resulting heap corruption makes
+ glibc abort when the affected pointers are later freed.
  (gdb) bt
  #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
- #1 0x00007f1048170859 in __GI_abort () at abort.c:79
- #2 0x00007f10481db26e in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7f1048305298 "%s\n") at ../sysdeps/posix/libc_fatal.c:155
- #3 0x00007f10481e32fc in malloc_printerr (str=str@entry=0x7f1048307600 "free(): invalid next size (fast)") at malloc.c:5347
- #4 0x00007f10481e4bac in _int_free (av=0x7f104833ab80 <main_arena>, p=0x561342c9c750, have_lock=0) at malloc.c:4249
- #5 0x0000561342bb6262 in handle_connection (sock=6) at ./nrpe.c:1952
- #6 0x0000561342bb6a4c in wait_for_connections () at ./nrpe.c:1441
- #7 0x0000561342bb6b33 in run_src () at ./nrpe.c:642
- #8 0x0000561342bb16e5 in main (argc=<optimized out>, argv=<optimized out>) at ./nrpe.c:224
+ #1 0x00007fa2a21d6859 in __GI_abort () at abort.c:79
+ #2 0x00007fa2a224126e in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7fa2a236b298 "%s\n") at ../sysdeps/posix/libc_fatal.c:155
+ #3 0x00007fa2a22492fc in malloc_printerr (str=str@entry=0x7fa2a23694c1 "free(): invalid pointer") at malloc.c:5347
+ #4 0x00007fa2a224ab2c in _int_free (av=<optimized out>, p=<optimized out>, have_lock=0) at malloc.c:4173
+ #5 0x00007fa2a26c4130 in ssl_clear_hash_ctx (hash=hash@entry=0x5576d4304ad8) at ../ssl/ssl_lib.c:4513
+ #6 0x00007fa2a26c4162 in clear_ciphers (s=s@entry=0x5576d4304650) at ../ssl/ssl_lib.c:574
+ #7 0x00007fa2a26c44d1 in SSL_free (s=0x5576d4304650) at ../ssl/ssl_lib.c:1184
+ #8 SSL_free (s=0x5576d4304650) at ../ssl/ssl_lib.c:1146
+ #9 0x00005576d2e1e255 in handle_connection (sock=6) at ./nrpe.c:1947
+ #10 0x00005576d2e1ea4c in wait_for_connections () at ./nrpe.c:1441
+ #11 0x00005576d2e1eb33 in run_src () at ./nrpe.c:642
+ #12 0x00005576d2e196e5 in main (argc=<optimized out>, argv=<optimized out>) at ./nrpe.c:224
+
+ We have seen a few different abort messages, all with the same backtrace:
+
+ #3 0x00007fbc5e7d82fc in malloc_printerr (str=str@entry=0x7fbc5e8f844d "corrupted size vs. prev_size") at malloc.c:5347
+ or
+ #3 0x00007fa2a22492fc in malloc_printerr (str=str@entry=0x7fa2a23694c1 "free(): invalid pointer") at malloc.c:5347
+ or
+ #3 0x00007f10481e32fc in malloc_printerr (str=str@entry=0x7f1048307600 "free(): invalid next size (fast)") at malloc.c:5347
+
+ The original reporter also ran nrpe through valgrind:
+
  (gdb) frame 5
- #5 0x0000561342bb6262 in handle_connection (sock=6) at ./nrpe.c:1952
- 1952 free(v3_send_packet);
+ #5 0x0000561342bb6262 in handle_connection (sock=6) at ./nrpe.c:1952
+ 1952 free(v3_send_packet);
  (gdb) list
- 1947 SSL_free(ssl);
- 1948 }
- 1949 #endif
+ 1947 SSL_free(ssl);
+ 1948 }
+ 1949 #endif
  1950
- 1951 if (v3_send_packet)
- 1952 free(v3_send_packet);
+ 1951 if (v3_send_packet)
+ 1952 free(v3_send_packet);
  1953
- 1954 /* log info */
- 1955 if (debug == TRUE)
- 1956 logit(LOG_DEBUG, "Return Code: %d, Output: %s", result, send_buff);
-
- valgrind says:
+ 1954 /* log info */
+ 1955 if (debug == TRUE)
+ 1956 logit(LOG_DEBUG, "Return Code: %d, Output: %s", result, send_buff);
  ==1762444== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
  ==1762443== Invalid write of size 1
- ==1762443== at 0x483F0BE: strcpy (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
- ==1762443== by 0x112205: strcpy (string_fortified.h:90)
- ==1762443== by 0x112205: handle_connection (nrpe.c:1927)
- ==1762443== by 0x112A4B: wait_for_connections (nrpe.c:1441)
- ==1762443== by 0x112B32: run_src (nrpe.c:642)
- ==1762443== by 0x10D6E4: main (nrpe.c:224)
- ==1762443== Address 0x4ee47a8 is 0 bytes after a block of size 88 alloc'd
- ==1762443== at 0x483DD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
- ==1762443== by 0x1121C2: handle_connection (nrpe.c:1919)
- ==1762443== by 0x112A4B: wait_for_connections (nrpe.c:1441)
- ==1762443== by 0x112B32: run_src (nrpe.c:642)
- ==1762443== by 0x10D6E4: main (nrpe.c:224)
+ ==1762443== at 0x483F0BE: strcpy (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
+ ==1762443== by 0x112205: strcpy (string_fortified.h:90)
+ ==1762443== by 0x112205: handle_connection (nrpe.c:1927)
+ ==1762443== by 0x112A4B: wait_for_connections (nrpe.c:1441)
+ ==1762443== by 0x112B32: run_src (nrpe.c:642)
+ ==1762443== by 0x10D6E4: main (nrpe.c:224)
+ ==1762443== Address 0x4ee47a8 is 0 bytes after a block of size 88 alloc'd
+ ==1762443== at 0x483DD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
+ ==1762443== by 0x1121C2: handle_connection (nrpe.c:1919)
+ ==1762443== by 0x112A4B: wait_for_connections (nrpe.c:1441)
+ ==1762443== by 0x112B32: run_src (nrpe.c:642)
+ ==1762443== by 0x10D6E4: main (nrpe.c:224)
+
+ [Test case]

- And indeed the code doesn't account for a NUL delimiter when calloc()ing
- some buffer.
+ So far we have not been able to create a synthetic reproducer for this; it
+ likely involves workload-specific Nagios plugins being executed to gather
+ metrics.

- The bug has been fixed upstreams:
+ We have tested the changes in production, with good results: nrpe no longer
+ crashes, whereas it used to crash every 5 minutes or so.

- https://github.com/NagiosEnterprises/nrpe/commit/6d2a1cbfc01b6fb4a32ee0151816dca91dfdec79
+ Test packages are available in the following PPA:

- And that applies cleanly on nagios-nrpe_4.0.0-2ubuntu1 source package
- and appears to make the problem go away.
+ https://launchpad.net/~mruffell/+archive/ubuntu/sf401685-updates
+
+ If you install the test package, crashes no longer occur.
+
+ For verification, we will run the -proposed packages in production to verify
+ that they fix the issue.
+
+ [Where problems could occur]
+
+ We are changing the length of buffers to fix off-by-one errors. If any other
+ part of the code still relies on the faulty length, we may end up just making
+ it crash elsewhere, moving the problem to a different place.
+
+ If a regression were to occur, it would likely show up as further crashes,
+ such as frees of invalid pointers or segmentation faults. There would likely
+ be no workaround other than downgrading the package.
+
+ [Other info]
+
+ Upstream issue:
+ https://github.com/NagiosEnterprises/nrpe/issues/227
+ https://github.com/NagiosEnterprises/nrpe/pull/228
+
+ This has been fixed in 4.0.2 by:
+
+ commit 6d2a1cbfc01b6fb4a32ee0151816dca91dfdec79
+ From: madlohe <[email protected]>
+ Date: Mon, 2 Mar 2020 14:46:49 -0600
+ Subject: Fix buffer_length calculation in nrpe.c
+ Link: https://github.com/NagiosEnterprises/nrpe/commit/6d2a1cbfc01b6fb4a32ee0151816dca91dfdec79
+
+ commit 09c5d40ad50c56ac62a07ff1987fdf456d144756
+ From: Andreas Baumann <[email protected]>
+ Date: Sat, 8 Feb 2020 10:14:24 +0100
+ Subject: read_packages( SSL ): - buff_ptr[bytes_read] = 0 results in
+ Invalid write of size 1 - tot_bytes was calculated wrongly (as rc=0 in last
+ call of SSL_read) (this created all kind of errors like "malloc(): invalid
+ size (unsorted)", "corrupted size vs. prev_size" or segfaults in printf of
+ the message buffer)
+ Link: https://github.com/NagiosEnterprises/nrpe/commit/09c5d40ad50c56ac62a07ff1987fdf456d144756
+
+ Only focal needs the fix.
+
+ It seems we need both commits: "read_packages( SSL ): - buff_ptr[bytes_read] = 0
+ results in" improves the situation but does not quite fix the issue, which
+ "Fix buffer_length calculation in nrpe.c" then solves completely. This is
+ discussed in https://github.com/NagiosEnterprises/nrpe/issues/227#issuecomment-593876303
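Two buffer-length rules are at the heart of this bug. The removed description text notes that the code doesn't account for a NUL delimiter when calloc()ing a buffer. That matches valgrind's "Invalid write of size 1 ... 0 bytes after a block of size 88" from strcpy() at nrpe.c:1927. Commit 09c5d40 separately reports that tot_bytes was miscounted around the last SSL_read() call.

The C code below is a minimal sketch that illustrates both rules in isolation. It is not the upstream nrpe patch, and the exact upstream changes may differ. fake_stream, fake_read(), read_all() and dup_output() are hypothetical names. fake_read() is an assumed stand-in for SSL_read() that returns 0 once no data is left, so the example builds without OpenSSL.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for SSL_read(): delivers `src` in chunks of at
 * most 4 bytes (and at most `num`), then returns 0 once everything has
 * been handed out. */
typedef struct {
    const char *src;
    size_t pos;
} fake_stream;

static int fake_read(fake_stream *s, char *buf, int num)
{
    size_t left = strlen(s->src) - s->pos;
    int n = (int)(left < 4 ? left : 4);

    if (n > num)
        n = num;
    memcpy(buf, s->src + s->pos, (size_t)n);
    s->pos += (size_t)n;
    return n; /* 0 means "nothing more to read" */
}

/* Read until the stream is exhausted or cap - 1 bytes are stored.
 * Only positive return values are added to tot_bytes, and one byte is
 * always kept free so that buff[tot_bytes] = '\0' stays in bounds. */
static size_t read_all(fake_stream *s, char *buff, size_t cap)
{
    size_t tot_bytes = 0;

    assert(cap > 0); /* need room for at least the terminator */
    while (tot_bytes + 1 < cap) {
        int rc = fake_read(s, buff + tot_bytes, (int)(cap - 1 - tot_bytes));
        if (rc <= 0)
            break; /* never add 0 or an error code to the total */
        tot_bytes += (size_t)rc;
    }
    buff[tot_bytes] = '\0';
    return tot_bytes;
}

/* Copy a string into a new heap buffer. The "+ 1" reserves space for the
 * NUL delimiter that strcpy() writes after the last character. */
static char *dup_output(const char *text)
{
    char *copy = calloc(strlen(text) + 1, 1);

    if (copy != NULL)
        strcpy(copy, text);
    return copy;
}
```

Consider a stream that delivers "OK - load average: 0.12" in 4-byte chunks. For that stream, read_all() returns 23, and dup_output() then allocates 24 bytes. Dropping the "+ 1" in dup_output() would make strcpy() write the terminator one byte past the allocation. That is the same class of error valgrind reports for nrpe.c:1927.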
--
https://bugs.launchpad.net/bugs/2023263

Title:
  nrpe crash in free() upon connection tear down

--
ubuntu-bugs mailing list
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
