There is a memory leak of 2 blocks per call in sub() (and probably in any other regex-related function) when the pattern argument contains a range expression, e.g., '[0-9]'.

% R --debugger=valgrind --debugger-args=--leak-check=full --vanilla
==14519== Memcheck, a memory error detector.
==14519== Copyright (C) 2002-2006, and GNU GPL'd, by Julian Seward et al.
==14519== Using LibVEX rev 1658, a library for dynamic binary translation.
==14519== Copyright (C) 2004-2006, and GNU GPL'd, by OpenWorks LLP.
==14519== Using valgrind-3.2.1, a dynamic binary instrumentation framework.
==14519== Copyright (C) 2000-2006, and GNU GPL'd, by Julian Seward et al.
==14519== For more details, rerun with: -v
==14519==

R version 2.8.0 Under development (unstable) (2008-07-07 r46046)
...
> for(i in 1:1000)sub("[a-c]","+","0abcd")
> q()
==32503==
==32503== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 40 from 2)
==32503== malloc/free: in use at exit: 12,603,409 bytes in 7,915 blocks.
==32503== malloc/free: 61,973 allocs, 54,058 frees, 54,494,371 bytes allocated.
==32503== For counts of detected errors, rerun with: -v
==32503== searching for pointers to 7,915 not-freed blocks.
==32503== checked 12,616,568 bytes.
==32503==
==32503== 4 bytes in 1 blocks are possibly lost in loss record 1 of 45
==32503==    at 0x40046EE: malloc (vg_replace_malloc.c:149)
==32503==    by 0x4005B9A: realloc (vg_replace_malloc.c:306)
==32503==    by 0x80A5F92: parse_expression (regex.c:5202)
==32503==    by 0x80A614F: parse_branch (regex.c:4707)
==32503==    by 0x80A621A: parse_reg_exp (regex.c:4666)
==32503==    by 0x80A6618: Rf_regcomp (regex.c:4635)
==32503==    by 0x8110CB4: do_gsub (character.c:1355)
==32503==    by 0x80654A4: do_internal (names.c:1135)
==32503==    by 0x815F0EB: Rf_eval (eval.c:461)
==32503==    by 0x8160DA7: do_begin (eval.c:1174)
==32503==    by 0x815F0EB: Rf_eval (eval.c:461)
==32503==    by 0x8162210: Rf_applyClosure (eval.c:667)
==32503==
... ignore 85 byte/4 block leak in readline ...
==32503== 7,980 bytes in 1,995 blocks are definitely lost in loss record 36 of 45
==32503==    at 0x40046EE: malloc (vg_replace_malloc.c:149)
==32503==    by 0x4005B9A: realloc (vg_replace_malloc.c:306)
==32503==    by 0x80A5F92: parse_expression (regex.c:5202)
==32503==    by 0x80A614F: parse_branch (regex.c:4707)
==32503==    by 0x80A621A: parse_reg_exp (regex.c:4666)
==32503==    by 0x80A6618: Rf_regcomp (regex.c:4635)
==32503==    by 0x8110CB4: do_gsub (character.c:1355)
==32503==    by 0x80654A4: do_internal (names.c:1135)
==32503==    by 0x815F0EB: Rf_eval (eval.c:461)
==32503==    by 0x8160DA7: do_begin (eval.c:1174)
==32503==    by 0x815F0EB: Rf_eval (eval.c:461)
==32503==    by 0x8162210: Rf_applyClosure (eval.c:667)

The leaked blocks are allocated in the internal_function build_range_exp() at

   5200        /* Use realloc since mbcset->range_starts and mbcset->range_ends
   5201           are NULL if *range_alloc == 0.  */
   5202        new_array_start = re_realloc (mbcset->range_starts, wchar_t,
   5203                                      new_nranges);
   5204        new_array_end = re_realloc (mbcset->range_ends, wchar_t,
   5205                                    new_nranges);
   ...
   5210        mbcset->range_starts = new_array_start;
   5211        mbcset->range_ends = new_array_end;

This file, src/main/regex.c, contains a complicated mess of #ifdef's, but range_starts and range_ends are defined, and appear to be used, whether or not _LIBC is defined. However, they are freed only if _LIBC is defined. In my setup (Linux, gcc 3.4.5) _LIBC is not defined, so they never get freed. After the following change in free_charset(), only the 85 byte/4 block leak in readline remains.
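To make the mismatch concrete before the patch itself, here is a minimal standalone sketch of the pattern, not the actual regex.c code. Everything prefixed with sketch_ is hypothetical, as are counted_realloc(), counted_free() and the free_ranges flag, which stands in for whether the two re_free() calls sit inside or outside the `# ifdef _LIBC` block. It only counts blocks to show that freeing the charset without its two range arrays leaves two blocks behind per compile.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the charset structure; only the two
   fields named in the report are modelled. */
typedef struct {
    wchar_t *range_starts;
    wchar_t *range_ends;
} sketch_charset;

static long live_blocks = 0;   /* counted allocations not yet released */

/* realloc() on a NULL pointer acts like malloc(), as the quoted
   comment in regex.c notes; count that as a new block. */
static void *counted_realloc(void *p, size_t n)
{
    void *q = realloc(p, n);
    if (p == NULL && q != NULL)
        live_blocks++;
    return q;
}

static void counted_free(void *p)
{
    if (p != NULL)
        live_blocks--;
    free(p);
}

/* Allocate a charset holding a single range, e.g. [a-c]. */
static sketch_charset *sketch_build_range(wchar_t lo, wchar_t hi)
{
    sketch_charset *cset = calloc(1, sizeof *cset);
    assert(cset != NULL);
    live_blocks++;
    cset->range_starts = counted_realloc(cset->range_starts, sizeof(wchar_t));
    cset->range_ends = counted_realloc(cset->range_ends, sizeof(wchar_t));
    assert(cset->range_starts != NULL && cset->range_ends != NULL);
    cset->range_starts[0] = lo;
    cset->range_ends[0] = hi;
    return cset;
}

/* free_ranges == 0 models the behaviour described in the report when
   _LIBC is undefined; free_ranges != 0 models the proposed fix. */
static void sketch_free_charset(sketch_charset *cset, int free_ranges)
{
    if (free_ranges) {
        counted_free(cset->range_starts);
        counted_free(cset->range_ends);
    }
    counted_free(cset);
}

/* Number of counted blocks still live after `calls` build/free cycles. */
long sketch_leaked_blocks(int calls, int free_ranges)
{
    live_blocks = 0;
    for (int i = 0; i < calls; i++) {
        sketch_charset *cset = sketch_build_range(L'a', L'c');
        wchar_t *starts = cset->range_starts;
        wchar_t *ends = cset->range_ends;
        sketch_free_charset(cset, free_ranges);
        if (!free_ranges) {
            /* Release outside the count so the demo itself stays
               leak-free; live_blocks still records the omission. */
            free(starts);
            free(ends);
        }
    }
    return live_blocks;
}
```

Calling sketch_leaked_blocks(1000, 0) models the 1000-iteration loop above without the fix, and sketch_leaked_blocks(1000, 1) models it with the two frees moved outside the conditional.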

Index: regex.c
===================================================================
--- regex.c	(revision 46046)
+++ regex.c	(working copy)
@@ -6240,9 +6240,9 @@
 # ifdef _LIBC
   re_free (cset->coll_syms);
   re_free (cset->equiv_classes);
+# endif
   re_free (cset->range_starts);
   re_free (cset->range_ends);
-# endif
   re_free (cset->char_classes);
   re_free (cset);
 }

[This report may be a duplicate: I tried submitting it via the form in http://bugs.r-project.org/cgi-bin/R, but I cannot find it there now.]

----------------------------------------------------------------------------
Bill Dunlap
Insightful Corporation
bill at insightful dot com

 "All statements in this message represent the opinions of the author and do
 not necessarily reflect Insightful Corporation policy or position."

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel