>>>>> "BD" == Bill Dunlap <[EMAIL PROTECTED]> >>>>> on Wed, 9 Jul 2008 11:26:50 -0700 (PDT) writes:
BD> There is a 2-block memory leak in the sub() (or any other regex-related
BD> function, probably) when the pattern argument involves a range
BD> expression, e.g., '[0-9]'.
BD>
BD> % R --debugger=valgrind --debugger-args=--leak-check=full --vanilla
BD> ==14519== Memcheck, a memory error detector.
BD> ==14519== Copyright (C) 2002-2006, and GNU GPL'd, by Julian Seward et al.
BD> ==14519== Using LibVEX rev 1658, a library for dynamic binary translation.
BD> ==14519== Copyright (C) 2004-2006, and GNU GPL'd, by OpenWorks LLP.
BD> ==14519== Using valgrind-3.2.1, a dynamic binary instrumentation framework.
BD> ==14519== Copyright (C) 2000-2006, and GNU GPL'd, by Julian Seward et al.
BD> ==14519== For more details, rerun with: -v
BD> ==14519==
BD> R version 2.8.0 Under development (unstable) (2008-07-07 r46046)
BD> ...
>> for(i in 1:1000)sub("[a-c]","+","0abcd")
>> q()
BD> ==32503==
BD> ==32503== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 40 from 2)
BD> ==32503== malloc/free: in use at exit: 12,603,409 bytes in 7,915 blocks.
BD> ==32503== malloc/free: 61,973 allocs, 54,058 frees, 54,494,371 bytes allocated.
BD> ==32503== For counts of detected errors, rerun with: -v
BD> ==32503== searching for pointers to 7,915 not-freed blocks.
BD> ==32503== checked 12,616,568 bytes.
BD> ==32503==
BD> ==32503== 4 bytes in 1 blocks are possibly lost in loss record 1 of 45
BD> ==32503==    at 0x40046EE: malloc (vg_replace_malloc.c:149)
BD> ==32503==    by 0x4005B9A: realloc (vg_replace_malloc.c:306)
BD> ==32503==    by 0x80A5F92: parse_expression (regex.c:5202)
BD> ==32503==    by 0x80A614F: parse_branch (regex.c:4707)
BD> ==32503==    by 0x80A621A: parse_reg_exp (regex.c:4666)
BD> ==32503==    by 0x80A6618: Rf_regcomp (regex.c:4635)
BD> ==32503==    by 0x8110CB4: do_gsub (character.c:1355)
BD> ==32503==    by 0x80654A4: do_internal (names.c:1135)
BD> ==32503==    by 0x815F0EB: Rf_eval (eval.c:461)
BD> ==32503==    by 0x8160DA7: do_begin (eval.c:1174)
BD> ==32503==    by 0x815F0EB: Rf_eval (eval.c:461)
BD> ==32503==    by 0x8162210: Rf_applyClosure (eval.c:667)
BD> ==32503==
BD> ... ignore 85 byte/4 block leak in readline ...
BD> ==32503== 7,980 bytes in 1,995 blocks are definitely lost in loss record 36 of 45
BD> ==32503==    at 0x40046EE: malloc (vg_replace_malloc.c:149)
BD> ==32503==    by 0x4005B9A: realloc (vg_replace_malloc.c:306)
BD> ==32503==    by 0x80A5F92: parse_expression (regex.c:5202)
BD> ==32503==    by 0x80A614F: parse_branch (regex.c:4707)
BD> ==32503==    by 0x80A621A: parse_reg_exp (regex.c:4666)
BD> ==32503==    by 0x80A6618: Rf_regcomp (regex.c:4635)
BD> ==32503==    by 0x8110CB4: do_gsub (character.c:1355)
BD> ==32503==    by 0x80654A4: do_internal (names.c:1135)
BD> ==32503==    by 0x815F0EB: Rf_eval (eval.c:461)
BD> ==32503==    by 0x8160DA7: do_begin (eval.c:1174)
BD> ==32503==    by 0x815F0EB: Rf_eval (eval.c:461)
BD> ==32503==    by 0x8162210: Rf_applyClosure (eval.c:667)
BD>
BD> The leaked blocks are allocated in internal_function build_range_exp() at
BD> 5200   /* Use realloc since mbcset->range_starts and mbcset->range_ends
BD> 5201      are NULL if *range_alloc == 0.  */
BD> 5202   new_array_start = re_realloc (mbcset->range_starts, wchar_t,
BD> 5203                                 new_nranges);
BD> 5204   new_array_end = re_realloc (mbcset->range_ends, wchar_t,
BD> 5205                               new_nranges);
BD> ...
BD> 5210   mbcset->range_starts = new_array_start;
BD> 5211   mbcset->range_ends = new_array_end;
BD>
BD> This file, src/main/regex.c, contains a complicated mess of #ifdef's
((note that these were not
BD> but range_starts and range_ends are defined and appear to be used
BD> whether or not _LIBC is defined. However, they are only freed if _LIBC
BD> is defined. In my setup (Linux, gcc 3.4.5) _LIBC is not defined so
BD> they don't get freed.

Ok; this all makes sense; I've seen the same in the source.
Interestingly, my newer setup (Linux, gcc 4.2.x ...) does not show the
memory leak; I've not checked whether that is because _LIBC is defined
there or for another reason.

I'm applying your patch; thank you, Bill.
Martin

BD> After the following change in free_charset() only the 85 byte/4 block
BD> leak in readline remains.
BD>
BD> Index: regex.c
BD> ===================================================================
BD> --- regex.c	(revision 46046)
BD> +++ regex.c	(working copy)
BD> @@ -6240,9 +6240,9 @@
BD>  # ifdef _LIBC
BD>    re_free (cset->coll_syms);
BD>    re_free (cset->equiv_classes);
BD> +# endif
BD>    re_free (cset->range_starts);
BD>    re_free (cset->range_ends);
BD> -# endif
BD>    re_free (cset->char_classes);
BD>    re_free (cset);
BD>  }
BD>
BD> [This report may be a duplicate: I tried submitting it via the form in
BD> http://bugs.r-project.org/cgi-bin/R, but I cannot find it there now.]

Neither can I. The machine running the repository had some downtime
(announced by Peter Dalgaard) a couple of days ago, so this may be related.

BD> ----------------------------------------------------------------------------
BD> Bill Dunlap
BD> Insightful Corporation
BD> bill at insightful dot com

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
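The leak pattern described in this thread can be illustrated with a small, self-contained C sketch. It is a hypothetical simplification, not the actual regex.c code: the struct charset_sketch and the functions add_range() and free_charset_sketch() are invented names, and plain realloc()/free() stand in for the re_realloc()/re_free() macros. It shows why growing the range arrays from NULL works (realloc(NULL, n) behaves like malloc(n)), and why, with the patch applied, the arrays are freed outside the #ifdef _LIBC block.

```c
/* Hypothetical, simplified model of the range arrays in regex.c's
 * charset struct; names and layout are illustrative only. */
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

typedef struct {
    wchar_t *range_starts;  /* NULL until the first range is added */
    wchar_t *range_ends;    /* NULL until the first range is added */
    int nranges;            /* ranges currently stored */
    int range_alloc;        /* capacity of both arrays */
} charset_sketch;

/* Append the range [start, end]; returns 0 on success, -1 if out of memory.
 * realloc(NULL, n) behaves like malloc(n), so the first call works even
 * though both arrays start out NULL (cf. the comment at regex.c:5200). */
int add_range(charset_sketch *cs, wchar_t start, wchar_t end)
{
    assert(cs != NULL);
    if (cs->nranges == cs->range_alloc) {
        int new_alloc = 2 * cs->range_alloc + 1;
        /* Store each result as soon as it succeeds, so a failure of the
         * second realloc() cannot leak the already-grown first array. */
        wchar_t *ns = realloc(cs->range_starts, new_alloc * sizeof *ns);
        if (ns == NULL)
            return -1;
        cs->range_starts = ns;
        wchar_t *ne = realloc(cs->range_ends, new_alloc * sizeof *ne);
        if (ne == NULL)
            return -1;
        cs->range_ends = ne;
        cs->range_alloc = new_alloc;
    }
    cs->range_starts[cs->nranges] = start;
    cs->range_ends[cs->nranges] = end;
    cs->nranges++;
    return 0;
}

/* Patched layout: only glibc-specific members stay inside #ifdef _LIBC.
 * The range arrays are allocated whether or not _LIBC is defined, so
 * they are freed unconditionally as well. */
void free_charset_sketch(charset_sketch *cs)
{
#ifdef _LIBC
    /* glibc-only members (coll_syms, equiv_classes in regex.c)
     * would be released here. */
#endif
    free(cs->range_starts);
    free(cs->range_ends);
    free(cs);
}
```

As a rough usage example, a pattern such as "[a-c]" corresponds to one add_range(cs, L'a', L'c') call on a zero-initialised charset_sketch, followed by free_charset_sketch(cs) when the pattern is released. If the two free() calls were moved back inside #ifdef _LIBC and the file were compiled without -D_LIBC, they would be compiled out, reproducing the kind of "definitely lost" blocks valgrind reported above.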