As a belated follow-up (I was away at the time), note that in general we
don't tamper with code we have ported from other projects as it makes
future maintenance so much more difficult. At the very least, we need
conspicuous comments to ensure that such changes do not get lost (I've
just added one).
On Tue, 15 Jul 2008, Martin Maechler wrote:
"BD" == Bill Dunlap <[EMAIL PROTECTED]>
on Wed, 9 Jul 2008 11:26:50 -0700 (PDT) writes:
BD> There is a 2-block memory leak in the sub() (or any other regex-related
BD> function, probably) when the pattern argument involves a range
BD> expression, e.g., '[0-9]'.
BD> % R --debugger=valgrind --debugger-args=--leak-check=full --vanilla
BD> ==14519== Memcheck, a memory error detector.
BD> ==14519== Copyright (C) 2002-2006, and GNU GPL'd, by Julian Seward et al.
BD> ==14519== Using LibVEX rev 1658, a library for dynamic binary translation.
BD> ==14519== Copyright (C) 2004-2006, and GNU GPL'd, by OpenWorks LLP.
BD> ==14519== Using valgrind-3.2.1, a dynamic binary instrumentation framework.
BD> ==14519== Copyright (C) 2000-2006, and GNU GPL'd, by Julian Seward et al.
BD> ==14519== For more details, rerun with: -v
BD> ==14519==
BD> R version 2.8.0 Under development (unstable) (2008-07-07 r46046)
BD> ...
>> for(i in 1:1000)sub("[a-c]","+","0abcd")
>> q()
BD> ==32503==
BD> ==32503== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 40 from 2)
BD> ==32503== malloc/free: in use at exit: 12,603,409 bytes in 7,915 blocks.
BD> ==32503== malloc/free: 61,973 allocs, 54,058 frees, 54,494,371 bytes allocated.
BD> ==32503== For counts of detected errors, rerun with: -v
BD> ==32503== searching for pointers to 7,915 not-freed blocks.
BD> ==32503== checked 12,616,568 bytes.
BD> ==32503==
BD> ==32503== 4 bytes in 1 blocks are possibly lost in loss record 1 of 45
BD> ==32503== at 0x40046EE: malloc (vg_replace_malloc.c:149)
BD> ==32503== by 0x4005B9A: realloc (vg_replace_malloc.c:306)
BD> ==32503== by 0x80A5F92: parse_expression (regex.c:5202)
BD> ==32503== by 0x80A614F: parse_branch (regex.c:4707)
BD> ==32503== by 0x80A621A: parse_reg_exp (regex.c:4666)
BD> ==32503== by 0x80A6618: Rf_regcomp (regex.c:4635)
BD> ==32503== by 0x8110CB4: do_gsub (character.c:1355)
BD> ==32503== by 0x80654A4: do_internal (names.c:1135)
BD> ==32503== by 0x815F0EB: Rf_eval (eval.c:461)
BD> ==32503== by 0x8160DA7: do_begin (eval.c:1174)
BD> ==32503== by 0x815F0EB: Rf_eval (eval.c:461)
BD> ==32503== by 0x8162210: Rf_applyClosure (eval.c:667)
BD> ==32503==
BD> ... ignore 85 byte/4 block leak in readline ...
BD> ==32503== 7,980 bytes in 1,995 blocks are definitely lost in loss record 36 of 45
BD> ==32503== at 0x40046EE: malloc (vg_replace_malloc.c:149)
BD> ==32503== by 0x4005B9A: realloc (vg_replace_malloc.c:306)
BD> ==32503== by 0x80A5F92: parse_expression (regex.c:5202)
BD> ==32503== by 0x80A614F: parse_branch (regex.c:4707)
BD> ==32503== by 0x80A621A: parse_reg_exp (regex.c:4666)
BD> ==32503== by 0x80A6618: Rf_regcomp (regex.c:4635)
BD> ==32503== by 0x8110CB4: do_gsub (character.c:1355)
BD> ==32503== by 0x80654A4: do_internal (names.c:1135)
BD> ==32503== by 0x815F0EB: Rf_eval (eval.c:461)
BD> ==32503== by 0x8160DA7: do_begin (eval.c:1174)
BD> ==32503== by 0x815F0EB: Rf_eval (eval.c:461)
BD> ==32503== by 0x8162210: Rf_applyClosure (eval.c:667)
BD> The leaked blocks are allocated in internal_function build_range_exp() at
BD> 5200 /* Use realloc since mbcset->range_starts and mbcset->range_ends
BD> 5201 are NULL if *range_alloc == 0. */
BD> 5202 new_array_start = re_realloc (mbcset->range_starts, wchar_t,
BD> 5203 new_nranges);
BD> 5204 new_array_end = re_realloc (mbcset->range_ends, wchar_t,
BD> 5205 new_nranges);
BD> ...
BD> 5210 mbcset->range_starts = new_array_start;
BD> 5211 mbcset->range_ends = new_array_end;
BD> This file, src/main/regex.c, contains a complicated mess of #ifdef's
((note that these were not
BD> but range_starts and range_ends are defined and appear to be used
BD> whether or not _LIBC is defined. However, they are only freed if _LIBC
BD> is defined. In my setup (Linux, gcc 3.4.5) _LIBC is not defined so
BD> they don't get freed.
Ok; this all makes sense; I've seen the same in the source.
Interestingly, my newer setup (Linux, gcc 4.2.x ...) does not show the
memory leak; I've not checked if it's because _LIBC is defined
or for another reason.
I'm applying your patch --- thank you, Bill.
Martin
BD> After the following change in free_charset() only the 85 byte/4 block
BD> leak in readline remains.
BD> Index: regex.c
BD> ===================================================================
BD> --- regex.c (revision 46046)
BD> +++ regex.c (working copy)
BD> @@ -6240,9 +6240,9 @@
BD> # ifdef _LIBC
BD> re_free (cset->coll_syms);
BD> re_free (cset->equiv_classes);
BD> +# endif
BD> re_free (cset->range_starts);
BD> re_free (cset->range_ends);
BD> -# endif
BD> re_free (cset->char_classes);
BD> re_free (cset);
BD> }
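The effect of moving the #endif can be illustrated with a small, self-contained C sketch. It is not the regex.c code: toy_charset, TOY_LIBC, the counted_* helpers and toy_leaked_blocks() are all hypothetical names invented for illustration, and int stands in for wchar_t. It assumes only what the report describes, namely that the two range arrays are allocated unconditionally but freed inside a guard whose macro is not defined.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the character-set struct in regex.c; only the
   two range arrays discussed in the thread are modelled. */
typedef struct {
    int *range_starts;
    int *range_ends;
    int nranges;
} toy_charset;

/* Number of heap blocks currently allocated through the helpers below. */
static long live_blocks = 0;

static void *counted_realloc(void *p, size_t n)
{
    void *q = realloc(p, n);
    assert(q != NULL);
    if (p == NULL)
        live_blocks++;          /* a fresh block, not a resize */
    return q;
}

static void counted_free(void *p)
{
    if (p != NULL) {
        live_blocks--;
        free(p);
    }
}

/* The arrays are grown unconditionally, whatever macros are defined. */
static void toy_add_range(toy_charset *cs, int start, int end)
{
    size_t n = (size_t)cs->nranges + 1;
    cs->range_starts = counted_realloc(cs->range_starts, n * sizeof(int));
    cs->range_ends = counted_realloc(cs->range_ends, n * sizeof(int));
    cs->range_starts[cs->nranges] = start;
    cs->range_ends[cs->nranges] = end;
    cs->nranges++;
}

/* Before the patch: the frees sit inside a guard whose macro is undefined.
   TOY_LIBC is a made-up name playing the role of _LIBC. */
static void toy_free_before_patch(toy_charset *cs)
{
#ifdef TOY_LIBC
    counted_free(cs->range_starts);
    counted_free(cs->range_ends);
#else
    (void)cs;                   /* nothing is released on this path */
#endif
}

/* After the patch: the two frees are outside the guard. */
static void toy_free_after_patch(toy_charset *cs)
{
    counted_free(cs->range_starts);
    counted_free(cs->range_ends);
}

/* Build and release `iterations` one-range sets; return blocks still live. */
long toy_leaked_blocks(int iterations, int patched)
{
    live_blocks = 0;
    for (int i = 0; i < iterations; i++) {
        toy_charset cs = { NULL, NULL, 0 };
        toy_add_range(&cs, 'a', 'c');
        if (patched)
            toy_free_after_patch(&cs);
        else
            toy_free_before_patch(&cs);
    }
    return live_blocks;
}
```

In this toy model, toy_leaked_blocks(1000, 0) leaves two blocks outstanding per iteration, while toy_leaked_blocks(1000, 1) returns 0, which mirrors the two-blocks-per-call pattern in the report.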
BD> [This report may be a duplicate: I tried submitting it via the form in
BD> http://bugs.r-project.org/cgi-bin/R, but I cannot find it there now.]
Neither do I.
The machine running the repository had a downtime (announced by Peter
Dalgaard) a couple of days ago, so this may be related.
BD>
----------------------------------------------------------------------------
BD> Bill Dunlap
BD> Insightful Corporation
BD> bill at insightful dot com
______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
--
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595