As a belated follow-up (I was away at the time), note that in general we don't tamper with code we have ported from other projects as it makes future maintenance so much more difficult. At the very least, we need conspicuous comments to ensure that such changes do not get lost (I've just added one).

On Tue, 15 Jul 2008, Martin Maechler wrote:

"BD" == Bill Dunlap <[EMAIL PROTECTED]>
    on Wed, 9 Jul 2008 11:26:50 -0700 (PDT) writes:

   BD> There is a 2-block memory leak in sub() (and probably in any other
   BD> regex-related function) when the pattern argument involves a range
   BD> expression, e.g., '[0-9]'.

   BD> % R --debugger=valgrind --debugger-args=--leak-check=full --vanilla
   BD> ==14519== Memcheck, a memory error detector.
   BD> ==14519== Copyright (C) 2002-2006, and GNU GPL'd, by Julian Seward et al.
   BD> ==14519== Using LibVEX rev 1658, a library for dynamic binary translation.
   BD> ==14519== Copyright (C) 2004-2006, and GNU GPL'd, by OpenWorks LLP.
   BD> ==14519== Using valgrind-3.2.1, a dynamic binary instrumentation framework.
   BD> ==14519== Copyright (C) 2000-2006, and GNU GPL'd, by Julian Seward et al.
   BD> ==14519== For more details, rerun with: -v
   BD> ==14519==

   BD> R version 2.8.0 Under development (unstable) (2008-07-07 r46046)
   BD> ...
   >> for(i in 1:1000)sub("[a-c]","+","0abcd")
   >> q()
   BD> ==32503==
   BD> ==32503== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 40 from 2)
   BD> ==32503== malloc/free: in use at exit: 12,603,409 bytes in 7,915 blocks.
   BD> ==32503== malloc/free: 61,973 allocs, 54,058 frees, 54,494,371 bytes allocated.
   BD> ==32503== For counts of detected errors, rerun with: -v
   BD> ==32503== searching for pointers to 7,915 not-freed blocks.
   BD> ==32503== checked 12,616,568 bytes.
   BD> ==32503==
   BD> ==32503== 4 bytes in 1 blocks are possibly lost in loss record 1 of 45
   BD> ==32503==    at 0x40046EE: malloc (vg_replace_malloc.c:149)
   BD> ==32503==    by 0x4005B9A: realloc (vg_replace_malloc.c:306)
   BD> ==32503==    by 0x80A5F92: parse_expression (regex.c:5202)
   BD> ==32503==    by 0x80A614F: parse_branch (regex.c:4707)
   BD> ==32503==    by 0x80A621A: parse_reg_exp (regex.c:4666)
   BD> ==32503==    by 0x80A6618: Rf_regcomp (regex.c:4635)
   BD> ==32503==    by 0x8110CB4: do_gsub (character.c:1355)
   BD> ==32503==    by 0x80654A4: do_internal (names.c:1135)
   BD> ==32503==    by 0x815F0EB: Rf_eval (eval.c:461)
   BD> ==32503==    by 0x8160DA7: do_begin (eval.c:1174)
   BD> ==32503==    by 0x815F0EB: Rf_eval (eval.c:461)
   BD> ==32503==    by 0x8162210: Rf_applyClosure (eval.c:667)
   BD> ==32503==
   BD> ... ignore 85 byte/4 block leak in readline ...
   BD> ==32503== 7,980 bytes in 1,995 blocks are definitely lost in loss record 36 of 45
   BD> ==32503==    at 0x40046EE: malloc (vg_replace_malloc.c:149)
   BD> ==32503==    by 0x4005B9A: realloc (vg_replace_malloc.c:306)
   BD> ==32503==    by 0x80A5F92: parse_expression (regex.c:5202)
   BD> ==32503==    by 0x80A614F: parse_branch (regex.c:4707)
   BD> ==32503==    by 0x80A621A: parse_reg_exp (regex.c:4666)
   BD> ==32503==    by 0x80A6618: Rf_regcomp (regex.c:4635)
   BD> ==32503==    by 0x8110CB4: do_gsub (character.c:1355)
   BD> ==32503==    by 0x80654A4: do_internal (names.c:1135)
   BD> ==32503==    by 0x815F0EB: Rf_eval (eval.c:461)
   BD> ==32503==    by 0x8160DA7: do_begin (eval.c:1174)
   BD> ==32503==    by 0x815F0EB: Rf_eval (eval.c:461)
   BD> ==32503==    by 0x8162210: Rf_applyClosure (eval.c:667)

   BD> The leaked blocks are allocated in internal_function build_range_exp() at
   BD> 5200             /* Use realloc since mbcset->range_starts and mbcset->range_ends
   BD> 5201                are NULL if *range_alloc == 0.  */
   BD> 5202             new_array_start = re_realloc (mbcset->range_starts, wchar_t,
   BD> 5203                                           new_nranges);
   BD> 5204             new_array_end = re_realloc (mbcset->range_ends, wchar_t,
   BD> 5205                                         new_nranges);
   BD> ...
   BD> 5210             mbcset->range_starts = new_array_start;
   BD> 5211             mbcset->range_ends = new_array_end;

   BD> This file, src/main/regex.c, contains a complicated mess of #ifdef's

((note that these were not

   BD> but range_starts and range_ends are defined and appear to be used
   BD> whether or not _LIBC is defined.  However, they are only freed if _LIBC
   BD> is defined.  In my setup (Linux, gcc 3.4.5) _LIBC is not defined so
   BD> they don't get freed.

Ok; this all makes sense; I've seen the same in the source

Interestingly, my newer setup (Linux, gcc 4.2.x ...) does not show the
memory leak; I've not checked if it's because _LIBC is defined
or for another reason.

I'm applying your patch ---  thank you, Bill.
Martin

   BD> After the following change in free_charset() only the 85 byte/4 block
   BD> leak in readline remains.

   BD> Index: regex.c
   BD> ===================================================================
   BD> --- regex.c     (revision 46046)
   BD> +++ regex.c     (working copy)
   BD> @@ -6240,9 +6240,9 @@
   BD> # ifdef _LIBC
   BD> re_free (cset->coll_syms);
   BD> re_free (cset->equiv_classes);
   BD> +# endif
   BD> re_free (cset->range_starts);
   BD> re_free (cset->range_ends);
   BD> -# endif
   BD> re_free (cset->char_classes);
   BD> re_free (cset);
   BD> }
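
The shape of the bug can be reproduced outside regex.c. What follows is only a minimal sketch, not the actual R or glibc code: the struct charset_sketch, the macro SKETCH_LIBC, the allocation counter, and every function name are hypothetical stand-ins. It shows two arrays that are always allocated but released only inside a conditional block, next to the patched arrangement in which the release is unconditional.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the charset struct; only the two range
   arrays mentioned in the report are modelled. */
typedef struct {
    int *range_starts;
    int *range_ends;
    int nranges;
} charset_sketch;

int live_blocks = 0;  /* outstanding allocations, for illustration only */

void *counted_realloc(void *p, size_t n) {
    void *q = realloc(p, n);
    if (q != NULL && p == NULL) live_blocks++;  /* a new block came into being */
    return q;
}

void counted_free(void *p) {
    if (p != NULL) { live_blocks--; free(p); }
}

/* Allocation does not depend on any build macro. */
int add_range(charset_sketch *cs, int start, int end) {
    size_t n = (size_t) cs->nranges + 1;
    int *s = counted_realloc(cs->range_starts, n * sizeof *s);
    if (s == NULL) return -1;
    cs->range_starts = s;
    int *e = counted_realloc(cs->range_ends, n * sizeof *e);
    if (e == NULL) return -1;
    cs->range_ends = e;
    s[cs->nranges] = start;
    e[cs->nranges] = end;
    cs->nranges++;
    return 0;
}

/* Pre-patch shape: the frees sit inside the conditional block, so they
   vanish when SKETCH_LIBC (an assumed macro name) is not defined. */
void free_charset_guarded(charset_sketch *cs) {
#ifdef SKETCH_LIBC
    counted_free(cs->range_starts);
    counted_free(cs->range_ends);
    cs->range_starts = cs->range_ends = NULL;
    cs->nranges = 0;
#endif
    (void) cs;
}

/* Post-patch shape: the frees are unconditional. */
void free_charset_fixed(charset_sketch *cs) {
    counted_free(cs->range_starts);
    counted_free(cs->range_ends);
    cs->range_starts = cs->range_ends = NULL;
    cs->nranges = 0;
}

/* Number of blocks still allocated after building one range and then
   calling the given release function. */
int blocks_left(void (*release)(charset_sketch *)) {
    charset_sketch cs = { NULL, NULL, 0 };
    int before = live_blocks;
    int rc = add_range(&cs, 'a', 'c');
    assert(rc == 0);
    release(&cs);
    int left = live_blocks - before;
    free_charset_fixed(&cs);  /* tidy up whatever the release left behind */
    return left;
}
```

Compiled without -DSKETCH_LIBC, blocks_left(free_charset_guarded) returns 2 and blocks_left(free_charset_fixed) returns 0, which mirrors the two blocks per compiled pattern seen in the valgrind report.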

   BD> [This report may be a duplicate: I tried submitting it via the form in
   BD> http://bugs.r-project.org/cgi-bin/R, but I cannot find it there now.]

Neither can I.
The machine running the repository had a downtime a couple of days ago
(announced by Peter Dalgaard), so this may be related.


   BD> ----------------------------------------------------------------------------
   BD> Bill Dunlap
   BD> Insightful Corporation
   BD> bill at insightful dot com

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


--
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
