There is a memory leak of about 2 blocks per call in sub() (and probably
in any other regex-related function) when the pattern argument contains
a range expression, e.g., '[0-9]'.

% R --debugger=valgrind --debugger-args=--leak-check=full --vanilla
==14519== Memcheck, a memory error detector.
==14519== Copyright (C) 2002-2006, and GNU GPL'd, by Julian Seward et al.
==14519== Using LibVEX rev 1658, a library for dynamic binary translation.
==14519== Copyright (C) 2004-2006, and GNU GPL'd, by OpenWorks LLP.
==14519== Using valgrind-3.2.1, a dynamic binary instrumentation framework.
==14519== Copyright (C) 2000-2006, and GNU GPL'd, by Julian Seward et al.
==14519== For more details, rerun with: -v
==14519==

R version 2.8.0 Under development (unstable) (2008-07-07 r46046)
...
> for(i in 1:1000)sub("[a-c]","+","0abcd")
> q()
==32503==
==32503== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 40 from 2)
==32503== malloc/free: in use at exit: 12,603,409 bytes in 7,915 blocks.
==32503== malloc/free: 61,973 allocs, 54,058 frees, 54,494,371 bytes allocated.
==32503== For counts of detected errors, rerun with: -v
==32503== searching for pointers to 7,915 not-freed blocks.
==32503== checked 12,616,568 bytes.
==32503==
==32503== 4 bytes in 1 blocks are possibly lost in loss record 1 of 45
==32503==    at 0x40046EE: malloc (vg_replace_malloc.c:149)
==32503==    by 0x4005B9A: realloc (vg_replace_malloc.c:306)
==32503==    by 0x80A5F92: parse_expression (regex.c:5202)
==32503==    by 0x80A614F: parse_branch (regex.c:4707)
==32503==    by 0x80A621A: parse_reg_exp (regex.c:4666)
==32503==    by 0x80A6618: Rf_regcomp (regex.c:4635)
==32503==    by 0x8110CB4: do_gsub (character.c:1355)
==32503==    by 0x80654A4: do_internal (names.c:1135)
==32503==    by 0x815F0EB: Rf_eval (eval.c:461)
==32503==    by 0x8160DA7: do_begin (eval.c:1174)
==32503==    by 0x815F0EB: Rf_eval (eval.c:461)
==32503==    by 0x8162210: Rf_applyClosure (eval.c:667)
==32503==
... ignore 85 byte/4 block leak in readline ...
==32503== 7,980 bytes in 1,995 blocks are definitely lost in loss record 36 of 45
==32503==    at 0x40046EE: malloc (vg_replace_malloc.c:149)
==32503==    by 0x4005B9A: realloc (vg_replace_malloc.c:306)
==32503==    by 0x80A5F92: parse_expression (regex.c:5202)
==32503==    by 0x80A614F: parse_branch (regex.c:4707)
==32503==    by 0x80A621A: parse_reg_exp (regex.c:4666)
==32503==    by 0x80A6618: Rf_regcomp (regex.c:4635)
==32503==    by 0x8110CB4: do_gsub (character.c:1355)
==32503==    by 0x80654A4: do_internal (names.c:1135)
==32503==    by 0x815F0EB: Rf_eval (eval.c:461)
==32503==    by 0x8160DA7: do_begin (eval.c:1174)
==32503==    by 0x815F0EB: Rf_eval (eval.c:461)
==32503==    by 0x8162210: Rf_applyClosure (eval.c:667)

The leaked blocks are allocated in internal_function build_range_exp() at
   5200             /* Use realloc since mbcset->range_starts and mbcset->range_ends
   5201                are NULL if *range_alloc == 0.  */
   5202             new_array_start = re_realloc (mbcset->range_starts, wchar_t,
   5203                                           new_nranges);
   5204             new_array_end = re_realloc (mbcset->range_ends, wchar_t,
   5205                                         new_nranges);
...
   5210             mbcset->range_starts = new_array_start;
   5211             mbcset->range_ends = new_array_end;
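
For readers unfamiliar with this idiom, here is a minimal standalone sketch
of growing two parallel arrays with realloc, starting from NULL pointers.
It is not the regex.c code: the type range_set, the function range_set_add,
and the growth rule are assumptions made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical stand-in for the range-related fields of the charset struct. */
typedef struct {
    wchar_t *range_starts;  /* NULL until the first range is added */
    wchar_t *range_ends;    /* NULL until the first range is added */
    size_t nranges;         /* number of ranges currently stored */
    size_t range_alloc;     /* capacity recorded for both arrays */
} range_set;

/* Append the range [start, end].
   Returns 0 on success, -1 on allocation failure. */
int range_set_add(range_set *rs, wchar_t start, wchar_t end)
{
    assert(rs != NULL);
    if (rs->nranges == rs->range_alloc) {
        /* Growth rule chosen for this sketch only. */
        size_t new_nranges = 2 * rs->range_alloc + 1;

        /* realloc(NULL, n) behaves like malloc(n), so the first call
           works even though both pointers start out NULL. */
        wchar_t *new_starts =
            realloc(rs->range_starts, new_nranges * sizeof(wchar_t));
        if (new_starts == NULL)
            return -1;
        rs->range_starts = new_starts;

        wchar_t *new_ends =
            realloc(rs->range_ends, new_nranges * sizeof(wchar_t));
        if (new_ends == NULL)
            return -1;
        rs->range_ends = new_ends;

        rs->range_alloc = new_nranges;
    }
    rs->range_starts[rs->nranges] = start;
    rs->range_ends[rs->nranges] = end;
    rs->nranges++;
    return 0;
}
```

Each set that receives at least one range therefore owns two heap blocks,
one per array, and both must be released when the set is destroyed.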

This file, src/main/regex.c, contains a complicated mess of #ifdef's
but range_starts and range_ends are defined and appear to be used
whether or not _LIBC is defined.  However, they are only freed if _LIBC
is defined.  In my setup (Linux, gcc 3.4.5) _LIBC is not defined so
they don't get freed.
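
The mismatch can be reproduced in miniature. The sketch below is not taken
from regex.c: toy_charset, the counting wrappers, and the free_ranges flag
are hypothetical names, and a runtime flag stands in for the compile-time
_LIBC test so that both behaviours can be compared in one program.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

static long live_blocks = 0;  /* blocks allocated but not yet freed */

/* Allocate one block and count it. */
static void *counted_alloc(size_t n)
{
    void *p = malloc(n);
    if (p != NULL)
        live_blocks++;
    return p;
}

/* Free one block and uncount it; NULL is ignored. */
static void counted_free(void *p)
{
    if (p != NULL)
        live_blocks--;
    free(p);
}

/* Hypothetical miniature of the charset structure. */
typedef struct {
    wchar_t *range_starts;
    wchar_t *range_ends;
    wchar_t *char_classes;
} toy_charset;

toy_charset *toy_charset_new(void)
{
    toy_charset *cs = counted_alloc(sizeof *cs);
    if (cs == NULL)
        return NULL;
    /* Allocated unconditionally, like the range arrays in the report. */
    cs->range_starts = counted_alloc(sizeof(wchar_t));
    cs->range_ends = counted_alloc(sizeof(wchar_t));
    cs->char_classes = counted_alloc(sizeof(wchar_t));
    return cs;
}

/* free_ranges == 0 mimics a build where the frees of the range arrays
   sit inside a conditional block that is compiled out. */
void toy_charset_free(toy_charset *cs, int free_ranges)
{
    assert(cs != NULL);
    if (free_ranges) {
        counted_free(cs->range_starts);
        counted_free(cs->range_ends);
    }
    counted_free(cs->char_classes);
    counted_free(cs);
}
```

Calling toy_charset_new() followed by toy_charset_free(cs, 0) leaves
live_blocks two higher than before, while toy_charset_free(cs, 1) leaves
it unchanged, which matches the roughly two lost blocks per sub() call in
the valgrind output above.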

After the following change in free_charset() only the 85 byte/4 block
leak in readline remains.

Index: regex.c
===================================================================
--- regex.c     (revision 46046)
+++ regex.c     (working copy)
@@ -6240,9 +6240,9 @@
 # ifdef _LIBC
   re_free (cset->coll_syms);
   re_free (cset->equiv_classes);
+# endif
   re_free (cset->range_starts);
   re_free (cset->range_ends);
-# endif
   re_free (cset->char_classes);
   re_free (cset);
 }

[This report may be a duplicate: I tried submitting it via the form in
http://bugs.r-project.org/cgi-bin/R, but I cannot find it there now.]

----------------------------------------------------------------------------
Bill Dunlap
Insightful Corporation
bill at insightful dot com

 "All statements in this message represent the opinions of the author and do
 not necessarily reflect Insightful Corporation policy or position."

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
