On Tue, Nov 15, 2011 at 8:47 AM, <dhi...@sonic.net> wrote:
> dhi...@sonic.net wrote:
>> Martin Morgan <mtmor...@fhcrc.org> wrote:
>> > Allocating many small objects triggers numerous garbage collections as R
>> > grows its memory, seriously degrading performance. The specific use case
>> > is in creating a STRSXP of several 1,000,000's of elements of 60-100
>> > characters each; a simplified illustration understating the effects
>> > (because there is initially little to garbage collect, in contrast to an
>> > R session with several packages loaded) is below.
>
>> What a coincidence -- I was just going to post a question about why it
>> is so slow to create a STRSXP of ~10,000,000 unique elements, each ~10
>> characters long. I had noticed that this seemed to show much worse
>> than linear scaling. I had not thought of garbage collection as the
>> culprit -- but indeed it is. By manipulating the GC trigger, I can
>> make this operation take as little as 3 seconds (with no GC) or as
>> long as 76 seconds (with 31 garbage collections).
>
> I had done some google searches on this issue, since it seemed like it
> should not be too uncommon, but the only other hit I could come up
> with was a thread from 2006:
>
> https://stat.ethz.ch/pipermail/r-devel/2006-November/043446.html
>
> In any case, one issue with your suggested workaround is that it
> requires knowing how much additional storage is needed, which may be
> an expensive operation to determine. I've just tried implementing a
> different approach, which is to define two new functions to either
> disable or enable GC. The function to disable GC first invokes
> R_gc_full() to shrink the heap as much as possible, then sets a flag.
> Then in R_gc_internal(), I first check that flag, and if it is set, I
> call AdjustHeapSize(size_needed) and exit immediately.
>
> These calls could be used to bracket any code section that expects to
> make lots of calls to R's memory allocator. The down side is that
> this approach requires that all paths out of such a code section
> (including error handling) need to take care to unset the GC-disabled
> flag. I think I would want to hear from someone on the R team about
> whether they think this is a good idea.
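
For readers following the thread, here is a rough sketch of the flag-based scheme described in the quoted message. It is a toy model, not the actual patch and not R's memory.c: every name (toy_gc, toy_gc_disable, toy_alloc, HEAP_STEP, and so on) is hypothetical, the "heap" is just a counter, and the collector is assumed to free nothing so that the effect of the flag is easy to see.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: names and numbers are illustrative, not R's internals. */
#define HEAP_STEP 100                   /* hypothetical heap growth increment */

static size_t heap_used = 0;            /* units handed out so far */
static size_t heap_limit = HEAP_STEP;   /* level at which a GC is triggered */
static int gc_disabled = 0;             /* the proposed "GC disabled" flag */
static int gc_count = 0;                /* collections actually run */

/* Put the toy heap back into its initial state. */
void toy_reset(void)
{
    heap_used = 0;
    heap_limit = HEAP_STEP;
    gc_disabled = 0;
    gc_count = 0;
}

/* Grow the limit until a request of size_needed units fits. */
static void toy_adjust_heap(size_t size_needed)
{
    while (heap_used + size_needed > heap_limit)
        heap_limit += HEAP_STEP;
}

/* Stand-in for the collector entry point. */
static void toy_gc(size_t size_needed)
{
    if (gc_disabled) {
        /* Flag set: just grow the heap and return without collecting. */
        toy_adjust_heap(size_needed);
        return;
    }
    gc_count++;                    /* a real collector would mark and sweep here */
    toy_adjust_heap(size_needed);  /* assume all objects are live: nothing freed */
}

/* Run one full collection, then switch collection off. */
void toy_gc_disable(void)
{
    toy_gc(0);
    gc_disabled = 1;
}

/* Switch collection back on; every exit path must reach this. */
void toy_gc_enable(void)
{
    gc_disabled = 0;
}

/* Allocate `size` units, invoking the collector when the limit is hit. */
void toy_alloc(size_t size)
{
    if (heap_used + size > heap_limit)
        toy_gc(size);
    assert(heap_used + size <= heap_limit);
    heap_used += size;
}

/* Allocate n objects of `size` units each. */
void toy_alloc_many(size_t n, size_t size)
{
    for (size_t i = 0; i < n; i++)
        toy_alloc(size);
}

int toy_gc_count(void) { return gc_count; }
int toy_gc_is_disabled(void) { return gc_disabled; }
```

In this model, a section that allocates heavily would be bracketed by toy_gc_disable() and toy_gc_enable(). The collection counter then stays fixed inside the bracket, while the same allocations outside it trigger a collection each time the limit is reached. The sketch does nothing about error exits, which is exactly the weak point raised in the quoted message.
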

If .Call and .C re-enabled the GC on return from compiled code (and threw
some sort of error), that would help contain the potential damage. You
might also want to re-enable GC if malloc() returned NULL, rather than
giving an out-of-memory error.

    -thomas

-- 
Thomas Lumley
Professor of Biostatistics
University of Auckland

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel