There is no plan to change R's garbage collector, and I did not say there was. What I wrote is:

    If R is built to use reference counting for determining sharing information this does not happen, so this is likely to change and not force a copy by 3.4.0.

So reference counting is to be used for determining sharing, _not_ for memory management.

There is some work in progress to allow alternate representations for R vectors that would for the most part behave like standard vectors. There are however a lot of thorny issues: while it is nice if passing such things to sum() or mean() behaves in the 'usual' way, it is probably not so nice if passing to log() or to serialize() behaves in the 'usual' way. We'll have to see over the next few months whether these issues can be addressed in a reusable way.

Best,

luke

On Sat, 6 Aug 2016, frede...@ofb.net wrote:

Dear R Devel,

In a thread this morning Luke Tierney mentioned that R's way of garbage collecting is going to change soon in 3.4.0. I couldn't find this info on Google but I wanted to share what I had been discussing in another forum, in case now is not too late to raise considerations which could affect the design of planned changes to R's garbage collection facilities.

I ran into a problem when trying to get R to quickly load some vectors from disk. R should be able to do this efficiently using memory mapping. There is a package 'ff' which implements efficient loading of disk-based vectors using memory mapping. It works pretty well, but the problem is that it creates a separate data type - the vectors are not "native" R vectors. There are some wrapper functions in a package 'ffbase' which allow people to use common functions like 'sum' on these 'ff' vectors. However, a new wrapper has to be written for every such function, and I guess the 'ffbase' authors do not have time to write wrappers that are as efficient as the native R functions - in my testing, there was a 10x slow-down for 'sum'.

The situation is a bit wistful because an 'ff' vector and a native R vector are basically the same data type, they both store elements contiguously in memory. Apparently, what prevents 'ffbase' and 'ff' from creating native R vectors is the fact that it is impossible to assign a "finalizer" to a native R vector. We need a finalizer so that R can tell us when a vector is being freed, so we can unmap the associated memory/file.

Ffbase maintainer Edwin de Jonge was even skeptical that CRAN would accept a package implementing the hack I had proposed to simulate native R vectors from mmap'ed 'ff' vectors. The issue is discussed here:

https://github.com/edwindj/ffbase/issues/52

Of course, weak references and external pointers allow finalizers to be assigned to objects, but as I understand it, such objects are
separate types from vectors - there is no way in R to synthesize a
native vector endowed with a finalizer - something which could be passed directly to built-in functions like 'sum'. I think a finalizer facility for vectors would be useful because it would allow us to take advantage of the memory mapping architecture present in all modern processors, to do fast copy-free operations on large disk-based data structures, without having to re-implement internal functions like 'sum' which are essentially the same algorithm no matter where the data is stored. Thank you, Frederick
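
The idea in the message above (one summing routine that works on data wherever it lives, plus a finalizer that unmaps the file once the object is no longer reachable) can be sketched outside R. The following is only an illustrative analogy in Python, not R's API and not how 'ff' or 'ffbase' work; the names `total`, `MappedDoubles` and `_unmap` are hypothetical, and the file is assumed to hold native-endian 8-byte doubles.

```python
import array
import mmap
import os
import tempfile
import weakref


def total(values):
    """Sum a sequence of doubles; the loop does not care where they are stored."""
    acc = 0.0
    for v in values:
        acc += v
    return acc


def _unmap(view, mapping, handle):
    """Finalizer body: release the view first, then unmap and close the file."""
    view.release()
    mapping.close()
    handle.close()


class MappedDoubles:
    """Hypothetical wrapper exposing a file of doubles as a read-only sequence."""

    def __init__(self, path):
        self._file = open(path, "rb")
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
        # Reinterpret the mapped bytes as doubles without copying them.
        self.values = memoryview(self._map).cast("d")
        # Runs when this wrapper is garbage collected, or when close() is called.
        self._finalizer = weakref.finalize(
            self, _unmap, self.values, self._map, self._file
        )

    def close(self):
        self._finalizer()


if __name__ == "__main__":
    in_memory = array.array("d", [1.0, 2.0, 3.5])
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        in_memory.tofile(f)

    mapped = MappedDoubles(path)
    # The same function handles both the heap array and the mapped file.
    print(total(in_memory), total(mapped.values))
    mapped.close()
    os.remove(path)
```

The point of the sketch is the design choice requested above: because the mapped data is exposed through the ordinary buffer interface, `total` needs no wrapper, and the finalizer is the only place that knows the data came from a file.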

--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
   Actuarial Science
241 Schaeffer Hall                  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel