Dear R Devel,

In a thread this morning, Luke Tierney mentioned that R's way of garbage collecting is going to change soon, in 3.4.0. I couldn't find any information about this on Google, but I wanted to share what I had been discussing in another forum, in case it is not too late to raise considerations which could affect the design of the planned changes to R's garbage collection facilities.

I ran into a problem when trying to get R to quickly load some vectors from disk. R should be able to do this efficiently using memory mapping. There is a package, 'ff', which implements efficient loading of disk-based vectors using memory mapping. It works pretty well, but the problem is that it creates a separate data type: the vectors are not "native" R vectors. There are some wrapper functions in a package, 'ffbase', which allow people to use common functions like 'sum' on these 'ff' vectors. However, a new wrapper has to be written for every such function, and I guess the 'ffbase' authors do not have time to write wrappers that are as efficient as the native R functions. In my testing, there was a 10x slow-down for 'sum'.

The situation is a bit unfortunate because an 'ff' vector and a native R vector are basically the same data type: they both store elements contiguously in memory. Apparently, what prevents 'ffbase' and 'ff' from creating native R vectors is the fact that it is impossible to assign a "finalizer" to a native R vector. We need a finalizer so that R can tell us when a vector is being freed, so that we can unmap the associated memory/file. The 'ffbase' maintainer, Edwin de Jonge, was even skeptical that CRAN would accept a package implementing the hack I had proposed to simulate native R vectors from mmap'ed 'ff' vectors. The issue is discussed here:

https://github.com/edwindj/ffbase/issues/52

Of course, weak references and external pointers allow finalizers to be assigned to objects, but as I understand it, such objects are separate types from vectors. There is no way in R to synthesize a native vector endowed with a finalizer, something which could be passed directly to built-in functions like 'sum'.
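
To make the idea concrete outside of R, here is a minimal sketch in Python (not R, and not how 'ff' is implemented). It maps a file of doubles into memory, passes the mapped data directly to the ordinary built-in 'sum' without copying or wrapping, and attaches a finalizer that unmaps the file once the owning object is no longer needed. The names MappedDoubles, _release and demo are hypothetical and exist only for this illustration.

```python
import mmap
import os
import struct
import tempfile
import weakref


def _release(view, mapping, file):
    """Finalizer body: drop the view first, then unmap and close the file."""
    view.release()
    mapping.close()
    file.close()


class MappedDoubles:
    """Hypothetical holder for a file of native doubles, mapped read-only."""

    def __init__(self, path):
        self._file = open(path, "rb")
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
        # Reinterpret the mapped bytes as doubles; no data is copied here.
        self.values = memoryview(self._map).cast("d")
        # Runs when this object is collected, or earlier if called by hand.
        self.finalizer = weakref.finalize(
            self, _release, self.values, self._map, self._file)


def demo():
    # Write four doubles to a temporary file to stand in for on-disk data.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(struct.pack("4d", 1.5, 2.5, 3.0, 4.0))
        path = f.name
    try:
        vec = MappedDoubles(path)
        total = sum(vec.values)  # the built-in sum, no special wrapper
        vec.finalizer()          # unmap explicitly instead of waiting for GC
        return total, vec.finalizer.alive
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

The point of the sketch is that the summing code never needs to know the data lives in a mapped file; only the finalizer does. This is the piece that, as far as I can tell, cannot currently be attached to a native R vector.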

I think a finalizer facility for vectors would be useful because it would allow us to take advantage of the memory mapping architecture present in all modern processors, to do fast copy-free operations on large disk-based data structures, without having to re-implement internal functions like 'sum' which are essentially the same algorithm no matter where the data is stored.

Thank you,

Frederick

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel