Dear All,
Thanks a lot for your helpful comments (e.g., NAMED, ExpressionSet,
DNAStringSet).

Observations and questions ::

ooo For a data.frame dd and a list ll with the same contents to begin with,
the following operations show a significant difference in the maximum memory
usage column of the gc( ) output on R-2.6.2 (the detailed code is in the PS
section below).

ll$xx <- zz
dd$xx <- zz

My understanding is that the '$<-.data.frame' S3 method above makes a copy
of the whole dd first (using '*tmp*'). But for a list this is avoided due to
the use of SET_VECTOR_ELT at the C level. Is this a valid explanation, or is
something deeper happening behind the scenes?

ooo I'll look into the read-only flag idea to avoid the unhappy circumstances
that might arise while bypassing the copy-on-modify principle. Any pointers
or code snippets as to how to implement this idea?

ooo The main reason I want to bypass copy-on-modify is that I want to
emulate a Python-like behavior for lists (and data.frames), in the sense
that I want to take the responsibility of making a deep copy if need be,
but most of the time I want to knowingly change things in place using the
proposed S4 class DataFrame.

Regards,
Gopi Goswami.
PhD, Statistics, 2005
http://gopi-goswami.net/index.html

PS:
zz <- seq_len(1000000)
gc( )
dd <- data.frame(xx = zz)
dd$yy <- zz
gc( )
object.size(dd)

######################################################################

zz <- seq_len(1000000)
gc( )
ll <- list(xx = zz)
ll$yy <- zz
gc( )
object.size(ll)


On Mon, Apr 14, 2008 at 10:18 AM, Tony Plate <[EMAIL PROTECTED]> wrote:

> Gopi Goswami wrote:
>
> > Hi there,
> >
> > Problem ::
> > When one tries to change one or some of the columns of a data.frame, R
> > makes a copy of the whole data.frame using the '*tmp*' mechanism (this
> > does not happen for components of a list, tracemem( ) on R-2.6.2 says so).
> >
> > Suggested solution ::
> > Store the columns of the data.frame as a list inside of an environment
> > slot of an S4 class, and define the '[', '[<-' etc. operators using
> > setMethod( ) and setReplaceMethod( ).
> >
> > Question ::
> > This implementation will violate copy on modify principle of R (since
> > environments are not copied), but will save a lot of memory. Do you see
> > any other obvious problem(s) with the idea?
>
> Well, because it violates the copy-on-modify principle it can potentially
> break code that depends on this principle. I don't know how much there is
> -- did you try to see if R and recommended packages will pass checks with
> this change in place?
>
> > Have you seen a related setup
> > implemented / considered before (apart from the packages like filehash,
> > ff, and database related ones for saving memory)?
>
> I've frequently used a personal package that stores array data in a file
> (like ff). It works fine, and I partially get around the problem of
> violating the copy-on-modify principle by having a readonly flag in the
> object -- when the flag is set to allow modification I have to be careful,
> but after I set it to readonly I can use it more freely with the knowledge
> that if some function does attempt to modify the object, it will stop with
> an error.
>
> In this particular case, why not just track down why data frame
> modification is copying the entire object and suggest a change so that it
> just copies the column being changed? (should be possible if list
> modification doesn't copy all components).
>
> -- Tony Plate
>
> > Implementation code snippet ::
> > ### The S4 class.
> > setClass('DataFrame',
> >          representation(data = 'data.frame', nrow = 'numeric',
> >                         ncol = 'numeric', store = 'environment'),
> >          prototype(data = data.frame( ), nrow = 0, ncol = 0))
> >
> > setMethod('initialize', 'DataFrame', function(.Object) {
> >     .Object <- callNextMethod( )
> >     [EMAIL PROTECTED] <- new.env(hash = TRUE)
> >     assign('data', as.list([EMAIL PROTECTED]), [EMAIL PROTECTED])
> >     [EMAIL PROTECTED] <- nrow([EMAIL PROTECTED])
> >     [EMAIL PROTECTED] <- ncol([EMAIL PROTECTED])
> >     [EMAIL PROTECTED] <- data.frame( )
> >     .Object
> > })
> >
> >
> > ### Usage:
> > nn <- 10
> > ## dd1 below could possibly be created by read.table or scan and
> > ## data.frame
> > dd1 <- data.frame(xx = rnorm(nn), yy = rnorm(nn))
> > dd2 <- new('DataFrame', data = dd1)
> > rm(dd1)
> > ## Now work with dd2
> >
> >
> > Thanks a lot,
> > Gopi Goswami.
> > PhD, Statistics, 2005
> > http://gopi-goswami.net/index.html
> >
> > [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel