Hi there,

Problem ::
When one tries to change one or more columns of a data.frame, R makes a copy of the whole data.frame using the '*tmp*' mechanism (this does not happen for components of a list; tracemem( ) on R-2.6.2 says so).

Suggested solution ::
Store the columns of the data.frame as a list inside an environment slot of an S4 class, and define the '[', '[<-' etc. operators using setMethod( ) and setReplaceMethod( ).

Question ::
This implementation will violate R's copy-on-modify principle (since environments are not copied), but will save a lot of memory. Do you see any other obvious problem(s) with the idea? Have you seen a related setup implemented / considered before (apart from packages like filehash, ff, and the database-related ones for saving memory)?

Implementation code snippet ::

### The S4 class.
setClass('DataFrame',
         representation(data = 'data.frame', nrow = 'numeric',
                        ncol = 'numeric', store = 'environment'),
         prototype(data = data.frame( ), nrow = 0, ncol = 0))

## The '...' lets new( ) pass data = ... through to the default initializer.
setMethod('initialize', 'DataFrame', function(.Object, ...) {
    .Object <- callNextMethod( )
    ## Move the columns into a fresh environment, then drop the data.frame.
    .Object@store <- new.env(hash = TRUE)
    assign('data', as.list(.Object@data), .Object@store)
    .Object@nrow <- nrow(.Object@data)
    .Object@ncol <- ncol(.Object@data)
    .Object@data <- data.frame( )
    .Object
})

### Usage:
nn <- 10
## dd1 below could possibly be created by read.table or scan and data.frame
dd1 <- data.frame(xx = rnorm(nn), yy = rnorm(nn))
dd2 <- new('DataFrame', data = dd1)
rm(dd1)
## Now work with dd2

Thanks a lot,
Gopi Goswami.
PhD, Statistics, 2005
http://gopi-goswami.net/index.html

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
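
The post names the '[' and '[<-' methods under "Suggested solution" but does not show them. What follows is only a rough sketch of how they might look, not the poster's code. The method bodies are assumptions, as is the restriction to a single column index j and the example data. The class and initializer are repeated (with '...' in the initializer's signature) so the block runs on its own. It also illustrates the reference semantics raised under "Question": two objects created by plain assignment share one environment.

```r
library(methods)

## Class and initializer as described in the post.
setClass('DataFrame',
         representation(data = 'data.frame', nrow = 'numeric',
                        ncol = 'numeric', store = 'environment'),
         prototype(data = data.frame(), nrow = 0, ncol = 0))

setMethod('initialize', 'DataFrame', function(.Object, ...) {
    .Object <- callNextMethod()
    .Object@store <- new.env(hash = TRUE)
    assign('data', as.list(.Object@data), envir = .Object@store)
    .Object@nrow <- nrow(.Object@data)
    .Object@ncol <- ncol(.Object@data)
    .Object@data <- data.frame()
    .Object
})

## Hypothetical extractor: x[i, j] reads column j from the stored list.
## Assumption: j names or indexes exactly one column.
setMethod('[', 'DataFrame', function(x, i, j, ..., drop = TRUE) {
    cols <- get('data', envir = x@store)
    if (missing(i)) cols[[j]] else cols[[j]][i]
})

## Hypothetical replacement: x[i, j] <- value writes back into the environment.
setReplaceMethod('[', 'DataFrame', function(x, i, j, ..., value) {
    cols <- get('data', envir = x@store)
    if (missing(i)) cols[[j]] <- value else cols[[j]][i] <- value
    assign('data', cols, envir = x@store)
    x
})

## Example data (illustrative only).
dd2 <- new('DataFrame', data = data.frame(xx = 1:3, yy = c(10, 20, 30)))
dd3 <- dd2             # copies the S4 object, but the environment is shared
dd3[2, 'yy'] <- 99     # updates the shared column list
dd2[, 'yy']            # the change is visible through dd2 as well
```

Whether this actually avoids copying the columns depends on the R version, so it is worth checking with tracemem( ) before relying on it.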