On Tue, Sep 30, 2014 at 2:20 PM, Matthieu Gomez <gomez.matth...@gmail.com> wrote:
> I have a question about shallow copies in R. Since R 3.1.0, subsetting
> a data frame with respect to its columns no longer results in deep
> copies. This is an amazing change in my opinion. Now, subsetting a
> data.frame by rows (or subsetting a matrix by columns or rows) still
> does deep copies. In particular, it is my understanding that running a
> command on a very large subset of rows (say "sum" or "biglm" on
> non-outlier observations) results in a deep copy of these rows first,
> which can require twice as much memory as the original
> data.frame/matrix. If this is correct, I would be very interested to
> know more on whether this behavior can/may change in future versions
> of R.

I'll let the experts comment on this, but subsetting/reshuffling the
columns of a data frame sounds easy compared with what you're asking
for. Columns of a data frame are basically just elements in a list and
they don't have to be contiguous in memory, whereas the elements in a
matrix (of a basic data type) need to be contiguous in memory.

However, somewhat related: Having lots of these temporary copies can
be quite time consuming for the garbage collector, so it's not just
about the memory but also about the overall processing time. One of
the next improvements in the 'matrixStats' package is to add support
for specifying subsets of rows/columns to operate over, for the
purpose of avoiding the auxiliary copies that you talk about, e.g.

    cols <- c(1:14, 87:103)
    rows <- seq(from=1, to=nrow(X), by=2)
    y <- rowMedians(X, rows=rows, columns=cols)

instead of

    y <- rowMedians(X[rows,cols])

It's a fairly simple task to implement, but I've got lots on my plate
so I don't know when this will happen. (I welcome contributions via
github.com/HenrikBengtsson/matrixStats.)

Similar methods in R (e.g. rowSums()) could gain from this too.

/Henrik
(matrixStats)

PS. Code compilation could translate rowMedians(X[rows,cols]) to
rowMedians(X, rows=rows, columns=cols), but that's far in the future
(I think).

>
> Thanks a lot!,
> Matthew
>
> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
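
For illustration only, here is a rough base-R sketch of operating on a
row/column subset without first building the full X[rows, cols] copy.
The helper row_medians_subset() is hypothetical (it is not part of
matrixStats or base R). It pulls out one row at a time, so only a short
temporary vector exists at any moment. It assumes X is a numeric matrix
and trades speed for lower peak memory; a native implementation could
avoid the per-row temporaries as well.

```r
# Hypothetical helper (NOT a matrixStats or base R function): row medians
# over a subset of rows/columns, extracting one row at a time so that the
# full X[rows, cols] submatrix is never materialized.
row_medians_subset <- function(X, rows = seq_len(nrow(X)),
                               cols = seq_len(ncol(X))) {
  # X[i, cols] is a small temporary vector of length(cols) elements
  vapply(rows, function(i) median(X[i, cols]), numeric(1))
}

# Example data (made up for this sketch)
set.seed(1)
X <- matrix(rnorm(200 * 110), nrow = 200, ncol = 110)
cols <- c(1:14, 87:103)
rows <- seq(from = 1, to = nrow(X), by = 2)

y1 <- row_medians_subset(X, rows = rows, cols = cols)

# Reference result: this version copies the whole submatrix first
y2 <- apply(X[rows, cols], 1, median)

stopifnot(isTRUE(all.equal(y1, y2)))
```

The per-row loop is slow in plain R, so this only makes sense when the
submatrix copy is the actual bottleneck.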