On Sep 30, 2014, at 5:20 PM, Matthieu Gomez <gomez.matth...@gmail.com> wrote:
> 
> I have a question about shallow copies in R. Since R 3.1.0, subsetting a 
> data frame with respect to its columns no longer results in deep copies. This 
> is an amazing change in my opinion. Now, subsetting a data.frame by rows (or 
> subsetting a matrix by columns or rows) still makes deep copies. In 
> particular, it is my understanding that running a command on a very large 
> subset of rows (say "sum" or "biglm" on non-outlier observations) results in 
> a deep copy of these rows first, which can require twice the memory 
> of the original data.frame/matrix. If this is correct, I would be very 
> interested to know whether this behavior can or may change in future 
> versions of R.
> 

No. Subsetting a vector always requires a copy by definition*. Each column in a 
dataframe and each matrix is a vector, so any subset thereof always requires a 
copy no matter what you do.
Subsetting columns of a dataframe only requires a copy of the dataframe vector 
itself, i.e. the list that holds the columns, which is small by comparison (at 
least for datasets that use data frames).
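A minimal base-R sketch of the structure described above, assuming R >= 3.1.0: a data frame is a list whose elements are the column vectors, so selecting columns only builds a new short list, while selecting rows has to allocate new column vectors holding just the kept rows. The `keep` rule is a hypothetical stand-in for "non-outlier observations".

```r
set.seed(42)
n <- 1e5
df <- data.frame(a = rnorm(n), b = rnorm(n), c = rnorm(n))

# A data frame is a generic vector (list) whose elements are the column vectors.
is.list(df)   # TRUE
length(df)    # 3

# Column subset: only a new 2-element list is built; since R 3.1.0 the
# column vectors themselves are shared with df rather than deep-copied.
cols <- df[c("a", "b")]

# Row subset: every column vector has to be rebuilt with only the kept rows,
# so the new allocation grows with (kept rows) x (number of columns).
keep <- abs(df$a) < 2   # hypothetical "non-outlier" criterion
rows <- df[keep, ]
nrow(rows) == sum(keep)
```

With most rows kept, `rows` holds nearly as much data as `df`, which is the "twice the memory" situation described in the question.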

Cheers,
Simon

* - you could try tricks where you fake a copy with things like copy-on-write (COW) 
mmaps, but you still need to have a copy conceptually. There are other tricks 
like deferred execution (you don't actually compute the result but only store 
the recipe for creating it), but those are more specialized and not generally 
available.
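A toy base-R sketch of the "store the recipe" idea, written only for illustration: `lazy_subset()` and `lazy_sum()` are hypothetical helpers, not part of R or any package. The recipe keeps the original vector and a logical index instead of materializing `x[keep]`; `lazy_sum()` then walks the data in chunks. The selected elements are still copied (the copy is only spread out, as the footnote says), but peak extra memory is bounded by the chunk size rather than by the size of the whole subset.

```r
# Hypothetical helper: store the recipe (source vector + logical index).
# R's copy-on-modify semantics mean putting x in a list does not duplicate it.
lazy_subset <- function(x, keep) {
  stopifnot(is.logical(keep), length(x) == length(keep))
  structure(list(x = x, keep = keep), class = "lazy_subset")
}

# Hypothetical helper: evaluate sum() over the recipe chunk by chunk, so each
# step copies at most `chunk` elements of x instead of the full subset.
lazy_sum <- function(s, chunk = 1e5) {
  n <- length(s$x)
  total <- 0
  if (n == 0) return(total)
  for (start in seq(1, n, by = chunk)) {
    idx <- start:min(start + chunk - 1, n)
    total <- total + sum(s$x[idx][s$keep[idx]])
  }
  total
}

set.seed(1)
x <- rnorm(1e6)
keep <- abs(x) < 3   # hypothetical "non-outlier" rule
s <- lazy_subset(x, keep)
all.equal(lazy_sum(s), sum(x[keep]))
```

Because the summation order differs, the chunked result can differ from `sum(x[keep])` in the last bits, hence the comparison with `all.equal()` rather than `==`.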
______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
