Re: [Rd] R 3.0.0 memory use

luke-tierney Sun, 14 Apr 2013 19:14:32 -0700

A couple of fixes for somewhat obscure bugs related to compound
assignment required bumping up internal reference counts. It's possible
that one or more of these are responsible. If so, it is unavoidable for
now, but it's worth finding out for sure. With some stripped-down test
examples it should be possible to identify when things changed. I won't
have time to look into this for a while, but if someone else wanted to
nail it down, that would be useful.
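
A stripped-down test of the kind suggested above might look like the
following sketch. It assumes an R build configured with
--enable-memory-profiling (as in the configure line quoted further down),
because tracemem() is only available in such builds. The size n = 10000
and the variable names are illustrative choices, not taken from the
benchmark file.

```r
## Minimal sketch of a stripped-down test: report every duplication of a
## data frame during a single column replacement. tracemem() only works
## when R was built with --enable-memory-profiling.
n <- 10000                            # illustrative size: 8n = 80,000 bytes
d <- data.frame(y = as.numeric(seq_len(n)))
z <- as.numeric(seq_len(n))

traced <- capabilities("profmem")     # TRUE if memory profiling is compiled in
if (traced) invisible(tracemem(d))    # print a message whenever d is duplicated
d$z <- z                              # the operation under investigation
if (traced) untracemem(d)
if (!traced) message("memory profiling is not enabled in this build of R")
```

Running the same script under two R versions and counting the tracemem
lines printed for the replacement gives a direct before/after comparison;
replacing d$z <- z with another operation from the table below tests that
operation instead.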


Best,

luke

On Sun, 14 Apr 2013, Tim Hesterberg wrote:

I did some benchmarking of data frame code, and it appears that R 3.0.0
is far worse than earlier versions of R in the number of large objects it
allocates for data frame operations (creation, subscripting, and
subscript replacement). For a data frame with n rows, it makes either
2 or 4 extra copies of each of:
       8n bytes (e.g. double precision)
       24n bytes
       32n bytes
E.g., for as.data.frame(numeric vector), instead of allocations
totalling ~8n bytes, it allocates 33 times that much
(5*8n + 4*24n + 4*32n = 264n = 33*8n bytes, per the 5;4;4 entry below).

Here, compare numeric columns 3 and 5 (R-2.15.3 and R-3.0.0, both
without the dataframe package); columns 2 and 4 ("with") use the
dataframe package.

# Summary
#                               R-2.14.2        R-2.15.3        R-3.0.0
#                               w/o     with    w/o     with    w/o
#       as.data.frame(y)        3       1       1       1       5;4;4
#       data.frame(y)           7       3       4       2       6;2;2
#       data.frame(y, z)        7 each  3 each  4       2       8;4;4
#       as.data.frame(l)        8       3       5       2       9;4;4
#       data.frame(l)           13      5       8       3       12;4;4
#       d$z <- z                3,2     1,1     3,1     2,1     7;4;4,1
#       d[["z"]] <- z           4,3     1,1     3,1     2,1     7;4;4,1
#       d[, "z"] <- z           6,4,2   2,2,1   4,2,2   3,2,1   8;4;4,2,2
#       d["z"] <- z             6,5,2   2,2,1   4,2,2   3,2,1   8;4;4,2,2
#       d["z"] <- list(z=z)     6,3,2   2,2,1   4,2,2   3,2,1   8;4;4,2,2
#       d["z"] <- Z #list(z=z)  6,2,2   2,1,1   4,1,2   3,1,1   8;4;4,1,2
#       a <- d["y"]             2       1       2       1       6;4;4
#       a <- d[, "y", drop=F]   2       1       2       1       6;4;4

# Where two numbers are given, they refer to:
#   (copies of the old data frame),
#   (copies of the new column)
# A third number, where given, is the number of
#   (copies made of the integer vector of row names)

# For R 3.0.0, I'm getting astounding results: many more copies,
# and also some copies of larger objects; in addition to the data
# vectors of size 80K and 160K, there are also copies of 240K and 320K.
# Where three numbers are given in the form a;b;c, they refer to
#   (copies of 80K; 240K; 320K)
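
One way to obtain per-size counts like these, sketched under the
assumption that R was built with memory profiling, is to log every
allocation above a threshold with Rprofmem() and tabulate the logged
sizes. The helper name count_big_allocs() and the threshold are
hypothetical illustrations, not the method used in memory.R, and the byte
counts Rprofmem reports may differ slightly from 8n because of vector
headers.

```r
## Hypothetical helper: evaluate `expr` while Rprofmem() logs every vector
## allocation larger than `threshold` bytes, then tabulate the logged sizes.
## Requires R built with --enable-memory-profiling.
count_big_allocs <- function(expr, threshold) {
  logfile <- tempfile(fileext = ".out")
  on.exit({ Rprofmem(NULL); unlink(logfile) })  # stop logging even on error
  Rprofmem(logfile, threshold = threshold)
  force(expr)                       # lazy argument: evaluated here, while logging
  Rprofmem(NULL)                    # stop logging before reading the file
  lines <- readLines(logfile)
  lines <- lines[!grepl("^new page", lines)]     # drop small-vector page records
  sizes <- as.numeric(sub(" *:.*$", "", lines))  # leading field is the byte count
  table(sizes)
}

## Usage: tabulate large allocations made by as.data.frame() on a double
## vector; sizes are in bytes and may include a small per-vector header.
n <- 10000
y <- as.numeric(seq_len(n))
if (capabilities("profmem")) {
  print(count_big_allocs(as.data.frame(y), threshold = 8 * n - 1))
}
```

Setting the threshold just below 8n keeps the log to allocations of at
least one data column, which is the granularity the table uses.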

The benchmarks are at
http://www.timhesterberg.net/r-packages/memory.R

I'm using versions of R that I built from source on a Linux box, configured with, e.g.:
./configure --prefix=(my path) --enable-memory-profiling --with-readline=no
make
make install

______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


--
Luke Tierney
Chair, Statistics and Actuarial Science
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
   Actuarial Science
241 Schaeffer Hall                  email:   [email protected]
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu
