Dear R community,
I am still struggling a bit with how R allocates memory and how to optimize my code to
minimize its working-memory footprint. Simon (thanks!) and others suggested calling "gc()"
to clean up memory, which works quite nicely but seems to me more like a workaround than a
solution to the underlying problem.
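A minimal sketch of what gc() is doing, assuming a current R session: R does not release an object's memory the moment it becomes unreferenced. The space is reclaimed when the garbage collector runs, and R also triggers a collection on its own when new allocations cross an internal threshold. gcinfo(TRUE) prints a line each time a collection happens, so one can check whether a drop in the system monitor coincides with one. As far as I know, whether freed memory is then handed back to the operating system (and so shows up in a system monitor) depends on the platform's memory allocator. The size n is an assumed, smaller value so the sketch runs quickly.

```r
n <- 1e6                       # assumed smaller size; the examples use 50000000

old <- gcinfo(TRUE)            # report every garbage collection from now on

y <- matrix(rep(1, n), nrow = n / 2, ncol = 2)
y <- matrix(rep(1, n), nrow = n / 2, ncol = 2)   # the first matrix is now garbage
colnames(y) <- c("col1", "col2")                  # new allocations may trigger a collection

gcinfo(old)                    # restore the previous reporting setting
invisible(gc())                # explicit full collection, as suggested on the list
```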
To give you an idea of what I am talking about, here is a short code example, together
with a rough measure (from a system monitoring app) of the working memory used after
each computational step (latest 64-bit R on 64-bit Windows 7, 2 cores,
approx. 4 GB RAM):
##########################
# example 1:
y <- matrix(rep(1, 50000000), nrow = 50000000/2, ncol = 2)
# used working memory increases from 1044 --> 1808 MB
# (the same command again)
y <- matrix(rep(1, 50000000), nrow = 50000000/2, ncol = 2)
# 1808 MB --> 2178 MB. Why does memory use increase?
# (give the matrix column names)
colnames(y) <- c("col1", "col2")
# 2178 MB --> 1781 MB. Why does total memory use drop when I merely
# assign column labels?
###
# example 2:
y <- matrix(rep(1, 50000000), nrow = 50000000/2, ncol = 2)
# used working memory increases from 1016 --> 1780 MB
y <- data.frame(y)
# 1780 MB --> 3315 MB
##########################
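The numbers in example 1 fit a simple calculation: 50000000 doubles at 8 bytes each are 400,000,000 bytes, about 381 MiB. rep() first builds a full-length vector and matrix() then copies it into a new object, so two such blocks exist briefly (about 763 MiB), which is close to the reported 1044 --> 1808 MB jump. The sketch below, with an assumed smaller n, lets matrix() recycle a single value instead, and removes the old y before rebuilding it so that the old and new matrices are not held at the same time.

```r
n <- 1e6                                  # assumed smaller size; the example uses 50000000

# Memory for one copy of n doubles, in MiB (8 bytes per double)
mib_per_copy <- n * 8 / 1024^2
cat("one copy:", round(mib_per_copy, 1), "MiB\n")

# matrix() recycles a length-one value itself, so no full-length
# temporary from rep() is created first
y <- matrix(1, nrow = n / 2, ncol = 2)

# Before rebuilding y, drop the old binding and collect it, so the old
# and new matrices never occupy memory at the same time
rm(y)
invisible(gc())
y <- matrix(1, nrow = n / 2, ncol = 2)
```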
Why does it take so much extra memory to store this matrix as a data.frame?
It is not the object per se (i.e., it is not that data.frames need more memory),
because if I call gc() afterwards, memory use drops to 1387 MB. Does this mean it
may be more memory-efficient to avoid data.frames altogether and use only matrices?
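One way to check the "object per se" observation from inside R is object.size(). A data.frame built from a matrix stores the same doubles, one list element per column, so both objects should come out at roughly the same size; the extra memory seen in example 2 would then come from intermediate copies made during the conversion, plus the old matrix that stays allocated until it is collected. The sketch uses as.data.frame() (which data.frame() applies to a matrix argument) and an assumed smaller n.

```r
n <- 1e6                                   # assumed smaller size; the example uses 50000000

m  <- matrix(1, nrow = n / 2, ncol = 2)
df <- as.data.frame(m)                     # columns are copied out of the matrix

size_m  <- as.numeric(object.size(m))
size_df <- as.numeric(object.size(df))
print(round(c(matrix = size_m, data.frame = size_df) / 1024^2, 2))   # in MiB

# Once the matrix is no longer needed, remove it and collect, so that
# only the data.frame copy remains
rm(m)
invisible(gc())

# Alternative: build the data.frame from column vectors directly,
# without creating an intermediate matrix at all
df2 <- data.frame(col1 = rep(1, n / 2), col2 = rep(1, n / 2))
```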
This puzzles me a lot. In my experience, these effects become even more pronounced
for larger objects.
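To check whether the effect grows with object size without a system monitor, one could record R's peak memory around each step. The helper peak_mb() below is hypothetical (not part of R): it resets the "max used" statistics of gc(), evaluates an expression, and returns the peak in Mb that gc() reports. The values include memory already in use by the session, so they are best compared across sizes rather than read as the cost of the step alone.

```r
# Hypothetical helper (not a base R function): peak memory in Mb reached
# while evaluating expr, as reported by gc()
peak_mb <- function(expr) {
  invisible(gc(reset = TRUE))   # collect garbage, reset "max used" to current use
  force(expr)                   # evaluate the lazily passed expression now
  g <- gc()
  sum(g[, ncol(g)])             # last column: "max used" in Mb (Ncells + Vcells)
}

for (n in c(1e5, 1e6, 2e6)) {
  p <- peak_mb(data.frame(matrix(1, nrow = n / 2, ncol = 2)))
  cat(format(n, scientific = FALSE), "elements:", p, "Mb peak\n")
}
```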
As an anecdotal comparison: because of these memory problems, I also used Stata in my
last project, and I could perform many variable manipulations on the same (!) data
with significantly less memory (we are talking about GBs).
Best,
Marc
______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.