Henrik, thanks for your reply. I may have misrepresented my actual code a bit. You seem to be suggesting calling rm() on objects I no longer use; the real code whose behavior I reported already does exactly that. I also use a small wrapper around load() that lets me assign the loaded data directly to a variable of any name, without having to remember the name under which the object was saved. That is, instead of the standard load() I use something like this (the real code has error checking):

  ut.load <- function(filename) { e <- new.env(); get(load(filename, envir = e)[1], envir = e) }

In other words, after I call data[[j]] <- ut.load(files[j]) there is no remaining reference to the intermediate object, so I assume the garbage collector reclaims it promptly. I just want to make sure we are on the same page.

I am mostly looking for guidance on what to expect from R's memory behavior. This particular task is just an illustration of an issue I have been running into often lately. Is there a way to diagnose whether a particular task's memory use is normal? Is there a memory benchmark? Is there a white paper discussing how memory allocation and the copying of objects actually work in R? Is there a limited chunk of the C code I could read to understand it, without reading all of the C sources?

Thanks much,
Andre

On Wed, Apr 11, 2012 at 9:02 PM, Henrik Bengtsson <h...@biostat.ucsf.edu> wrote:
> Leaving aside what's going on inside abind::abind(), maybe the
> following sheds some light on what is being wasted:
>
> # Preallocate (probably doesn't make a difference because it's a list)
> mat.data <- vector("list", length=length(files));
> for (j in 1:length(files)){
>   vars <- load(file.path(dump.dir, files[j]))
>   mat.data[[j]]<-data;
>   # Not needed anymore/remove everything loaded
>   rm(list=vars);
> }
>
> data <- abind(mat.data, along=2);
> # Not needed anymore
> rm(mat.data);
>
> save(data, file=file.path(dump.dir, filename))
>
> My $.02
> /Henrik
>
> On Wed, Apr 11, 2012 at 3:53 PM, andre zege <andre.z...@gmail.com> wrote:
> > I recently started using R 2.14.0 on a new machine and I am experiencing
> > what seems like unusually greedy memory use.
> > It happens all the time, but to give a specific example, let's say I run
> > the following code:
> >
> > --------
> >
> > mat.data <- list()
> > for(j in 1:length(files)){
> >   load(file.path(dump.dir, files[j]))
> >   mat.data[[j]]<-data
> > }
> > data <- abind(mat.data, along=2)
> > save(data, file=file.path(dump.dir, filename))
> >
> > --------
> >
> > It loads parts of a multidimensional matrix into a list, then binds them
> > along the second dimension and saves the result to disk. The code works,
> > although slowly, but what's strange is the amount of memory it uses.
> > In particular, each chunk of data is between 50M and 100M, and altogether
> > the bound matrix is 1.3G. One would expect R to use roughly double that
> > memory, about 2.6G, to keep mat.data and its bound version separately.
> > I could imagine that somehow it could use 3 times the size of the matrix.
> > But in fact it uses more than 5.5 times as much (almost all of my physical
> > memory), and I think it is swapping a lot to disk. For this particular
> > task, top shows R using more than 7G of resident memory and 11G of
> > virtual memory:
> >
> > $ top
> >
> >   PID USER  PR NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
> >  8823 user  25  0   11g 7.2g  10m R 99.7 92.9   5:55.05 R
> >  8590 root  15  0  154m  16m 5948 S  0.5  0.2  23:22.40 Xorg
> >
> > I have a strong suspicion that something is off with my R binary; I don't
> > think I have experienced anything like this in a long time. Is this in line
> > with what I should expect? Are there any ideas for diagnosing what
> > is going on?
> >
> > Would appreciate any suggestions.
> >
> > Thanks
> > Andre
> >
> > ==================================
> >
> > Here is what I am running on:
> >
> > CentOS release 5.5 (Final)
> >
> > > sessionInfo()
> > R version 2.14.0 (2011-10-31)
> > Platform: x86_64-unknown-linux-gnu (64-bit)
> >
> > locale:
> > [1] en_US.UTF-8
> >
> > attached base packages:
> > [1] stats graphics grDevices datasets utils methods base
> >
> > other attached packages:
> > [1] abind_1.4-0 rJava_0.9-3 R.utils_1.12.1 R.oo_1.9.3 R.methodsS3_1.2.2
> >
> > loaded via a namespace (and not attached):
> > [1] codetools_0.2-8 tcltk_2.14.0 tools_2.14.0
> >
> > I configured R as follows:
> > ./configure --prefix=/usr/local/R --enable-byte-compiled-packages=no --with-tcltk --enable-R-shlib=yes
> >
> > ______________________________________________
> > R-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
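
A rough way to check from inside R whether a task's memory use is "normal" is to compare the peak heap usage reported by gc() with the size of the final result. The following is only a minimal sketch under stated assumptions: the chunks are small made-up matrices written to tempdir() rather than the real files in dump.dir, ut.load() is a hypothetical single-object loader in the spirit of the wrapper described in this thread, peak.mb() is a hypothetical helper, and base do.call(cbind, ...) stands in for abind(mat.data, along = 2) (equivalent only for 2-D chunks) so that no extra packages are needed.

```r
## Hypothetical helper: load an .RData file holding a single object and
## return that object, whatever name it was saved under.
ut.load <- function(filename) {
  e <- new.env()
  nms <- load(filename, envir = e)  # load() returns the names of the restored objects
  get(nms[1], envir = e)
}

## Hypothetical helper: peak memory (Mb) on R's heap while `expr` is evaluated.
## gc(reset = TRUE) resets the "max used" statistics; the last column of the
## matrix returned by gc() is "max used" in Mb, summed here over Ncells and Vcells.
peak.mb <- function(expr) {
  invisible(gc(reset = TRUE))
  force(expr)                       # evaluate the caller's expression now
  g <- gc()
  sum(g[, ncol(g)])
}

## Made-up stand-ins for the real chunks: 4 matrices of 1000 x 500 doubles
## (about 4 MB each), each saved under the name `data`.
dump.dir <- tempdir()
files <- sprintf("chunk%d.RData", seq_len(4))
for (j in seq_along(files)) {
  data <- matrix(rnorm(1000 * 500), nrow = 1000)
  save(data, file = file.path(dump.dir, files[j]))
}
rm(data)

peak <- peak.mb({
  mat.data <- vector("list", length(files))
  for (j in seq_along(files)) {
    mat.data[[j]] <- ut.load(file.path(dump.dir, files[j]))
  }
  bound <- do.call(cbind, mat.data)  # base-R stand-in for abind(mat.data, along = 2)
  rm(mat.data)
})

## Size of the final result versus the peak heap usage while building it.
final.mb <- as.numeric(object.size(bound)) / 1024^2
cat(sprintf("result: %.1f Mb, peak heap: %.1f Mb, ratio: %.2f\n",
            final.mb, peak, peak / final.mb))
```

gc() only counts memory that R's own allocator tracks, so the peak it reports may be lower than the resident size top shows for the process. With chunks this small, R's baseline usage also inflates the ratio; for a 1.3G result the baseline would be negligible, and a ratio near 2 would correspond to mat.data and the bound copy coexisting.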