You are quite right that my exec time would go down significantly if I preallocated and assigned into the matrix instead of even using abind. The reason I didn't do it here is that this is part of a utility function that doesn't know the sizes of the chunks on disk until it has read all of them. If I knew a way to read dimnames off disk without reading the whole matrices, I could do what you are suggesting. I guess I am better off using file-backed matrices from bigmemory, where I could read dimnames off disk without reading the matrix. I would need to unwrap 4-dimensional arrays into 2-dimensional arrays and wrap them back, but I guess it would be faster anyway.
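For what it's worth, the unwrap/wrap step I have in mind looks roughly like this. This is a minimal sketch with toy dimensions; the grouping of dimensions (1:2 as rows, 3:4 as columns) is an assumption about the layout, not the real data:

```r
## R stores arrays in column-major order, so changing 'dim' reshapes
## without reordering the elements.
a <- array(1:24, dim = c(2, 3, 2, 2))
d <- dim(a)

## unwrap 4-d -> 2-d: fold the first two dims into rows, last two into columns
m <- a
dim(m) <- c(d[1] * d[2], d[3] * d[4])   # a 6 x 4 matrix, same elements

## wrap back 2-d -> 4-d by restoring the original dim vector
b <- m
dim(b) <- d

stopifnot(identical(a, b))              # round trip is lossless
```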
My question, however, was not so much about speeding up a particular task. It was whether using 7.2G of physical memory and 11G of virtual memory makes sense when I am building a 1.3G matrix with this code. It just seems to me that my memory goes to almost 100% of physical, not just on this task but on others too. I wonder if there is something seriously off with my memory behavior and whether I should rebuild R. As for your lapply solution, it indeed used much less memory: about 25% less than the loop, roughly 4 times the size of the final object. I am still not clear whether my memory use makes sense in terms of the R memory model, and frankly I am not clear why lapply uses less memory. (I understand why it does less copying.)

On Wed, Apr 11, 2012 at 7:15 PM, peter dalgaard <pda...@gmail.com> wrote:
>
> On Apr 12, 2012, at 00:53 , andre zege wrote:
>
> > I recently started using R 2.14.0 on a new machine and I am experiencing
> > what seems like unusually greedy memory use. It happens all the time, but
> > to give a specific example, let's say I run the following code:
> >
> > --------
> >
> > for(j in 1:length(files)){
> >     load(file.path(dump.dir, files[j]))
> >     mat.data[[j]] <- data
> > }
> > save(abind(mat.data, along=2), file=file.path(dump.dir, filename))
>
> Hmm, did you preallocate mat.data? If not, you will be copying it
> repeatedly, and I'm not sure that this can be done by copying pointers only.
>
> Does it work better with
>
> mat.data <- lapply(files, function(name) {load(file.path(dump.dir, name)); data})
>
> ?
>
> > --------
> >
> > It loads parts of a multidimensional matrix into a list, then binds them along
> > the second dimension and saves to disk. The code works, although slowly, but
> > what's strange is the amount of memory it uses.
> > In particular, each chunk of data is between 50M and 100M, and altogether
> > the bound matrix is 1.3G.
> > One would expect that R would use roughly double
> > that memory, to keep mat.data and its bound version separately. I
> > could imagine that it could somehow use 3 times the size of the matrix. But
> > in fact it uses more than 5.5 times that (almost all of my physical memory),
> > and I think it is swapping a lot to disk. For this particular task, my top
> > output shows R eating more than 7G of memory and using up 11G of virtual
> > memory as well:
> >
> > $top
> >
> >   PID USER  PR  NI  VIRT  RES   SHR S %CPU %MEM    TIME+ COMMAND
> >  8823 user  25   0   11g 7.2g   10m R 99.7 92.9  5:55.05 R
> >  8590 root  15   0  154m  16m  5948 S  0.5  0.2 23:22.40 Xorg
> >
> > I have a strong suspicion that something is off with my R binary; I don't
> > think I have experienced things like this in a long time. Is this in line with
> > what I am supposed to experience? Are there any ideas for diagnosing what
> > is going on?
> > I would appreciate any suggestions.
> >
> > Thanks
> > Andre
> >
> > ==================================
> >
> > Here is what I am running on:
> >
> > CentOS release 5.5 (Final)
> >
> > > sessionInfo()
> > R version 2.14.0 (2011-10-31)
> > Platform: x86_64-unknown-linux-gnu (64-bit)
> >
> > locale:
> > [1] en_US.UTF-8
> >
> > attached base packages:
> > [1] stats graphics grDevices datasets utils methods base
> >
> > other attached packages:
> > [1] abind_1.4-0 rJava_0.9-3 R.utils_1.12.1 R.oo_1.9.3 R.methodsS3_1.2.2
> >
> > loaded via a namespace (and not attached):
> > [1] codetools_0.2-8 tcltk_2.14.0 tools_2.14.0
> >
> > I compiled R, configuring as follows:
> > ./configure --prefix=/usr/local/R --enable-byte-compiled-packages=no
> >     --with-tcltk --enable-R-shlib=yes
> >
> > ______________________________________________
> > R-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
>
> --
> Peter Dalgaard, Professor,
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Email: pd....@cbs.dk  Priv: pda...@gmail.com
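P.S. For the archives, here is a self-contained sketch of the lapply variant, using toy chunks written to tempdir(). The chunk names and contents are stand-ins for the real data, and base do.call(cbind, ...) stands in for abind(..., along = 2) on 2-d matrices:

```r
## Toy setup: dump.dir and the chunk files are stand-ins, not the real data.
dump.dir <- tempdir()
files <- c("chunk1.RData", "chunk2.RData")

## Write two toy chunks to disk, each containing an object named 'data'
for (f in files) {
  data <- matrix(rnorm(6), nrow = 3)
  save(data, file = file.path(dump.dir, f))
}

## Load each chunk inside a throwaway function environment and return 'data';
## the list is built in one pass, so no growing object gets copied repeatedly
mat.data <- lapply(files, function(name) {
  load(file.path(dump.dir, name))
  data
})

## Bind along columns (equivalent to abind(mat.data, along = 2) for matrices)
big <- do.call(cbind, mat.data)
save(big, file = file.path(dump.dir, "combined.RData"))

stopifnot(all(dim(big) == c(3, 4)))
```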