You are quite right that my exec time would go down significantly if I
preallocated and skipped abind entirely, just assigning into the preallocated
matrix. The reason I didn't do it here is that this is part of a utility
function that doesn't know the sizes of the chunks on disk until it reads all
of them. If I knew a way to read dimnames off disk without reading the whole
matrices, I could do what you are suggesting. I guess I am better off using
filebacked matrices from bigmemory, where I could read dimnames off disk
without reading the matrix. I would need to unwrap 4-dimensional arrays into
2-dimensional arrays and wrap them back, but I guess it would be faster
anyway.
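One workaround that doesn't need bigmemory, just as a sketch (the directory,
file names, and chunk shapes below are all made up): save each chunk's dim()
in a tiny companion file at write time, so a later pass can preallocate the
result without loading the big matrices first.

```r
## Sketch only: companion ".dim" files let a second pass preallocate.
## dump.dir, file names, and chunk shapes here are hypothetical.
dump.dir <- tempdir()
files <- c("chunk1.RData", "chunk2.RData")

for (f in files) {
  data <- matrix(rnorm(20), nrow = 4)                 # toy chunk
  save(data, file = file.path(dump.dir, f))
  saveRDS(dim(data), file.path(dump.dir, paste0(f, ".dim")))
}

## First pass: read only the small .dim files to size the result
dims <- lapply(files, function(f)
  readRDS(file.path(dump.dir, paste0(f, ".dim"))))
out <- matrix(NA_real_,
              nrow = dims[[1]][1],
              ncol = sum(sapply(dims, `[`, 2)))

## Second pass: fill the preallocated matrix; no abind, no growing copies
col <- 1
for (f in files) {
  load(file.path(dump.dir, f))                        # creates `data`
  out[, col:(col + ncol(data) - 1)] <- data
  col <- col + ncol(data)
}
```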

My question, however, was not so much about the speed of a particular task.
It was whether using 7.2G of physical memory and 11G of virtual memory makes
sense when I am building a 1.3G matrix with this code. It seems that my
memory use goes to almost 100% of physical memory not just on this task but
on others as well. I wonder if something is seriously off with my setup and
whether I should rebuild R.

As for your lapply solution, it indeed used much less memory: about 25% less
than the loop, or roughly 4 times the size of the final object. I am still
not clear whether my memory use makes sense in terms of the R memory model,
and frankly I am not clear why lapply uses less memory (although I understand
why it does less copying).
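For what it's worth, a rough way to compare the peak memory of the two
approaches on toy data is gc()'s "max used" column, resetting the counters
between runs; the chunk sizes and counts below are invented for illustration.

```r
## Rough comparison of peak memory: loop-grown list vs lapply.
## gc(reset = TRUE) clears the "max used" counters before each run.
make.chunk <- function() matrix(rnorm(1e5), nrow = 100)

invisible(gc(reset = TRUE))
res <- list()
for (j in 1:50) res[[j]] <- make.chunk()       # list grown inside the loop
loop.peak <- gc()["Vcells", "max used"]

rm(res)
invisible(gc(reset = TRUE))
res <- lapply(1:50, function(j) make.chunk())  # list allocated by lapply
lapply.peak <- gc()["Vcells", "max used"]

c(loop = loop.peak, lapply = lapply.peak)      # peak Vcells for each run
```

Growing the list inside the loop reallocates the list's pointer vector on
each extension, while lapply builds the list in one go, which is one reason
its peak can come out lower.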

On Wed, Apr 11, 2012 at 7:15 PM, peter dalgaard <pda...@gmail.com> wrote:

>
> On Apr 12, 2012, at 00:53 , andre zege wrote:
>
> > I recently started using R 2.14.0 on a new machine and I am experiencing
> > what seems like unusually greedy memory use. It happens all the time, but
> > to give a specific example, let's say I run the following code
> >
> > --------
> >
> > for(j in 1:length(files)){
> >      load(file.path(dump.dir, files[j]))
> >      mat.data[[j]] <- data
> > }
> > combined <- abind(mat.data, along=2)
> > save(combined, file=file.path(dump.dir, filename))
>
> Hmm, did you preallocate mat.data? If not, you will be copying it
> repeatedly, and I'm not sure that this can be done by copying pointers only.
>
> Does it work better with
>
> mat.data <- lapply(files, function(name) {load(file.path(dump.dir, name));
> data})
>
> ?
>
>
> >
> > ---------
> >
> > It loads parts of a multidimensional matrix into a list, then binds it
> > along the second dimension and saves the result on disk. The code works,
> > although slowly, but what's strange is the amount of memory it uses.
> > In particular, each chunk of data is between 50M and 100M, and altogether
> > the bound matrix is 1.3G. One would expect R to use roughly double that
> > memory, to keep mat.data and its bound version separately. I could
> > imagine it somehow using 3 times the size of the matrix. But in fact it
> > uses more than 5.5 times that (almost all of my physical memory), and I
> > think it is swapping a lot to disk. For this particular task, my top
> > output shows R using more than 7G of resident memory and 11G of virtual
> > memory as well:
> >
> > $top
> >
> >   PID USER  PR  NI  VIRT   RES   SHR  S  %CPU  %MEM     TIME+  COMMAND
> >  8823 user  25   0   11g  7.2g   10m  R  99.7  92.9   5:55.05  R
> >  8590 root  15   0  154m   16m  5948  S   0.5   0.2  23:22.40  Xorg
> >
> > I have a strong suspicion that something is off with my R binary; I
> > don't think I have experienced anything like this in a long time. Is
> > this in line with what I should expect? Are there any ideas for
> > diagnosing what is going on?
> > Would appreciate any suggestions
> >
> > Thanks
> > Andre
> >
> >
> > ==================================
> >
> > Here is what I am running on:
> >
> >
> > CentOS release 5.5 (Final)
> >
> >
> >> sessionInfo()
> > R version 2.14.0 (2011-10-31)
> > Platform: x86_64-unknown-linux-gnu (64-bit)
> >
> > locale:
> > [1] en_US.UTF-8
> >
> > attached base packages:
> > [1] stats     graphics  grDevices datasets  utils     methods   base
> >
> > other attached packages:
> > [1] abind_1.4-0       rJava_0.9-3       R.utils_1.12.1    R.oo_1.9.3
> > R.methodsS3_1.2.2
> >
> > loaded via a namespace (and not attached):
> > [1] codetools_0.2-8 tcltk_2.14.0    tools_2.14.0
> >
> >
> >
> > I compiled R, configured as follows:
> > ./configure --prefix=/usr/local/R --enable-byte-compiled-packages=no
> > --with-tcltk --enable-R-shlib=yes
> >
> >
> > ______________________________________________
> > R-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
>
> --
> Peter Dalgaard, Professor,
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Email: pd....@cbs.dk  Priv: pda...@gmail.com


