A few days ago I responded to Ramiro with a suggestion that turns out to be
incorrect.

> Ramiro
> >
> > I think the problem is the loop - R doesn't release memory allocated inside
> > an expression until the expression completes. A for loop is an expression,
> > so it duplicates fit and dataset on every iteration.
>
> The above explanation is not true.
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>

My apologies for providing bad advice, and many thanks to Bill for causing
me to think more deeply about the problem. This sort of repeated simulation
task is something I do a lot, and I hope I understand it better now. I
struggled to find a good exposition of memory use in R in the documentation.
Section 3.3 of the "Writing R Extensions" manual, "Profiling R Code for
Memory Use", provides some hints, but still presumes a lot of knowledge on
the part of the reader. If there is a better description somewhere I'd love
to hear about it.

I found the source of my earlier assertion in section 7.1 of "S
Programming" by Venables and Ripley, on managing loops: "A major issue is
that S is designed to be able to back out from uncompleted calculations, so
that the memory used in intermediate calculations is retained until they
are committed. This applies to for, while, and repeat loops, for which none
of the steps are committed until the whole loop is completed." This book
was written in 2000, so it may be out of date, as the authors themselves
suggest later in the same section. In addition, this may have applied more
to the S-Plus engines than to R. I also might have misunderstood what
Venables and Ripley mean by "committed"; it may not have anything to do
with excessive memory growth inside a loop caused by duplicating objects.

The best explanation of what is going on with memory allocation that I
found is in John Chambers's 2008 book "Software for Data Analysis:
Programming with R", in section 13.7, "Memory management for R objects". The
key point is that assigning something to a named object, like fit in
Ramiro's example, results in a new copy of fit. The reference to the old
version of fit is lost, but its memory is not deallocated immediately. That
only happens once garbage collection is triggered, which will happen
automatically during the loop. However, triggering garbage collection
frequently uses up a lot of time as well. The other thing that I learned is
that R has some clever internal programming that detects this condition and
avoids the worst problems, but only under certain circumstances, like using
the double square brackets in the assignment.
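
To make the double square bracket point concrete, here is a minimal sketch
of how I would now structure this kind of repeated simulation. The function
simulate_once() is a hypothetical stand-in of my own for an expensive fit
(it is not Ramiro's memoryHogFunction), and the sizes are arbitrary. I am
assuming that the large temporary inside the function becomes unreachable
once the function returns, so the garbage collector is free to reclaim it.

```r
# simulate_once() is a hypothetical stand-in for an expensive model fit.
# It builds a large temporary vector and returns only a small summary.
simulate_once <- function(i) {
  x <- rnorm(1e5)                 # large temporary, unreachable after return
  c(mean = mean(x), sd = sd(x))   # small result
}

n <- 20
results <- vector("list", n)      # preallocate the container once

for (i in seq_len(n)) {
  # Assign one element with [[<- rather than growing the object each time
  results[[i]] <- simulate_once(i)
}

# Reduce the list to a small matrix after the loop has finished
summary_table <- do.call(rbind, results)
```

Only the small summaries survive each iteration; whether R avoids a copy of
results on every assignment depends on the R version and on how many
references to the list exist, so treat this as a sketch rather than a
guarantee.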

It is also possible that the OS is unable to reclaim memory that R has
given up - see 7.42 of the R FAQ. The FAQ entry is not clear on whether
this happens on Windows. This is what Liviu was referring to in his
response, and it seems a likely candidate for the memory discrepancy in
Ramiro's example.
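
A small sketch of how one might check R's side of that discrepancy, using
only gc(). The helper vcells_mb() is a name I made up; it reads the "(Mb)"
column of the Vcells row that gc() reports. The vector size is arbitrary.
This shows only R's own accounting; the process size seen by the OS has to
be checked separately (Task Manager, top, etc.) and may stay high even
after R reports the memory as free.

```r
# Mb of vector heap currently in use according to R's own accounting.
# Calling gc() also forces a garbage collection.
vcells_mb <- function() gc()["Vcells", 2]

before <- vcells_mb()
big    <- numeric(5e6)     # 5e6 doubles, roughly 38 Mb
during <- vcells_mb()
rm(big)                    # drop the only reference to the vector
after  <- vcells_mb()      # collected, so the space is free again inside R

round(c(before = before, during = during, after = after), 1)
```

If "after" drops back close to "before" while the OS still shows a large
process, the memory is not being held by R objects, which is consistent
with the FAQ explanation.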

Cheers



> >
> > > I have the following situation
> > >
> > > basic loop which calls memoryHogFunction:
> > >
> > > for (i in 1:N) {
> > >    dataset <- generateDataset(i)
> > >    fit <- try(memoryHogFunction(dataset, otherParameters))
> > > }
> > >
> > > and within
> > >
> > > memoryHogFunction <- function(dataset, params){
> > >
> > >    fit <- try(nlme(someinitialValues))
> > >    ...
> > >    fit <- try(updatenlme(otherInitialValues))
> > >    ...
> > >    fit <- try(updatenlme(otherInitialValues))
> > >    ...
> > >    ret <- fit    # (and other things)
> > >    return(ret)   # return a result "ret"
> > > }
> > >
> > > The problem is that, memoryHogFunction uses a lot of memory, and at the
> > > end returns a result (which is not big) but the memory used by the
> > > computation seems to be still occupied.  The original loop continues, but
> > > the memory used by the program grows and grows after each call to
> > > memoryHogFunction.
> > >
> > > I have been trying to do gc() after each run in the loop, and have even
> > > done:
> > >
> > > in memoryHogFunction()
> > >  ...
> > >    ret <- fit ( and other things)
> > >    rm(list=ls()[-match("ret",ls())])
> > >    return a result "ret"
> > > }
> > >
> > > ???
> > >
> > > A typical result from gc() after each loop iteration says:
> > >          used (Mb) gc trigger (Mb) max used (Mb)
> > > Ncells  326953 17.5     597831 32.0   597831 32.0
> > > Vcells 1645892 12.6    3048985 23.3  3048985 23.3
> > >
> > > Which doesn't reflect that 340mb (and 400+mb in virtual memory) that are
> > > being used right now.
> > >
> > > Even when I do:
> > >
> > > print(sapply(ls(all.names=TRUE), function(x) object.size(get(x))))
> > >
> > > the largest object is 8179808, which is what it should be.
> > >
> > > The only thing that looked suspicious was the following within Rprof (with
> > > memory=stats option), the tot.duplications might be a problem???:
> > >
> > > index: "with":"with.default"
> > >     vsize.small  max.vsize.small      vsize.large  max.vsize.large
> > >           30841            63378            20642           660787
> > >           nodes        max.nodes     duplications tot.duplications
> > >         3446132          8115016            12395         61431787
> > >         samples
> > >            4956
> > >
> > > Any suggestions?  Is it something about the use of loops in R?  Is it
> > > maybe the try's???
> > >
> > > Thanks in advance for any help,
> > >
> > > Ramiro
> > >
> > >        [[alternative HTML version deleted]]
> > >
> > > ______________________________________________
> > > R-help@r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide
> > > http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> > >



-- 
Drew Tyre

School of Natural Resources
University of Nebraska-Lincoln
416 Hardin Hall, East Campus
3310 Holdrege Street
Lincoln, NE 68583-0974

phone: +1 402 472 4054
fax: +1 402 472 2946
email: aty...@unl.edu
http://snr.unl.edu/tyre
http://aminpractice.blogspot.com
http://www.flickr.com/photos/atiretoo
