Re: [Rd] .Call and to reclaim the memory by allocVector

2007-08-25 Thread Yongchao Ge
Dear Prof. Ripley

I am using 32-bit Ubuntu 7.04 on a dual-core Intel Xeon 5140. I do not
think it is the OS's problem in recognizing the memory released by
free(), as the Calloc() and Free() pair works perfectly well in my
C program. I'm assuming that the free() in your post does not mean
the standard C library function, but the Free() in the R extension API,
which the R extension manual recommends for releasing memory back to
the OS.
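
For concreteness, the Calloc()/Free() pattern I have in mind is sketched
below (a toy illustration only, not my actual code; recent versions of R
document the same pair as R_Calloc()/R_Free()):

#include <R.h>   /* pulls in R_ext/RS.h, which defines Calloc() and Free() */

/* Toy sketch only: transient working storage allocated on the C side
   and handed back before the routine returns. */
void fill_and_discard(int n)
{
    double *work = Calloc(n, double);   /* signals an R error on failure */
    for (int i = 0; i < n; i++)
        work[i] = 0.5 * i;              /* stand-in for real work */
    Free(work);                         /* storage released again here */
}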

It was not the 150MB itself that bothered me; I used the toy example to
isolate the problem. My actual program needs to allocate around 660M bytes
(maybe more, depending on the actual dataset) for a return value from .Call.
This return object is stored in R and will be used by many other
functions, which also use .Call to wrap the C code. I found that my
program reaches the memory limit (3G) very quickly, even though at most
1.8G bytes of data should be in memory in the C and R code combined
(potentially two copies of the same R object plus a copy in the C
program). The memory problem in .Call means that my program
can run once or twice but fails the third time, and I need to run the
same program more than twice.

Why am I storing a large dataset in R? My program consists of two
parts. The first part computes the intermediate results, which
takes a lot of time. The second part contains many
different functions to manipulate those intermediate
results.

My current solution is to save the intermediate results in a temporary file,
but my final goal is to keep them as an R object. The "memory leak" in
.Call stops me from doing this, and I'd like to know whether there is a clean
solution for the R package I am writing.

Yongchao

On Fri, 24 Aug 2007, Prof Brian Ripley wrote:

> Please do not post to multiple lists! I've removed R-help.
>
> You have not told us your OS ('linux', perhaps, but what CPU?), nor how you 
> know 'the memory was still not reclaimed back to the operating system'. But 
> that is how many OSes work: their malloc maintains a pool of memory pages, 
> and free() does not return the memory to the OS kernel, just to the process's 
> pool.  It depends on what you meant by 'the operating system'.
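
(As an aside, this behaviour can be seen outside R with a few lines of C;
the sketch below only illustrates the point about malloc's pool, assumes
glibc, and is not code from either of our programs:)

#include <malloc.h>   /* glibc-specific: mallinfo() */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Illustration only (assumes glibc): blocks released with free() return to
   malloc's pool inside the process; ordinary free() shrinks the heap only
   from its top, so a few surviving blocks keep the freed pages counted
   against the process. */
int main(void)
{
    enum { NBLOCKS = 2000, BLOCK = 32 * 1024 };   /* ~64 MB in small heap chunks */
    static char *blocks[NBLOCKS];

    for (int i = 0; i < NBLOCKS; i++) {
        if ((blocks[i] = malloc(BLOCK)) == NULL) return 1;
        memset(blocks[i], 1, BLOCK);              /* touch the pages */
    }
    for (int i = 0; i < NBLOCKS; i++)             /* free all but every 50th block */
        if (i % 50 != 0) { free(blocks[i]); blocks[i] = NULL; }

    struct mallinfo mi = mallinfo();
    printf("bytes sitting in malloc's free pool: %d\n", mi.fordblks);
    /* 'ps aux' still reports most of the ~64 MB for this process */
    return 0;
}
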
>
> Why does this bother you?  150Mb of virtual memory is nothing these days.
>
>
> On Thu, 23 Aug 2007, Yongchao Ge wrote:
>
>> Hi,
>> 
>> I am not sure if this is a bug and I apologize if it is something I
>> didn't read carefully in the R extension manual. My initial search on the
>> R help and R devel list archive didn't find useful information.
>
> Exactly this topic was thrashed to death under the misleading title of 
> 'Suspected memory leak' earlier this month in a thread that started on R-help 
> and moved to R-devel. See e.g.
>
> https://stat.ethz.ch/pipermail/r-devel/2007-August/046669.html
>
> from the author of the R memory allocator.
>
>
>> I am using .Call (as described in the R extension manual) for the C code
>> and have found that .Call didn't release the memory claimed by
>> allocVector. Even after calling gc() and removing the R object
>> created by the .Call function, the memory was still not reclaimed by
>> the operating system.
>> 
>> Here is an example. It was modified from the convolve2 example in the R
>> extension manual. Now I am computing the cross product of a and b, which
>> returns a vector of length length(a)*length(b).
>> 
>> The C code is at the end of this message with the modification commented.
>> The R code is here
>> 
>> dyn.load("crossprod2.so")
>> cp <- function(a, b) .Call("crossprod2", a, b)
>> gctorture()
>> a<-1:1
>> b<-1:1000
>> gc() #i
>> 
>> c<-cp(a,b)
>> rm(c)
>> gc() #ii
>> --
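
(The crossprod2 C routine described above follows the convolve2 example;
roughly, it looks like the sketch below, which is an illustration for
readers of the thread rather than the crossprod2.c file the message
refers to:)

#include <R.h>
#include <Rinternals.h>

/* Sketch in the style of convolve2 from "Writing R Extensions": return a
   numeric vector of length length(a) * length(b) holding all pairwise
   products.  Illustrative only, not the original crossprod2.c. */
SEXP crossprod2(SEXP a, SEXP b)
{
    a = PROTECT(coerceVector(a, REALSXP));
    b = PROTECT(coerceVector(b, REALSXP));
    int na = length(a), nb = length(b);

    /* the large allocation the thread is about (assumes na * nb fits in an int) */
    SEXP ab = PROTECT(allocVector(REALSXP, na * nb));
    double *xa = REAL(a), *xb = REAL(b), *xab = REAL(ab);

    for (int j = 0; j < nb; j++)
        for (int i = 0; i < na; i++)
            xab[i + na * j] = xa[i] * xb[j];

    UNPROTECT(3);       /* balances the three PROTECT calls above */
    return ab;
}
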
>> 
>> When I run the above code in a freshly started R (version 2.5.0),
>> the gc() information is below. I report the last column ("max
>> used (Mb)") here, which agrees with the Linux command "ps aux". Apparently,
>> even after I remove the object "c", we still have 70M bytes of unreclaimed
>> memory, which is approximately the memory size of the object "c".
>> 
>> If I run the command "c<-cp(a,b)" three or four times and then remove the
>> object "c" and call gc(), the unreclaimed memory can reach 150M
>> bytes. I tried gc(reset=TRUE), and it doesn't seem to make a difference.
>> 
>> Can someone suggest what causes this problem and what the solution might
>> be?  When you reply to this email, please cc me, as I am not on the help
>> list.
>> 
>> Thanks,
>> 
>> Yongchao
>> 
>> 
>>> dyn.load("crossprod2.so")
>>> cp <- function(a, b) .Call("crossprod2", a, b)
>>> gctorture()
>>> a<-1:1
>>> b<-1:1000
>> 
>> 
>>> gc() #i
>>  used (Mb) gc trigger (Mb) max used (Mb)
>> Ncells 173527  4.7 467875 12.5   35  9.4
>> Vcells 108850  0.9 786432  6.0   398019  3.1
>>> 
>>> c<-cp(a,b)
>>> rm(c)
>>> gc() #ii
>>  used (Mb) gc trigger (Mb) max used (Mb)
>> Ncells 233998  6.3 46787

Re: [Rd] .Call and to reclaim the memory by allocVector

2007-08-25 Thread Seth Falcon
Hi Yongchao,

Yongchao Ge <[EMAIL PROTECTED]> writes:
> Why am I storing a large dataset in R? My program consists of two 
> parts. The first part computes the intermediate results, which 
> takes a lot of time. The second part contains many 
> different functions to manipulate those intermediate 
> results.
>
> My current solution is to save the intermediate results in a temporary file, 
> but my final goal is to keep them as an R object. The "memory leak" in 
> .Call stops me from doing this, and I'd like to know whether there is a clean 
> solution for the R package I am writing.

There are many examples of packages that use .Call to create large
objects.  I don't think there is a "memory leak".

One thing that may be tripping you up is that, because of R's
pass-by-value semantics, you may end up with multiple copies of
the object on the R side during some of your operations.  I would
recommend recompiling R with --enable-memory-profiling and using
tracemem() to see if you can identify places where copies of your
large object are being made.  You can also take a look at
Rprof(memory.profile=TRUE).

+ seth


-- 
Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center
BioC: http://bioconductor.org/
Blog: http://userprimary.net/user/

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel