Hi: Are you running 32-bit or 64-bit R? For memory-intensive processes like these, 64-bit R is almost a necessity. You might also look into more efficient ways to invert the matrix, especially if it has special properties that can be exploited (e.g., symmetry). More to the point, you want to compute the nonparametric MLE as efficiently as you can, since it affects everything downstream. In addition, if you're trying to do all of this in a single function, it may be better to break the job up into several functions, one for each task, with a wrapper function to put them together (i.e., modularize).
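To make the point about exploiting matrix structure a bit more concrete, here is a minimal sketch. It assumes the matrix in your Newton step is symmetric positive definite, which I can't confirm without seeing your code; A and g below are made-up stand-ins for your matrix and gradient, not anything from your problem. The idea is that a Newton update only needs the solution of a linear system, not the inverse itself:

```r
# Hypothetical stand-ins: A plays the role of a symmetric positive definite
# matrix (e.g., a negative Hessian) and g the role of a gradient vector.
set.seed(1)
n <- 200
X <- matrix(rnorm(n * n), n, n)
A <- crossprod(X) + diag(n)   # symmetric positive definite by construction
g <- rnorm(n)

# Explicit inverse followed by a multiply: more work than the step needs
step_inv <- solve(A) %*% g

# Solve the linear system A %*% step = g directly, without forming the inverse
step_solve <- solve(A, g)

# Exploit symmetry: Cholesky factorization A = t(R) %*% R with R upper
# triangular, followed by two triangular solves
R <- chol(A)
y <- forwardsolve(t(R), g)    # solves t(R) %*% y = g
step_chol <- backsolve(R, y)  # solves R %*% step = y

# The three approaches should agree up to numerical tolerance
all.equal(as.vector(step_inv), step_solve)
all.equal(step_solve, step_chol)
```

If you really do need the inverse itself, chol2inv(R) computes it from the Cholesky factor. Wrapping each approach in system.time() on a matrix of your actual size would show whether the difference matters in your case.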
Memory problems in R often arise from repeatedly copying objects in memory while accumulating promises in a loop that do not get evaluated until the end. Forcing evaluations or performing garbage collection at judicious points can improve efficiency. Pre-allocating memory to result objects is more efficient than adding a new element to an output vector or matrix every iteration. Vectorizing where you can is critical.

Since you didn't provide any code, one is left to speculate where the bottleneck(s) in your code lie(s), but here's a little example I did for someone recently that shows how much difference vectorization and pre-allocation of memory can make:

# Problem: Simulate 1000 U(0, 1) random numbers, discretize them
# into a factor and generate a table.

# vectorized version using cut()
f <- function() {
    x <- runif(1000)
    z <- cut(x, breaks = c(-0.1, 0.1, 0.2, 0.4, 0.7, 0.9, 1), labels = 1:6)
    table(z)
}

# use ifelse(), a vectorized function, to divide into groups
g <- function() {
    x <- runif(1000)
    z <- ifelse(x <= 0.1, '1',
         ifelse(x > 0.1 & x <= 0.2, '2',
         ifelse(x > 0.2 & x <= 0.4, '3',
         ifelse(x > 0.4 & x <= 0.7, '4',
         ifelse(x > 0.7 & x <= 0.9, '5', '6')))))
    table(z)
}

# Elementwise loop with preallocation of memory
h <- function() {
    x <- runif(1000)
    z <- character(1000)   # <== preallocate the full result vector
    for(i in 1:1000) {
        z[i] <- if(x[i] <= 0.1) '1' else
                if(x[i] > 0.1 && x[i] <= 0.2) '2' else
                if(x[i] > 0.2 && x[i] <= 0.4) '3' else
                if(x[i] > 0.4 && x[i] <= 0.7) '4' else
                if(x[i] > 0.7 && x[i] <= 0.9) '5' else '6'
    }
    table(z)
}

# Same as h() w/o memory preallocation
h2 <- function() {
    x <- runif(1000)
    z <- NULL   # z has to exist before z[i] <- ... works in a clean
                # session; it starts with no storage and grows each pass
    for(i in 1:1000) {
        z[i] <- if(x[i] <= 0.1) '1' else
                if(x[i] > 0.1 && x[i] <= 0.2) '2' else
                if(x[i] > 0.2 && x[i] <= 0.4) '3' else
                if(x[i] > 0.4 && x[i] <= 0.7) '4' else
                if(x[i] > 0.7 && x[i] <= 0.9) '5' else '6'
    }
    table(z)
}

# Same as h(), but initialize with an empty vector
h3 <- function() {
    x <- runif(1000)
    z <- character(0)   # empty vector
    for(i in 1:1000) {
        z[i] <- if(x[i] <= 0.1) '1' else
                if(x[i] > 0.1 && x[i] <= 0.2) '2' else
                if(x[i] > 0.2 && x[i] <= 0.4) '3' else
                if(x[i] > 0.4 && x[i] <= 0.7) '4' else
                if(x[i] > 0.7 && x[i] <= 0.9) '5' else '6'
    }
    table(z)
}

########## Timings using the function replicate():

> system.time(replicate(1000, f()))
   user  system elapsed
   1.14    0.04    1.20
> system.time(replicate(1000, g()))
   user  system elapsed
   3.90    0.00    3.92
> system.time(replicate(1000, h()))
   user  system elapsed
   9.24    0.00    9.26
> system.time(replicate(1000, h2()))
   user  system elapsed
  15.49    0.00   15.55
> system.time(replicate(1000, h3()))
   user  system elapsed
  15.60    0.03   15.68

The vectorized cut() version is over three times as fast as the vectorized ifelse() approach, and the vectorized ifelse() is about 2.4 times as fast as the preallocated, non-vectorized approach. The h* functions are all non-vectorized, but differ in the way they initialize memory for output objects. Full preallocation of memory (h) takes about 60% as long as the non-preallocated versions. Initializing an empty vector is about as fast as no initialization at all.

The effects of vectorization and the use of pre-allocated memory for result objects filled in a loop are clear. If you're carrying around copies of a large n x n matrix in memory over a number of iterations of a loop, you are certainly going to gobble up available memory, no matter how much you have. You can see the result in the much simpler problem above.

I'd recommend that you invest some time improving the efficiency of the MLE function. Profiling tools like Rprof() are one place to start - you can find tutorial material on the web in various places on the topic (try Googling 'Profiling R functions'), as well as some past discussion in this forum. Use RSiteSearch() and/or search the mail archives for information there.

HTH,
Dennis

On Mon, Aug 23, 2010 at 2:44 PM, Cuckovic Paik <cuckovic.p...@gmail.com> wrote:

>
> Dear All,
>
> I have an issue on memory use in R programming.
>
> Here is the brief story: I want to simulate the power of a nonparametric
> test and compare it with the existing tests. The basic steps are
>
> 1. I need to use Newton's method to obtain the nonparametric MLE, which
>    involves the inversion of a large matrix (n-by-n matrix; it takes less
>    than 3 seconds on average to get the MLE. n = sample size)
>
> 2. Since the test statistic has an unknown sampling distribution, the
>    p-value is simulated using Monte Carlo (1000 runs). It takes about 3-4
>    minutes to get a p-value.
>
> 3. I need to simulate 1000 random samples and repeat steps 1 and 2 to get
>    the p-value for each of the simulated samples to get the power of the
>    test.
>
> Here is the question:
>
> It initially completes 5-6 simulations per hour; after that, the time
> needed to complete a single simulation increases exponentially. After
> running for 24 hours, I only get about 15-20 simulations completed. My
> computer is a PC (Pentium Dual Core CPU 2.5 GHz, RAM 6.00GB, 64-bit).
> Apparently, the memory is the problem.
>
> I also tried various memory re-allocation procedures. They didn't work.
> Can anybody help on this? Thanks in advance.
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/Memory-Issue-tp2335860p2335860.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.