I recently learned about the bigmemory and foreach packages and am trying to use them to help me create a very large matrix. Without those packages, I can create the type of matrix that I want with 10 columns and 5e6 rows. I would like to be able to scale up to 5e9 rows, or more, if possible.
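For a rough sense of the scale involved, the arithmetic can be sketched as below. This assumes every cell is stored as a double (8 bytes), which is what an ordinary numeric matrix in R uses. The helper name matrix_bytes is made up for illustration.

```r
# Hypothetical helper: bytes needed for a dense matrix of doubles (8 bytes per value).
matrix_bytes <- function(nrow, ncol, bytes_per_value = 8) {
  nrow * ncol * bytes_per_value
}

small_gb <- matrix_bytes(5e6, 10) / 1e9   # 10 columns, 5e6 rows
large_gb <- matrix_bytes(5e9, 10) / 1e9   # 10 columns, 5e9 rows

# 5e9 rows is also more than a 32-bit integer row index can represent.
rows_exceed_int <- 5e9 > .Machine$integer.max
```

Under that assumption the 5e6-row matrix needs about 0.4 GB and the 5e9-row matrix about 400 GB, so the larger one would have to live on disk rather than in memory.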
I have created a simplified example of what I'm trying to do, below. The first part of the code shows what I'm trying to do without using the bigmemory or foreach packages. I take information from a data frame and use that information to fill a matrix with simulated data. The last part of the code is my ugly attempt to use the bigmemory and foreach packages in preparation for scaling up to a very large matrix. It seems to be working ... at this small scale, anyway. But, surely there is a better way to do it than what I present here. I am particularly concerned about efficiency because when I did a little experimenting with foreach and rnorm using 5e4 records, things seemed to get slow in a hurry (ha!).

I would appreciate any suggestions you could offer. I'm using R for Windows 2.13.0, and my memory.limit() in R is 2GB (32-bit).

Thanks!

Jean

=====

> system.time(look <- rnorm(5e4))
   user  system elapsed
   0.02    0.00    0.01
> system.time(look <- foreach(i=1:5e4, .combine=c) %do% rnorm(1))
   user  system elapsed
  91.29    0.05   92.40
> system.time(look <- foreach(i=1:5e4, .combine=c) %dopar% rnorm(1))
   user  system elapsed
  90.06    0.03   91.20

=====

library(bigmemory)
library(foreach)

# small data frame that instructs how to fill matrix
info <- data.frame(p=c(0.3, 0.5, 0.2), a1=c(100, 200, 80), a2=c(120, 300, 150))
nrowz <- dim(info)[1]

# example with small matrix
n <- 50
end.i <- cumsum(n*info$p)
start.i <- c(0, end.i[-nrowz]) + 1
m <- matrix(NA, nrow=n, ncol=2)
for(i in 1:nrowz) {
    m[start.i[i]:end.i[i], 1] <- runif(n*info$p[i], info$a1[i], info$a2[i])
    m[start.i[i]:end.i[i], 2] <- rnorm(n*info$p[i], info$a1[i], info$a2[i])
    }

# example getting ready to scale up to large matrix
n <- 50
end.i <- cumsum(n*info$p)
start.i <- c(0, end.i[-nrowz]) + 1
m <- filebacked.big.matrix(nrow=n, ncol=2, backingfile="test3.bin", descriptorfile="test3.desc")
m[start.i[1]:end.i[1], 1] <- foreach(i=start.i[1]:end.i[1], .combine=c) %do% runif(1, info$a1[1], info$a2[1])
m[start.i[2]:end.i[2], 1] <- foreach(i=start.i[2]:end.i[2], .combine=c) %do% runif(1, info$a1[2], info$a2[2])
m[start.i[3]:end.i[3], 1] <- foreach(i=start.i[3]:end.i[3], .combine=c) %do% runif(1, info$a1[3], info$a2[3])
m[start.i[1]:end.i[1], 2] <- foreach(i=start.i[1]:end.i[1], .combine=c) %do% rnorm(1, info$a1[1], info$a2[1])
m[start.i[2]:end.i[2], 2] <- foreach(i=start.i[2]:end.i[2], .combine=c) %do% rnorm(1, info$a1[2], info$a2[2])
m[start.i[3]:end.i[3], 2] <- foreach(i=start.i[3]:end.i[3], .combine=c) %do% rnorm(1, info$a1[3], info$a2[3])
head(m)

`·.,,  ><(((º>   `·.,,  ><(((º>   `·.,,  ><(((º>

Jean V. Adams
Statistician
U.S. Geological Survey
Great Lakes Science Center
223 East Steinfest Road
Antigo, WI 54409  USA
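One possible alternative to calling foreach once per element is to fill each block of rows in fixed-size chunks, with one vectorized runif or rnorm call per chunk. The sketch below is only that, a sketch: it uses an ordinary matrix as a stand-in so it runs without extra packages, and it assumes (untested here) that the same indexed assignment would carry over to a big.matrix. The helper row_chunks and the value of chunk_size are made up for illustration, and round() is added to guard against floating-point error in n*info$p.

```r
# Hypothetical helper: split rows first:last into consecutive chunks of at
# most chunk_size rows, returned as a list of index vectors.
row_chunks <- function(first, last, chunk_size) {
  starts <- seq(first, last, by = chunk_size)
  ends   <- pmin(starts + chunk_size - 1, last)
  Map(seq, starts, ends)
}

# Same instructions as in the example above.
info <- data.frame(p = c(0.3, 0.5, 0.2), a1 = c(100, 200, 80), a2 = c(120, 300, 150))

n       <- 50
end.i   <- cumsum(round(n * info$p))
start.i <- c(0, end.i[-nrow(info)]) + 1

m <- matrix(NA_real_, nrow = n, ncol = 2)   # stand-in for a big.matrix

chunk_size <- 8   # tiny for the demo; an assumption, would be far larger in practice
for (i in seq_len(nrow(info))) {
  for (rows in row_chunks(start.i[i], end.i[i], chunk_size)) {
    k <- length(rows)
    # One vectorized draw per column per chunk instead of one call per element.
    m[rows, 1] <- runif(k, info$a1[i], info$a2[i])
    m[rows, 2] <- rnorm(k, info$a1[i], info$a2[i])
  }
}
```

Each step draws at most chunk_size values at a time, so the temporary vectors stay small however many rows a block contains.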