I recently learned about the bigmemory and foreach packages and am trying 
to use them to help me create a very large matrix.  Without those 
packages, I can create the type of matrix that I want with 10 columns and 
5e6 rows.  I would like to be able to scale up to 5e9 rows, or more, if 
possible.

I have created a simplified example of what I'm trying to do, below.  The 
first part of the code shows what I'm trying to do without using the 
bigmemory or foreach packages.  I take information from a data frame and 
use that information to fill a matrix with simulated data.

The last part of the code is my ugly attempt to use the bigmemory and 
foreach packages in preparation for scaling up to a very large matrix.  It 
seems to be working ... at this small scale, anyway.  But, surely there is 
a better way to do it than what I present here.  I am particularly 
concerned about efficiency because when I did a little experimenting with 
foreach and rnorm using 5e4 records, things seemed to get slow in a hurry 
(ha!).  I would appreciate any suggestions you could offer.

I'm using R 2.13.0 for Windows (32-bit), and my memory.limit() in R is 
2 GB.

Thanks!

Jean

=====

> system.time(look <- rnorm(5e4))
   user  system elapsed 
   0.02    0.00    0.01 
> system.time(look <- foreach(i=1:5e4, .combine=c) %do% rnorm(1))
   user  system elapsed 
  91.29    0.05   92.40 
> system.time(look <- foreach(i=1:5e4, .combine=c) %dopar% rnorm(1))
   user  system elapsed 
  90.06    0.03   91.20 
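(Each %do% iteration above draws a single number, so the per-iteration 
overhead of foreach dominates the timing.  A chunked sketch, where each 
iteration makes one vectorized draw, should amortize that overhead; the 
chunk size of 1e4 here is an arbitrary choice.)

```r
library(foreach)

# draw 5e4 values in 5 chunks of 1e4, one vectorized rnorm() call
# per foreach iteration instead of one call per value
chunks <- rep(1e4, 5)
look <- foreach(nc=chunks, .combine=c) %do% rnorm(nc)
length(look)   # 5e4
```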

=====

library(bigmemory)
library(foreach)

# small data frame that instructs how to fill matrix
info <- data.frame(p=c(0.3, 0.5, 0.2), a1=c(100, 200, 80),
        a2=c(120, 300, 150))
nrowz <- dim(info)[1]

# example with small matrix
n <- 50
end.i <- cumsum(n*info$p)
start.i <- c(0, end.i[-nrowz]) + 1
m <- matrix(NA, nrow=n, ncol=2)
for(i in 1:nrowz) {
        m[start.i[i]:end.i[i], 1] <- runif(n*info$p[i], info$a1[i], info$a2[i])
        m[start.i[i]:end.i[i], 2] <- rnorm(n*info$p[i], info$a1[i], info$a2[i])
}

# example getting ready to scale up to large matrix
n <- 50
end.i <- cumsum(n*info$p)
start.i <- c(0, end.i[-nrowz]) + 1
m <- filebacked.big.matrix(nrow=n, ncol=2, backingfile="test3.bin",
        descriptorfile="test3.desc")

m[start.i[1]:end.i[1], 1] <- foreach(i=start.i[1]:end.i[1], .combine=c) %do%
        runif(1, info$a1[1], info$a2[1])
m[start.i[2]:end.i[2], 1] <- foreach(i=start.i[2]:end.i[2], .combine=c) %do%
        runif(1, info$a1[2], info$a2[2])
m[start.i[3]:end.i[3], 1] <- foreach(i=start.i[3]:end.i[3], .combine=c) %do%
        runif(1, info$a1[3], info$a2[3])

m[start.i[1]:end.i[1], 2] <- foreach(i=start.i[1]:end.i[1], .combine=c) %do%
        rnorm(1, info$a1[1], info$a2[1])
m[start.i[2]:end.i[2], 2] <- foreach(i=start.i[2]:end.i[2], .combine=c) %do%
        rnorm(1, info$a1[2], info$a2[2])
m[start.i[3]:end.i[3], 2] <- foreach(i=start.i[3]:end.i[3], .combine=c) %do%
        rnorm(1, info$a1[3], info$a2[3])

head(m)
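(A sketch of what a cleaner version might look like: the six foreach 
statements can collapse into one loop that writes a vectorized block of 
draws per row of info, just as in the small-matrix example.  The 
test4.* file names are placeholders so this doesn't collide with the 
test3.* backing files above.)

```r
library(bigmemory)

# same setup as above
info <- data.frame(p=c(0.3, 0.5, 0.2), a1=c(100, 200, 80),
        a2=c(120, 300, 150))
nrowz <- nrow(info)
n <- 50
end.i <- cumsum(n*info$p)
start.i <- c(0, end.i[-nrowz]) + 1
m <- filebacked.big.matrix(nrow=n, ncol=2, backingfile="test4.bin",
        descriptorfile="test4.desc")

# one vectorized draw per block of rows, instead of one foreach
# iteration per element
for (i in 1:nrowz) {
        idx <- start.i[i]:end.i[i]
        m[idx, 1] <- runif(length(idx), info$a1[i], info$a2[i])
        m[idx, 2] <- rnorm(length(idx), info$a1[i], info$a2[i])
}

head(m)
```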



`·.,,  ><(((º>   `·.,,  ><(((º>   `·.,,  ><(((º>

Jean V. Adams
Statistician
U.S. Geological Survey
Great Lakes Science Center
223 East Steinfest Road
Antigo, WI 54409  USA
