On Mar 29, 2012, at 1:41 AM, ilai wrote: > On Wed, Mar 28, 2012 at 3:53 PM, Benton, Paul > <hpaul.bento...@imperial.ac.uk> wrote: >> Hello all R-er, >> >> I'm trying to run a resampling method on some data. The current method I >> have takes 2+ days or a lot of memory . I was wondering if anyone has a >> better suggestion. >> >> Currently I take a matrix and get the correlation matrix from it. This will >> be called rho.A. Each element in this will be tested against the >> distribution from the resampled correlation B matrix. >> >> Some example code: >> >> A<-matrix(rnorm(100), ncol=10) >> B<-matrix(rnorm(100), ncol=10) >> >> rho.A<-cor(A) >> >> { >> idx<-sample(1:10, 10) >> idx >> # [1] 8 4 5 7 1 9 2 10 6 3 >> >> rho.B<-cor(B[,idx]) >> } ## repeat this x time (currently 500) >> >> ## in essence we then have the following : >> rho.arrayB<-array(runif((10*10)*500), dim=c(10,10,500)) > > Err... no we don't. sample(10,10) ; sample(10,10) ... only permutes > the columns, so the 500 cor(B) have exactly the same values in > different off diag positions. Using runif they are unique
My apologies for trying to make some example data. I should have done exactly what I was doing, which would have been harder to read. Since runif doesn't give exactly the same thing I'll give my functions that I'm using. ## note rperm was originally submitted to the stackexchange.com ## http://stats.stackexchange.com/questions/24300/how-to-resample-in-r-without-repeating-permutations rperm <- function(m, size=2, replaceTF=TRUE) { # Obtain m unique permutations of 1:size # Function to obtain a new permutation. newperm <- function() { count <- 0 # Protects against infinite loops repeat { # Generate a permutation and check against previous ones. p <- sample(1:size, replace=replaceTF) hash.p <- paste(p, collapse="") # make a name for the list if (is.null(cache[[hash.p]])) break ## stop if we have not already made this seq - repeat to make another new seq # Prepare to try again. count <- count+1 if (count > 1000) { # 1000 is arbitrary; adjust to taste p <- NA # NA indicates a new permutation wasn't found hash.p <- "" break } } cache[[hash.p]] <<- TRUE # Update the list of permutations found p # Return this (new) permutation } # Obtain m unique permutations. cache <- list() replicate(m, newperm()) } library(parallel) cl <- makeCluster(detectCores()) bootComb<-t(rperm(times, ncol(mat.B))) pValues.mat<-matrix(NA, nrow=nrow(mat.B), ncol=nrow(mat.B)) for(i in 1:nrow(mat.B)){ cat(round(i/nrow(mat.B),2)*100, "% \r") rho<-parApply(cl, bootComb, 1, function(idx, xmat, ymat){ rho<-apply(ymat[,idx], 1, function(ymat, xmat){ ## parallel here due to memory sizes rho<-cor(xmat, ymat) }, xmat[idx]) ##gives back a vector of rho for n vs m }, mat.B[i, ], mat.B) ## returns a matrix of rho's for each combination rho.all<- cbind(rho, rho.A[i,]) pValues.mat[i, ] <- parApply(cl, rho.all, 1, function(rhoVec){ tl <- length(rhoVec) ## test index bl <- tl-1 ## dist index wilcox.test(rhoVec[1:bl], rhoVec[tl])$p.value }) } stopCluster(cl) >> >> ## Then test if rho.A[1,1] come from the distribution of rho.B[1,1] >> pvalueMat[1,1]<-wilcox.test(rho.array[1,1,] , rho.A[1,1])$p.value >> > > From what I know cor(A)[ i , i ] = cor(B)[ j , j ] = 1 for any > choice of A,B,i and j No, cor(a)[i, j] != cor(b)[i , j] If your concern is because they are coming from the same distribution then again this is example data. Even then I would imagine that the correlation would be different for small n. Either way resampling the columns will give different correlation values. Please try the code yourself. > I don't think Wilcox intended his test to be used in this way…. Probably true …. care to suggest a different statistical test ? > > I would start with fixing these issues first so you don't wait 2 days > for a vector of NaN's Actually I didn't get a vector or NaN's I got a matrix of pvalues. Which has proved useful. Now I want to make the function faster and was looking for a bit of help. > > Cheers > > >> However, my array size would be 2300 x 2300 x 500 which R won't let me even >> make as an empty structure. Any suggestion are more than welcomed !! >> >> Cheers, >> >> Paul >> >> ______________________________________________ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.