Hello, I apologise for the length of this entry, but please bear with me.
In short: I need a way of subsampling communities from all possible communities of n taxa taken 1:n at a time without having to calculate all possible combinations (because that gives me a memory error, at least with combn() or expand.grid()). Does anyone know of a function? Or can you help me edit the combn or expand.grid functions to generate subsamples?

In long: I have been creating all possible communities of n taxa taken 1:n at a time to get a presence/absence matrix of species occurrence in communities, as below. Rows are samples, columns are species:

    A B C D . . . .
    1 0 1 1 1 0 0 0 1 1 1 1 0 0 0 0
    0 1 1 1 1 0 0 0 1 1 1 1 0 0 0 0
    1 1 1 1 1 0 0 0 1 1 1 1 0 0 0 0
    0 0 0 0 0 1 0 0 1 1 1 1 0 0 0 0
    1 0 0 0 0 1 0 0 1 1 1 1 0 0 0 0
    0 1 0 0 0 1 0 0 1 1 1 1 0 0 0 0
    1 1 0 0 0 1 0 0 1 1 1 1 0 0 0 0
    0 0 1 0 0 1 0 0 1 1 1 1 0 0 0 0

...but the number of possible communities doubles with each added taxon:

    n <- 11                # number of taxa
    sum(choose(n, 0:n))    # number of combos (including the empty community) = 2^n

So the number of combinations of 11 taxa is 2048 (2^11, counting the empty community; 2047 if taken strictly 1:11 at a time), of 12 taxa 4096, of 13 taxa 8192, and so on, such that when I reach about 25 taxa the number of combos is 33554432 and I get a memory error.

I have found that the number of combos of x taxa taken from a pool of n follows a very kurtotic unimodal distribution:

    x <- choose(20, 1:20)  # number of combos for each community size 1..20
    plot(x)

...but limiting the number of samples for any one community size to 1000 is good enough for the further analyses I wish to do.

My problem lies in sampling from all possible combos without having to calculate all possible combos. I have tried two methods, but both give memory errors at about 25 taxa.
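The sizes quoted above can be checked without building a single community. The following is only a small sketch: `cap` is an illustrative name for the limit of 1000 samples per community size, and `capped` is the number of rows that would be needed if every size were limited that way.

```r
# Sizes involved, computed without building any communities.
# `cap` is an illustrative name for the per-size limit of 1000 mentioned above.
n   <- 25
cap <- 1000

per_size <- choose(n, 0:n)                   # number of communities of each size 0..n
total    <- sum(per_size)                    # all subsets of n taxa, equals 2^n
capped   <- sum(pmin(choose(n, 1:n), cap))   # rows needed if each size 1..n is capped

total    # 33554432
capped   # 20651
```

For 25 taxa the full enumeration has about 33.5 million rows, while the capped sample needs only about twenty thousand, so the memory problem comes from the enumeration step rather than from the result itself.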
The expand.grid() method:

    n <- 11
    toto <- vector("list", n)
    titi <- lapply(toto, function(x) c(0, 1))
    tutu <- expand.grid(titi)

The combn() method (a slightly lengthier function):

    samplecommunityD <- function(n, numsamples) {
      super <- matrix(0, nrow = 0, ncol = n)         # accumulates presence/absence rows
      for (numspploop in 1:n) {
        minor <- t(combn(n, numspploop))             # ALL communities of this size
        if (dim(minor)[1] < numsamples) {
          # fewer communities than requested: keep them all
          minot <- mat.or.vec(dim(minor)[1], n)
          for (loopi in 1:dim(minor)[1]) {
            for (loopbi in 1:dim(minor)[2]) {
              minot[loopi, minor[loopi, loopbi]] <- 1
            }
          }
          super <- rbind(super, minot)
          rm(minot)
        } else {
          # too many communities: keep a random subset of numsamples rows
          minot <- mat.or.vec(numsamples, n)
          thousand <- sample(dim(minor)[1], numsamples)  # drawn once, outside the loop
          for (loopii in 1:numsamples) {
            for (loopbii in 1:dim(minor)[2]) {
              minot[loopii, minor[thousand[loopii], loopbii]] <- 1
            }
          }
          super <- rbind(super, minot)
          rm(minot)
        }
      }
      # keep only communities with at least 2 and at most n - 1 taxa
      super <- super[rowSums(super) >= 2 & rowSums(super) <= n - 1, , drop = FALSE]
      return(super)
    }

    samplecommunityD(11, 1000)

So unless anyone knows of another function I could try, my next step would be to modify the combn or expand.grid functions to generate subsamples, but their code is beyond me at this stage (I'm a 3.5 month newbie). Can anyone identify where in the code I would need to introduce a sampling term or skipping sequence?

Thanks for your time,
Jasper

--
View this message in context: http://www.nabble.com/how-to-subsample-all-possible-combinations-of-n-species-taken-1%3An-at-a-time--tp22911399p22911399.html
Sent from the R help mailing list archive at Nabble.com.
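On the closing question of where a sampling term could go: one possible direction, offered as a sketch rather than as a modification of combn() or expand.grid(), is to skip the enumeration for large community sizes and draw each community directly. The function name `sample_communities` and its arguments are hypothetical. The sketch assumes that picking k of the n taxa with sample.int() is an acceptable substitute for picking a random row of combn(n, k), and that duplicate communities within a size should be rejected.

```r
# Sketch: up to `numsamples` distinct random communities per size, built
# without enumerating all combinations. All names here are illustrative.
sample_communities <- function(n, numsamples, sizes = 2:(n - 1)) {
  stopifnot(n >= 3)
  blocks <- lapply(sizes, function(k) {
    if (choose(n, k) <= numsamples) {
      # Few communities of this size: enumerating them all is cheap.
      idx <- combn(n, k)                          # k x N matrix of taxon indices
      block <- matrix(0L, nrow = ncol(idx), ncol = n)
      block[cbind(rep(seq_len(ncol(idx)), each = k), as.vector(idx))] <- 1L
      return(block)
    }
    # Many communities: draw random k-subsets until enough distinct rows exist.
    block <- matrix(0L, nrow = 0, ncol = n)
    while (nrow(block) < numsamples) {
      need <- numsamples - nrow(block)
      draws <- matrix(0L, nrow = need, ncol = n)
      for (i in seq_len(need)) {
        draws[i, sample.int(n, k)] <- 1L          # k distinct taxa, chosen uniformly
      }
      block <- unique(rbind(block, draws))        # drop duplicate communities
    }
    block
  })
  do.call(rbind, blocks)
}

set.seed(1)
comm <- sample_communities(25, 1000)
dim(comm)
```

The default `sizes = 2:(n - 1)` mirrors the final filter in samplecommunityD(), which drops communities with fewer than 2 or more than n - 1 taxa. Memory use grows with the number of rows kept, not with 2^n, so 25 taxa should no longer be a problem under these assumptions.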