Hi R users, I have been struggling to select an equal number of samples from each stratum. My data were collected in different years and in different regions, with a different sample size in each. Basically, I have two conditions (year and region), and I want the same sample size for every year and region combination. I found that the "strata.sampling" function works when there is one condition, but I have two. Is there a package where I can give two conditions, select rows randomly 999 times, and take the mean value?
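A minimal base-R sketch of one stratified draw, assuming "region" is the watershed column in the example data from this post and that sampling without replacement is wanted. The helper name sample_per_stratum is hypothetical (not from any package); it splits the data by every combination of the grouping columns and takes the same number of random rows from each piece.

```r
# Example data from this post, rebuilt with data.frame() (same values as `raw`)
raw <- data.frame(
  watershed = factor(rep(c("A", "B"), times = c(7, 6))),
  year = c(2001, 2001, 2002, 2002, 2002, 2002, 2002, 2001, 2001, 2001, 2002, 2002, 2002),
  sp1 = c(18.38, 29.1, 90.72, 16.12, 49.12, 20.81, 65.1, 1.87, 72.99, 93.45, 38.44, 67.13, 45.71),
  sp2 = c(46.46, 94, 86.87, 46.91, 21.41, 92.82, 87.75, 16.18, 18.16, 18.76, 19.26, 52.73, 49.09),
  sp3 = c(86.9, 62.82, 74.32, 75.49, 20.17, 58.84, 16.51, 44.14, 44.39, 32.36, 53.28, 67.42, 33.37)
)

# Hypothetical helper (not from any package): draw `n` rows without
# replacement from every combination of the columns named in `by`.
sample_per_stratum <- function(data, by, n) {
  strata <- split(data, data[by], drop = TRUE)   # one data frame per stratum
  sizes <- vapply(strata, nrow, integer(1))
  if (any(sizes < n)) {
    stop("n = ", n, " exceeds the smallest stratum (", min(sizes), " rows)")
  }
  picked <- lapply(strata, function(d) d[sample(nrow(d), n), , drop = FALSE])
  out <- do.call(rbind, picked)
  rownames(out) <- NULL
  out
}

set.seed(1)
one_draw <- sample_per_stratum(raw, by = c("watershed", "year"), n = 2)
table(one_draw$watershed, one_draw$year)   # every watershed x year cell holds 2 rows
```

The smallest stratum here (watershed A, 2001) has only two rows, so n = 2 is the largest equal size possible without replacement; the helper stops with an error if n is larger.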
Your help would be really appreciated; I am spending so much time on this. Here is what I did with the example data:

raw <- structure(list(
    watershed = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L),
                          .Label = c("A", "B"), class = "factor"),
    year = c(2001, 2001, 2002, 2002, 2002, 2002, 2002, 2001, 2001, 2001, 2002, 2002, 2002),
    sp1 = c(18.38, 29.1, 90.72, 16.12, 49.12, 20.81, 65.1, 1.87, 72.99, 93.45, 38.44, 67.13, 45.71),
    sp2 = c(46.46, 94, 86.87, 46.91, 21.41, 92.82, 87.75, 16.18, 18.16, 18.76, 19.26, 52.73, 49.09),
    sp3 = c(86.9, 62.82, 74.32, 75.49, 20.17, 58.84, 16.51, 44.14, 44.39, 32.36, 53.28, 67.42, 33.37)),
  .Names = c("watershed", "year", "sp1", "sp2", "sp3"),
  class = "data.frame", row.names = c(NA, -13L))

library(sampling)  # provides strata() and getdata()

# Draw `size` rows from every stratum defined by the column(s) in `group`
strata.sampling <- function(data, group, size, method = NULL) {
  if (is.null(method)) method <- "srswor"
  if (!method %in% c("srswor", "srswr")) stop('method must be "srswor" or "srswr"')
  # strata() expects the data sorted by all stratification columns
  temp <- data[do.call(order, data[group]), ]
  # rows per stratum, in the order the strata appear in temp
  counts <- as.vector(table(interaction(temp[group], drop = TRUE, lex.order = TRUE)))
  # a single size < 1 is a sampling fraction; a single size >= 1 is a count per stratum
  if (length(size) == 1) {
    size <- if (size < 1) round(counts * size) else rep(size, times = length(counts))
  }
  strat <- strata(temp, stratanames = group, size = size, method = method)
  getdata(temp, strat)
}

test1 <- strata.sampling(raw, ("watershed"), 2)  # select 2 rows per watershed

But I wanted to use "year" too, i.e. ("watershed", "year"). When I added "year", it did not work:

> test1<-strata.sampling(raw, ("watershed", "year"), 2)
Error: unexpected ',' in "test1<-strata.sampling(raw, ("watershed","

What I want is to select rows using both conditions ("watershed" and "year"), repeat the random sampling 999 times, and take the mean values of sp1, sp2 and sp3.
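The 999-times-and-average step can be sketched in base R as follows, assuming 2 rows per watershed and year (the size of the smallest stratum) and sampling without replacement. Nothing here comes from the sampling package; it uses split(), sample() and replicate() only, and the variable names are illustrative.

```r
# Example data from this post (same values as `raw`)
raw <- data.frame(
  watershed = factor(rep(c("A", "B"), times = c(7, 6))),
  year = c(2001, 2001, 2002, 2002, 2002, 2002, 2002, 2001, 2001, 2001, 2002, 2002, 2002),
  sp1 = c(18.38, 29.1, 90.72, 16.12, 49.12, 20.81, 65.1, 1.87, 72.99, 93.45, 38.44, 67.13, 45.71),
  sp2 = c(46.46, 94, 86.87, 46.91, 21.41, 92.82, 87.75, 16.18, 18.16, 18.76, 19.26, 52.73, 49.09),
  sp3 = c(86.9, 62.82, 74.32, 75.49, 20.17, 58.84, 16.51, 44.14, 44.39, 32.36, 53.28, 67.42, 33.37)
)

by <- c("watershed", "year")        # the two conditions
species <- c("sp1", "sp2", "sp3")
n <- 2                              # rows per stratum (assumed; smallest stratum has 2)
B <- 999                            # number of random draws

strata <- split(raw, raw[by], drop = TRUE)   # one data frame per watershed x year

set.seed(42)
res <- lapply(strata, function(d) {
  # B times: sample n rows and take their column means -> 3 x B matrix
  draws <- replicate(B, colMeans(d[sample(nrow(d), n), species, drop = FALSE]))
  # keep the stratum labels and average the B means of each species
  data.frame(d[1, by], t(rowMeans(draws)))
})
output <- do.call(rbind, res)
output <- output[order(output$watershed, output$year), ]   # A-2001, A-2002, B-2001, B-2002
rownames(output) <- NULL
output
```

Resampling inside each stratum separately gives the same result as redrawing the whole stratified sample 999 times, because the draws in different strata are independent and each stratum's mean depends only on its own rows.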
Here is the output I want: one row per watershed and year, where each "mean" is the mean of sp1, sp2 or sp3 over the 999 random draws.

  watershed year  sp1  sp2  sp3
1         A 2001 mean mean mean
2         A 2002 mean mean mean
3         B 2001 mean mean mean
4         B 2002 mean mean mean

Any suggestions? Thanks for your help.
KG
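Because the mean of a simple random sample is an unbiased estimate of its stratum's mean, the 999-draw averages should settle close to the ordinary per-stratum means. A quick base-R reference in the same watershed / year / sp1 / sp2 / sp3 layout, as a sketch using aggregate() (the sorting step only matches the row order of the desired output):

```r
# Example data from this post (same values as `raw`)
raw <- data.frame(
  watershed = factor(rep(c("A", "B"), times = c(7, 6))),
  year = c(2001, 2001, 2002, 2002, 2002, 2002, 2002, 2001, 2001, 2001, 2002, 2002, 2002),
  sp1 = c(18.38, 29.1, 90.72, 16.12, 49.12, 20.81, 65.1, 1.87, 72.99, 93.45, 38.44, 67.13, 45.71),
  sp2 = c(46.46, 94, 86.87, 46.91, 21.41, 92.82, 87.75, 16.18, 18.16, 18.76, 19.26, 52.73, 49.09),
  sp3 = c(86.9, 62.82, 74.32, 75.49, 20.17, 58.84, 16.51, 44.14, 44.39, 32.36, 53.28, 67.42, 33.37)
)

# Non-resampled mean of each species per watershed x year
full_means <- aggregate(cbind(sp1, sp2, sp3) ~ watershed + year, data = raw, FUN = mean)
full_means <- full_means[order(full_means$watershed, full_means$year), ]
rownames(full_means) <- NULL
full_means
```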