> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
> On Behalf Of Kristi Glover
> Sent: Friday, March 07, 2014 11:56 AM
> To: R-help
> Subject: [R] stratified sampling
>
> Hi R users,
> I have been struggling to select an equal number of samples from each
> stratum. I have data collected in different years in different regions
> with different sample sizes. Basically, I have two conditions (year and
> region), and I want the same sample size for every year and region.
> I found that the 'strata.sampling' function works if I have one condition,
> but I have two conditions. Is there any package in which I can put two
> conditions, select the rows randomly 999 times and take the mean value?
>
> Your help would be really appreciated. I am spending so much time...
>
> Here is what I did for the example data:
> raw=structure(list(watershed = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
> 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A", "B"), class = "factor"),
> year = c(2001, 2001, 2002, 2002, 2002, 2002, 2002, 2001,
> 2001, 2001, 2002, 2002, 2002), sp1 = c(18.38, 29.1, 90.72,
> 16.12, 49.12, 20.81, 65.1, 1.87, 72.99, 93.45, 38.44, 67.13,
> 45.71), sp2 = c(46.46, 94, 86.87, 46.91, 21.41, 92.82, 87.75,
> 16.18, 18.16, 18.76, 19.26, 52.73, 49.09), sp3 = c(86.9,
> 62.82, 74.32, 75.49, 20.17, 58.84, 16.51, 44.14, 44.39, 32.36,
> 53.28, 67.42, 33.37)), .Names = c("watershed", "year", "sp1",
> "sp2", "sp3"), class = "data.frame", row.names = c(NA, -13L))
>
> require(sampling)
> if (is.null(method)) method <- "srswor"
> if (!method %in% c("srswor", "srswr"))
>   stop('method must be "srswor" or "srswr"')
> temp <- data[order(data[[group]]), ]
> ifelse(length(size) > 1,
>        size <- size,
>        ifelse(size < 1,
>               size <- round(table(temp[group]) * size),
>               size <- rep(size, times=length(table(temp[group])))))
> strat = strata(temp, stratanames = names(temp[group]),
>                size = size, method = method)
> getdata(temp, strat)
> }
>
> test1<-strata.sampling(raw, ("watershed"), 2)# select 2 rows by watershed
>
> BUT, I wanted to use "year" too ("watershed", "year"). When I added
> "year", it did not work:
> test1<-strata.sampling(raw, ("watershed", "year"), 2)# select 2 rows by watershed and year
>
> test1<-strata.sampling(raw, ("watershed", "year"), 2)
> Error: unexpected ',' in "test1<-strata.sampling(raw, ("watershed","
>
> Here I want to select rows using two conditions ("watershed", "year")
> 999 times and take the mean value of sp1, sp2 and sp3 over the 999
> random samples. Here is the output I wanted:
> output<-structure(list(watershed = structure(c(1L, 1L, 2L, 2L), .Label =
> c("A", "B"), class = "factor"), year = c(2001L, 2002L, 2001L, 2002L),
> sp1 = structure(c(1L, 1L, 1L, 1L), .Label = "mean", class = "factor"),
> sp2 = structure(c(1L, 1L, 1L, 1L), .Label = "mean", class = "factor"),
> sp3 = structure(c(1L, 1L, 1L, 1L), .Label = "mean", class = "factor")),
> .Names = c("watershed", "year", "sp1", "sp2", "sp3"),
> class = "data.frame", row.names = c(NA, -4L))
>
> Any suggestions?
> Thanks for your help.
> KG
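The goal stated in the question (2 rows from every watershed-by-year stratum, drawn 999 times, then the mean of sp1, sp2 and sp3 per stratum) can also be sketched in base R, swapping in split() and aggregate() for the sampling package. This is a minimal sketch, not the poster's or the package's method; the helper strat_means() and the seed are hypothetical choices for illustration.

```r
# Example data from the question (same values as the structure() call above)
raw <- data.frame(
  watershed = factor(c(rep("A", 7), rep("B", 6))),
  year = c(2001, 2001, 2002, 2002, 2002, 2002, 2002,
           2001, 2001, 2001, 2002, 2002, 2002),
  sp1 = c(18.38, 29.1, 90.72, 16.12, 49.12, 20.81, 65.1,
          1.87, 72.99, 93.45, 38.44, 67.13, 45.71),
  sp2 = c(46.46, 94, 86.87, 46.91, 21.41, 92.82, 87.75,
          16.18, 18.16, 18.76, 19.26, 52.73, 49.09),
  sp3 = c(86.9, 62.82, 74.32, 75.49, 20.17, 58.84, 16.51,
          44.14, 44.39, 32.36, 53.28, 67.42, 33.37)
)

# Hypothetical helper: draw n rows without replacement from every stratum
# defined by the columns named in `strata`, then average `vars` per stratum.
strat_means <- function(data, strata, n, vars) {
  # one vector of row indices per observed combination of the strata columns
  groups <- split(seq_len(nrow(data)), data[strata], drop = TRUE)
  # idx[sample.int(...)] avoids sample()'s special case for length-one input
  picked <- unlist(lapply(groups, function(idx) idx[sample.int(length(idx), n)]),
                   use.names = FALSE)
  aggregate(data[picked, vars, drop = FALSE],
            by = data[picked, strata, drop = FALSE], FUN = mean)
}

strata_cols <- c("watershed", "year")
sp_cols <- c("sp1", "sp2", "sp3")

set.seed(1)  # arbitrary seed, only so the draws are reproducible
draws <- do.call(rbind, replicate(999, strat_means(raw, strata_cols, 2, sp_cols),
                                  simplify = FALSE))
# average the 999 per-stratum means
result <- aggregate(draws[sp_cols], by = draws[strata_cols], FUN = mean)
result <- result[order(result$watershed, result$year), ]
print(result)
```

Because every stratum contributes the same n rows to each draw, averaging the 999 per-stratum means equals averaging all drawn rows per stratum. The smallest stratum in this example (watershed A, 2001) has only 2 rows, so n cannot exceed 2 when sampling without replacement.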
There seems to be something missing from your post (your code doesn't run as is, even for a single stratum variable). But I might hazard a guess that when you want to pass multiple strata variables, you need to pass them as a vector, c('watershed','year'), and if you are passing multiple stratum variables, you also need to pass a vector of desired sample sizes in the order that the strata appear in your data. In your case that would be

size = c(2,2,2,2)

If this doesn't solve the problem, then write back to the list with an example that works with a single variable with your data.

Dan

Daniel Nordlund
Bothell, WA USA

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
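Dan's suggestion can be sketched directly with strata() and getdata() from the sampling package, assuming that package is installed. The data are sorted by both strata variables first, and size gives one value per stratum in that sorted order (A/2001, A/2002, B/2001, B/2002). The do.call(order, ...) line is a substitution for the posted helper's data[[group]], which cannot index a data frame by two column names at once; treat the whole block as a sketch rather than a fix of the original function.

```r
library(sampling)  # assumes the CRAN package 'sampling' is installed

# Example data from the question
raw <- data.frame(
  watershed = factor(c(rep("A", 7), rep("B", 6))),
  year = c(2001, 2001, 2002, 2002, 2002, 2002, 2002,
           2001, 2001, 2001, 2002, 2002, 2002),
  sp1 = c(18.38, 29.1, 90.72, 16.12, 49.12, 20.81, 65.1,
          1.87, 72.99, 93.45, 38.44, 67.13, 45.71),
  sp2 = c(46.46, 94, 86.87, 46.91, 21.41, 92.82, 87.75,
          16.18, 18.16, 18.76, 19.26, 52.73, 49.09),
  sp3 = c(86.9, 62.82, 74.32, 75.49, 20.17, 58.84, 16.51,
          44.14, 44.39, 32.36, 53.28, 67.42, 33.37)
)

# several strata variables are passed as one character vector
strata_cols <- c("watershed", "year")

# sort by every stratification column (data[[strata_cols]] would fail here)
temp <- raw[do.call(order, raw[strata_cols]), ]

# one sample size per stratum, in the order the strata appear in `temp`:
# A/2001, A/2002, B/2001, B/2002
st <- strata(temp, stratanames = strata_cols, size = c(2, 2, 2, 2),
             method = "srswor")
sampled <- getdata(temp, st)
print(sampled)
```

Wrapping the strata() and getdata() calls in replicate(999, ...) and averaging sp1, sp2 and sp3 per stratum would give the 999-draw means the question asks for.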