On Oct 8, 2011, at 16:04, francy wrote:

> Hi,
>
> I am having trouble understanding how to approach a simulation:
>
> I have a sample of n=250 from a population of N=2,000 individuals, and I
> would like to use either a permutation test or the bootstrap to test whether
> this particular sample is significantly different from the values of any other
> random samples of the same population. I thought I needed to take random
> samples (but I am not sure how many simulations I need to do) of n=250 from
> the N=2,000 population and maybe do a one-sample t-test to compare the mean
> score of all the simulated samples, + the one sample I am trying to prove
> that is different from any others, to the mean value of the population. But
> I don't know:
> (1) whether this one-sample t-test would be the right way to do it, and how
> to go about doing this in R
> (2) whether a permutation test or bootstrap methods are more appropriate
>
> This is the data frame that I have, which is to be sampled:
> df<-
> i.e.
> x y
> 1 2
> 3 4
> 5 6
> 7 8
> . .
> . .
> . .
> 2,000
>
> I have this sample from df, and would like to test whether it has extreme
> values of y.
> sample1<-
> i.e.
> x y
> 3 4
> 7 8
> . .
> . .
> . .
> 250
>
> For now I only have this:
>
> R=999 #Number of simulations, but I don't know how many...
> t.values =numeric(R) #creates a numeric vector with 999 elements, which
> will hold the results of each simulation.
> for (i in 1:R) {
> sample1 <- df[sample(nrow(df), 250, replace=TRUE),]
>
> But I don't know how to continue the loop: do I calculate the mean for each
> simulation and compare it to the population mean?
> Any help you could give me would be very appreciated,
> Thank you.
The straightforward way would be a permutation test, something like this:

    msamp <- mean(sample1$y)
    mpop <- mean(df$y)
    msim <- replicate(10000, mean(sample(df$y, 250)))
    sum(abs(msim-mpop) >= abs(msamp-mpop))/10000

I don't really see a reason to do bootstrapping here. You say you want to test whether your sample could be a random sample from the population, so just simulate that sampling (which should be without replacement, like your sample is). Bootstrapping might come in if you want a confidence interval for the mean difference between your sample and the rest.

Instead of sampling means, you could put a full-blown t-test inside the replicate expression, like:

    psim <- replicate(10000, {s<-sample(1:2000, 250); t.test(df$y[s], df$y[-s])$p.value})

and then check whether the p value for your sample is small compared to the distribution of values in psim. That'll take quite a bit longer, though; t.test() is a more complex beast than mean(). It is not obvious that it has any benefits either, unless you specifically wanted to investigate the behavior of the t test.

(All code untested. Caveat emptor.)

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd....@cbs.dk Priv: pda...@gmail.com
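
For anyone who wants to run the mean-based test from the reply end to end, here is a minimal sketch. The original df and sample1 were not posted, so the data below are simulated stand-ins (an assumption), and perm_p() is a hypothetical helper name rather than an existing R function. It draws samples of the same size without replacement and reports the fraction of simulated means at least as far from the population mean as the observed one.

```r
# Hypothetical stand-in data: the poster's real df and sample1 are not available.
set.seed(1)
df <- data.frame(x = seq_len(2000), y = rnorm(2000, mean = 50, sd = 10))
sample1 <- df[sample(nrow(df), 250), ]   # 250 rows drawn without replacement

# perm_p() is an illustrative helper, not part of base R.
# It returns the two-sided Monte Carlo p-value for the mean of `samp`
# under random sampling without replacement from `pop`.
perm_p <- function(pop, samp, nsim = 10000) {
  mpop  <- mean(pop)
  msamp <- mean(samp)
  # Simulate the sampling scheme: length(samp) values, no replacement.
  msim  <- replicate(nsim, mean(sample(pop, length(samp))))
  # Fraction of simulated means at least as extreme as the observed mean.
  mean(abs(msim - mpop) >= abs(msamp - mpop))
}

# A sample that really is random: the p-value should usually be unremarkable.
p_random <- perm_p(df$y, sample1$y)

# A deliberately extreme sample: the 250 rows with the largest y.
extreme   <- df[order(df$y, decreasing = TRUE)[1:250], ]
p_extreme <- perm_p(df$y, extreme$y)
```

With 10000 simulations the smallest nonzero p-value that can be resolved is 1/10000, so more replicates only matter if finer resolution is needed. Some authors add 1 to both numerator and denominator so that the estimate is never exactly zero; that is a variation on the calculation in the reply, not part of it.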