On Oct 8, 2011, at 16:04 , francy wrote:

> Hi, 
> 
> I am having trouble understanding how to approach a simulation:
> 
> I have a sample of n=250 from a population of N=2,000 individuals, and I
> would like to use either permutation test or bootstrap to test whether this
> particular sample is significantly different from the values of any other
> random samples of the same population. I thought I needed to take random
> samples (but I am not sure how many simulations I need to do) of n=250 from
> the N=2,000 population and maybe do a one-sample t-test to compare the mean
> score of all the simulated samples, + the one sample I am trying to prove
> that is different from any others, to the mean value of the population. But
> I don't know:
> (1) whether this one-sample t-test would be the right way to do it, and how
> to go about doing this in R
> (2) whether a permutation test or bootstrap methods are more appropriate
> 
> This is the data frame that I have, which is to be sampled:
> df<-
> i.e.
> x y
> 1 2
> 3 4
> 5 6
> 7 8
> . .
> . .
> . .
> 2,000
> 
> I have this sample from df, and would like to test whether it has extreme
> values of y. 
> sample1<-
> i.e.
> x y
> 3 4
> 7 8
> . .
> . .
> . .
> 250
> 
> For now I only have this: 
> 
> R=999 #Number of simulations, but I don't know how many...
> t.values =numeric(R)   #creates a numeric vector with 999 elements, which
> #will hold the results of each simulation. 
> for (i in 1:R) {
> sample1 <- df[sample(nrow(df), 250, replace=TRUE),] 
> 
> But I don't know how to continue the loop: do I calculate the mean for each
> simulation and compare it to the population mean? 
> Any help you could give me would be very appreciated,
> Thank you. 

The straightforward way would be a permutation test, something like this:

msamp <- mean(sample1$y)
mpop <- mean(df$y)
msim <- replicate(10000, mean(sample(df$y, 250)))

mean(abs(msim-mpop) >= abs(msamp-mpop))
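
To try the lines above end to end, here is a minimal self-contained sketch. The data are made up: df and sample1 are simulated stand-ins (assumptions, not the poster's data), and nsim is just a name for the number of replicates.

```r
set.seed(1)  # only so the sketch is reproducible

# Made-up stand-in for the real population of N = 2000 rows (assumption).
df <- data.frame(x = seq_len(2000), y = rnorm(2000, mean = 50, sd = 10))

# Made-up stand-in for the observed sample of n = 250 rows (assumption).
sample1 <- df[sample(nrow(df), 250), ]

n    <- nrow(sample1)
nsim <- 10000

msamp <- mean(sample1$y)   # mean of the observed sample
mpop  <- mean(df$y)        # mean of the whole population

# Means of nsim random samples of size n, drawn without replacement.
msim <- replicate(nsim, mean(sample(df$y, n)))

# Two-sided p value: the fraction of simulated means lying at least as far
# from the population mean as the observed sample mean does.
p.value <- mean(abs(msim - mpop) >= abs(msamp - mpop))
p.value
```

On the question of how many simulations to run: the estimated p value is a proportion, so its Monte Carlo standard error is at most 0.5/sqrt(nsim), which is 0.005 for 10000 replicates.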

I don't really see a reason to do bootstrapping here. You say you want to test 
whether your sample could be a random sample from the population, so just 
simulate that sampling (which should be without replacement, like your sample 
is). Bootstrapping might come in if you want a confidence interval for the mean 
difference between your sample and the rest.
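
If such a confidence interval is wanted, one possible sketch is a simple percentile bootstrap. It rests on assumptions: the data are simulated, idx is a hypothetical vector of the sample's row numbers within df, and the two groups are resampled independently with no finite-population adjustment.

```r
set.seed(2)  # only so the sketch is reproducible

# Made-up stand-in for the real population (assumption).
df <- data.frame(x = seq_len(2000), y = rnorm(2000, mean = 50, sd = 10))

# Hypothetical: row numbers of the observed sample within df.
idx <- sample(nrow(df), 250)

y.in  <- df$y[idx]    # y values in the sample
y.out <- df$y[-idx]   # y values in the rest of the population

obs.diff <- mean(y.in) - mean(y.out)

# Resample each group with replacement and recompute the mean difference.
boot.diff <- replicate(2000,
  mean(sample(y.in, replace = TRUE)) - mean(sample(y.out, replace = TRUE)))

# Percentile interval: the middle 95% of the bootstrap differences.
ci <- quantile(boot.diff, c(0.025, 0.975))
obs.diff
ci
```

The percentile interval is only one of several bootstrap intervals; it is used here because it needs nothing beyond quantile().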

Instead of sampling means, you could put a full-blown t-test inside the 
replicate expression, like:

psim <- replicate(10000, {
  s <- sample(nrow(df), 250)
  t.test(df$y[s], df$y[-s])$p.value
})

and then check whether the p value for your sample is small compared to the 
distribution of values in psim.

That'll take quite a bit longer, though; t.test() is a more complex beast than 
mean(). It is not obvious that it has any benefits either, unless you 
specifically wanted to investigate the behavior of the t test. 
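
Comparing the sample's p value with the simulated ones, as described above, could be sketched as follows. The data are made up, idx is a hypothetical vector of the sample's row numbers within df, and only 1000 replicates are used to keep the run short.

```r
set.seed(3)  # only so the sketch is reproducible

# Made-up stand-in for the real population (assumption).
df <- data.frame(x = seq_len(2000), y = rnorm(2000, mean = 50, sd = 10))

# Hypothetical: row numbers of the observed sample within df.
idx <- sample(nrow(df), 250)

# p value of the t test comparing the observed sample with the rest.
pobs <- t.test(df$y[idx], df$y[-idx])$p.value

# The same t test for random samples of the same size.
psim <- replicate(1000, {
  s <- sample(nrow(df), 250)
  t.test(df$y[s], df$y[-s])$p.value
})

# Fraction of random samples whose p value is at least as small as the
# observed one; a small fraction suggests the sample is unusual.
frac <- mean(psim <= pobs)
frac
```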

(All code untested. Caveat emptor.)


-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd....@cbs.dk  Priv: pda...@gmail.com

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
