The obvious approach: take a small pilot sample, say 25-50 documents, and estimate your distribution from it. Then use that estimate to determine how many additional samples (if any) you need for the desired precision. That second step can probably be done easily via simulation or the bootstrap if you don't want to specify a parametric form.
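The pilot-then-bootstrap idea can be sketched in Python (the thread itself contains no code). Several assumptions are not part of the original advice. The pilot counts are invented, and the ±0.5 patents/document target is hypothetical. Scaling the bootstrap interval by the finite population correction (FPC) is a rough adjustment for sampling without replacement from the 4,392 documents. For comparison, the sketch also computes the usual normal-approximation size with FPC, n0 = (z·s/E)^2 and n = n0 / (1 + (n0 - 1)/N), using the pilot standard deviation s.

```python
import math
import random
import statistics

# HYPOTHETICAL pilot data: patent counts for 40 randomly chosen documents.
# These values are invented for illustration; replace them with real pilot counts.
PILOT_COUNTS = [0, 1, 3, 0, 2, 5, 1, 0, 7, 2,
                1, 4, 0, 2, 9, 1, 3, 0, 2, 1,
                6, 0, 1, 2, 3, 12, 0, 1, 4, 2,
                1, 0, 5, 2, 1, 3, 0, 8, 2, 1]
N_POPULATION = 4392  # number of documents, from the original question


def fpc(n, big_n):
    """Finite population correction for the standard error of a sample mean."""
    return math.sqrt((big_n - n) / (big_n - 1))


def bootstrap_half_width(pilot, n, big_n, conf, reps, rng):
    """Approximate CI half-width for the mean of a size-n sample.

    Resamples the pilot with replacement `reps` times, takes the percentile
    interval of the resampled means, and scales it by the FPC as a rough
    (assumed) adjustment for sampling without replacement from big_n documents.
    """
    means = sorted(statistics.fmean(rng.choices(pilot, k=n)) for _ in range(reps))
    lo_idx = int(reps * (1 - conf) / 2)   # e.g. index 50 of 2000 for 95%
    hi_idx = reps - 1 - lo_idx            # symmetric upper index
    return (means[hi_idx] - means[lo_idx]) / 2 * fpc(n, big_n)


def bootstrap_sample_size(pilot, margin, big_n, conf=0.95, reps=2000, step=10, seed=1):
    """Smallest n (checked in increments of `step`) whose half-width is <= margin."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    for n in range(step, big_n + 1, step):
        if bootstrap_half_width(pilot, n, big_n, conf, reps, rng) <= margin:
            return n
    return big_n  # a full census always meets the target


def normal_sample_size(pilot, margin, big_n, conf=0.95):
    """Normal-approximation size n0 = (z*s/E)^2, then FPC: n0 / (1 + (n0 - 1)/N)."""
    z = statistics.NormalDist().inv_cdf(1 - (1 - conf) / 2)
    n0 = (z * statistics.stdev(pilot) / margin) ** 2
    return math.ceil(n0 / (1 + (n0 - 1) / big_n))


if __name__ == "__main__":
    margin = 0.5  # HYPOTHETICAL target: +/- 0.5 patents per document at 95%
    print("normal approximation:", normal_sample_size(PILOT_COUNTS, margin, N_POPULATION))
    print("bootstrap:", bootstrap_sample_size(PILOT_COUNTS, margin, N_POPULATION))
```

With these made-up counts the two answers should land close to each other, since the mean of roughly a hundred moderately skewed counts is already near normal. The bootstrap matters more when the pilot reveals a heavy tail. The linear search in steps of 10 is deliberately crude; a bisection search would be faster for tight margins.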
My guess is that your distribution is right-skewed but not Poisson; it is probably more like a truncated Poisson. But of course I have no idea what sorts of documents you've got, so how would I know?

Bert Gunter
Genentech Nonclinical Biostatistics

On Mon, Jul 26, 2010 at 1:28 PM, Majonu <mnu...@andrew.cmu.edu> wrote:
>
> Basically, we have a population of 4,392 documents and we want to find out
> the number of patents per document. We don’t want to go through all 4,392
> documents, but want a reliable sample size from which to draw inferences. I
> feel like this count data will not follow a normal distribution, but more
> like a Poisson (skewed right). The problem is we don’t have much data similar
> to this data set, so the mean and standard deviation are unknown. Is there
> any way to derive a sample size based on the confidence interval, margin of
> error, and population size for what I assume to be a non-normal population?
> Any help would be greatly appreciated.
> --
> View this message in context:
> http://r.789695.n4.nabble.com/Sample-size-calculation-for-non-normal-population-with-unknown-mean-and-SD-tp2302833p2302833.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.