> I want to generate different samples using the following code:
>
> g <- sample(LETTERS[1:2], 24, replace=T)
>
> How can I specify that I need 12 "A"s and 12 "B"s?
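For comparison, R's `sample()` has no `minimal` argument. A common way to get exactly 12 of each letter in R is to shuffle a fixed vector built with `rep()`. This is a minimal sketch of that approach, not the S-PLUS feature discussed in the reply. The object names `g` and `samples` are only illustrative.

```r
# Build a vector with exactly 12 "A"s and 12 "B"s, then permute it:
# sample() without replace/prob returns a random permutation of its input.
g <- sample(rep(LETTERS[1:2], each = 12))
table(g)  # always 12 of each; only the order changes

# Several independent samples, one per column of a 24 x 5 matrix.
samples <- replicate(5, sample(rep(LETTERS[1:2], each = 12)))
colSums(samples == "A")  # 12 in every column
```

This fixes the counts exactly, which is full stratification. Minimal replacement, as described in the reply, generalizes the idea to cases where the target counts are not whole numbers.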
I introduced the concept of "sampling with minimal replacement" into the S-PLUS version of sample to handle things like this:

    sample(LETTERS[1:2], 24, minimal = T)

This is very useful in variance reduction applications, to approximately stratify but without introducing bias. I'd like to see this in R.

I'll raise a related issue: sampling with unequal probabilities, without replacement. R does the wrong thing, in my opinion:

    > values <- sapply(1:1000, function(i) sample(1:3, size=2, prob = c(.5, .25, .25)))
    > table(values)
    values
      1   2   3
    834 574 592

The selection probabilities are not proportional to the specified probabilities. With size = 2, proportional selection would include item 1 in every sample and items 2 and 3 each in half of the samples (expected counts 1000, 500, 500). In contrast, in S-PLUS:

    > values <- sapply(1:1000, function(i) sample(1:3, size=2, prob = c(.5, .25, .25)))
    > table(values)
       1   2   3
    1000 501 499

You can specify minimal = FALSE to get the same behavior as R:

    > values <- sapply(1:1000, function(i) sample(1:3, size=2, prob = c(.5, .25, .25), minimal = F))
    > table(values)
      1   2   3
    844 592 564

There is a reason this is associated with the concept of sampling with minimal replacement. Consider, for example:

    sample(1:4, size = 3, prob = 1:4/10)

The expected frequencies of (1,2,3,4) should equal size*prob = c(.3,.6,.9,1.2). That isn't possible when sampling without replacement: each observation can appear at most once, so its expected frequency cannot exceed 1, yet observation 4 would need 1.2. Sampling with minimal replacement allows this. Observation 4 is included in every sample, and is included twice in 20% of the samples.

Tim Hesterberg

Disclaimer - these are my opinions, not those of my employer.

Tim Hesterberg, Senior Research Scientist
Insightful Corp.
[EMAIL PROTECTED]
(206)802-2319, (206)283-8691 (fax)
1700 Westlake Ave. N, Suite 500, Seattle, WA 98109-3044, U.S.A.
www.insightful.com/Hesterberg

I'll teach short courses:
  Advanced Programming in S-PLUS: San Antonio TX, March 26-27, 2008.
  Bootstrap Methods and Permutation Tests: San Antonio, March 28, 2008.
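Where do R's counts of 834/574/592 come from? R's documentation for `sample()` states that with `replace = FALSE`, `prob` is applied sequentially: the next item is chosen with probability proportional to its weight among the items not yet drawn. The sketch below enumerates every ordered draw under that rule to get exact inclusion probabilities. `inclusion_prob_sequential` is a hypothetical helper, not part of R. It is exponential in `size`, so it is only suitable for tiny examples like this one.

```r
# Exact inclusion probabilities under sequential (successive) draws without
# replacement, the rule ?sample documents for replace = FALSE with prob.
inclusion_prob_sequential <- function(prob, size) {
  stopifnot(size <= length(prob))
  prob <- prob / sum(prob)
  n <- length(prob)
  # Inclusion probability contributed by all continuations of a partial draw:
  # 'remaining' marks items not yet drawn, 'p_path' is the probability of the
  # draws so far, 'k' is the number of draws still to make.
  walk <- function(remaining, p_path, k) {
    out <- numeric(n)
    if (k == 0) return(out)
    w <- prob * remaining        # zero weight for items already drawn
    w <- w / sum(w)              # renormalize among the remaining items
    for (i in which(remaining)) {
      p <- p_path * w[i]         # probability this path draws item i next
      out[i] <- out[i] + p
      rem <- remaining
      rem[i] <- FALSE
      out <- out + walk(rem, p, k - 1)
    }
    out
  }
  walk(rep(TRUE, n), 1, size)
}

p <- c(.5, .25, .25)
round(1000 * inclusion_prob_sequential(p, size = 2), 1)  # 833.3 583.3 583.3
1000 * 2 * p                                              # proportional target: 1000 500 500
```

For `prob = c(.5, .25, .25)` and `size = 2`, this gives 5/6, 7/12, and 7/12. That is about 833, 583, and 583 per 1000 samples, which is consistent with the R counts in the post. Proportional inclusion would instead be `2 * prob`, that is, 1, .5, and .5.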
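Below is a minimal R sketch of the idea behind sampling with minimal replacement. It assumes one particular design and is not the S-PLUS implementation. Each element gets `floor(size * prob)` guaranteed copies. The leftover fractional expected counts are then filled in by systematic sampling, a standard unequal-probability method chosen here as an assumption. This keeps the total at exactly `size` and makes each element's expected count exactly `size * prob`. `sample_minimal` is a hypothetical name, not an R or S-PLUS function.

```r
# Hypothetical sketch of sampling with minimal replacement (not S-PLUS code).
# Element i appears floor(size * p_i) times for sure, plus one extra copy with
# probability equal to the fractional part, so E[count_i] = size * p_i.
sample_minimal <- function(x, size, prob = NULL) {
  n <- length(x)
  if (is.null(prob)) prob <- rep(1, n)
  prob <- prob / sum(prob)
  expected <- size * prob
  base <- floor(expected + 1e-9)     # guaranteed copies (small tolerance for round-off)
  frac <- pmax(expected - base, 0)   # leftover expected counts, each in [0, 1)
  n_extra <- size - sum(base)        # extra draws needed to reach exactly 'size'
  extra <- integer(0)
  if (n_extra > 0) {
    # Systematic sampling: one uniform start, then points spaced 1 apart over the
    # cumulative fractional parts; element i is hit with probability frac[i].
    cum <- cumsum(frac) * n_extra / sum(frac)
    points <- runif(1) + 0:(n_extra - 1)
    extra <- pmin(findInterval(points, cum) + 1, n)  # pmin guards against round-off
  }
  counts <- base + tabulate(extra, nbins = n)
  out <- rep(x, times = counts)
  out[sample.int(length(out))]       # random order, as sample() returns
}

# Equal probabilities: exactly 12 "A"s and 12 "B"s every time.
table(sample_minimal(LETTERS[1:2], 24))

# The 1:4 example: expected counts approach c(.3, .6, .9, 1.2);
# 4 appears in every sample and twice in roughly 20% of them.
draws <- replicate(10000, sample_minimal(1:4, 3, prob = 1:4/10))
counts <- apply(draws, 2, tabulate, nbins = 4)
rowMeans(counts)
mean(counts[4, ] == 2)
```

Systematic sampling was chosen because it uses a single uniform draw and never selects an element twice from fractional parts below 1. Other schemes that preserve the fractional inclusion probabilities would also work.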
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.