I have a more general problem for you. Given n items and 2 <= g << n, how do you divide the n items into g groups that are "as equal as possible"?
First, operationally define "as equal as possible."
Second, define the algorithm to carry out the definition. Hint: Note that sum{m[i]} for i <= g must sum to n, where m[i] is the number of items in the ith group.
Third, write R code for the algorithm.

Exercise for the reader.

I may be wrong, but I think numerical analysts might also have a little fun here.

Randomization, of course, is trivial.

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Sat, Sep 4, 2021 at 2:13 PM AbouEl-Makarim Aboueissa <abouelmakarim1...@gmail.com> wrote:
>
> Dear Thomas:
>
> Thank you very much for your input in this matter.
>
> The core part of this R code(s) (please see below) was written by *Richard
> O'Keefe*. I had three examples with different sample sizes.
>
> *First sample of size n1 = 204* divided randomly into three groups of sizes
> 68. *No problems with this one*.
>
> *The second sample of size n2 = 112* divided randomly into three groups of
> sizes 37, 37, and 38. BUT this R code generated three groups of equal sizes
> (37, 37, and 37). *How to fix the code to make sure that the output will be
> three groups of sizes 37, 37, and 38*.
>
> *The third sample of size n3 = 284* divided randomly into three groups of
> sizes 94, 95, and 95. BUT this R code generated three groups of equal sizes
> (94, 94, and 94). *Again*, *how to fix the code to make sure that the
> output will be three groups of sizes 94, 95, and 95*.
>
> With many thanks
> abou
>
> ########### ------------------------ #############
>
> N1 <- 485
> population1.IDs <- seq(1, N1, by = 1)
> #### population1.IDs
>
> n1<-204 ##### in this case the size of each group of the three groups = 68
> sample1.IDs <- sample(population1.IDs,n1)
> #### sample1.IDs
>
> #### n1 <- length(sample1.IDs)
>
> m1 <- n1 %/% 3
> s1 <- sample(1:n1, n1)
> group1.IDs <- sample1.IDs[s1[1:m1]]
> group2.IDs <- sample1.IDs[s1[(m1+1):(2*m1)]]
> group3.IDs <- sample1.IDs[s1[(m1*2+1):(3*m1)]]
>
> groups.IDs <-cbind(group1.IDs,group2.IDs,group3.IDs)
>
> groups.IDs
>
> ####### --------------------------
>
> N2 <- 266
> population2.IDs <- seq(1, N2, by = 1)
> #### population2.IDs
>
> n2<-112 ##### in this case the sizes of the three groups are(37, 37, and 38)
> ##### BUT this codes generate three groups of equal sizes (37, 37, and 37)
> sample2.IDs <- sample(population2.IDs,n2)
> #### sample2.IDs
>
> #### n2 <- length(sample2.IDs)
>
> m2 <- n2 %/% 3
> s2 <- sample(1:n2, n2)
> group1.IDs <- sample2.IDs[s2[1:m2]]
> group2.IDs <- sample2.IDs[s2[(m2+1):(2*m2)]]
> group3.IDs <- sample2.IDs[s2[(m2*2+1):(3*m2)]]
>
> groups.IDs <-cbind(group1.IDs,group2.IDs,group3.IDs)
>
> groups.IDs
>
> ####### --------------------------
>
> N3 <- 674
> population3.IDs <- seq(1, N3, by = 1)
> #### population3.IDs
>
> n3<-284 ##### in this case the sizes of the three groups are(94, 95, and 95)
> ##### BUT this codes generate three groups of equal sizes (94, 94, and 94)
> sample2.IDs <- sample(population2.IDs,n2)
> sample3.IDs <- sample(population3.IDs,n3)
> #### sample3.IDs
>
> #### n3 <- length(sample2.IDs)
>
> m3 <- n3 %/% 3
> s3 <- sample(1:n3, n3)
> group1.IDs <- sample3.IDs[s3[1:m3]]
> group2.IDs <- sample3.IDs[s3[(m3+1):(2*m3)]]
> group3.IDs <- sample3.IDs[s3[(m3*2+1):(3*m3)]]
>
> groups.IDs <-cbind(group1.IDs,group2.IDs,group3.IDs)
>
> groups.IDs
>
> ______________________
>
> *AbouEl-Makarim Aboueissa, PhD*
>
> *Professor, Statistics and Data Science*
> *Graduate Coordinator*
>
> *Department of Mathematics and Statistics*
> *University of Southern Maine*
>
> On Sat, Sep 4, 2021 at 11:54 AM Thomas Subia <tgs...@yahoo.com> wrote:
>
> > Abou,
> >
> > I’ve been following your question on how to split a data column randomly
> > into 3 groups using R.
> >
> > My method may not be amenable for a large set of data but it surely worth
> > considering since it makes sense intuitively.
> >
> > mydata <- LETTERS[1:11]
> >
> > > mydata
> >
> > [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K"
> >
> > # Let’s choose a random sample of size 4 from mydata
> >
> > > random_grp1
> >
> > [1] "J" "H" "D" "A"
> >
> > Now my next random selection of data is defined by
> >
> > data_wo_random <- setdiff(mydata,random_grp1)
> >
> > # this makes sense because I need to choose random data from a set which
> > is defined by the difference of the sets mydata and random_grp1
> >
> > > data_wo_random
> >
> > [1] "B" "C" "E" "F" "G" "I" "K"
> >
> > This is great! So now I can randomly select data of any size from this set.
> >
> > Repeating this process can easily generate subgroups of your original
> > dataset of any size you want.
> >
> > Surely this method could be improved so that this could be done
> > automatically.
> >
> > Nevertheless, this is an intuitive method which I believe is easier to
> > understand than some of the other methods posted.
> >
> > Hope this helps!
> >
> > Thomas Subia
> >
> > Statistician
> >
> >         [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
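
One possible reading of the exercise posed at the top of this message, offered as a sketch and not as the intended answer. It assumes that "as equal as possible" means the group sizes differ by at most one, so that n %% g groups receive one item more than the remaining groups. The function names balanced_sizes and random_split are made up for illustration, and placing the larger groups last is an arbitrary choice made only to match the 37, 37, 38 and 94, 95, 95 splits asked for in the quoted message.

```r
# Hypothetical helper: g group sizes that sum to n and differ by at most one.
balanced_sizes <- function(n, g) {
  stopifnot(g >= 1, n >= g)
  base  <- n %/% g   # every group gets at least this many items
  extra <- n %% g    # this many groups get one additional item
  # Assumption: the 'extra' larger groups are the last ones.
  base + as.integer(seq_len(g) > g - extra)
}

# Hypothetical helper: shuffle the ids, then cut them into balanced groups.
# Returns a list because the groups may have unequal lengths
# (cbind() would recycle the shorter ones).
random_split <- function(ids, g) {
  n        <- length(ids)
  sizes    <- balanced_sizes(n, g)
  labels   <- rep(seq_len(g), times = sizes)  # group label for each position
  shuffled <- ids[sample.int(n)]              # random permutation of the ids
  unname(split(shuffled, labels))
}

# Usage with the second example from the quoted message (n2 = 112, 3 groups).
sample2.IDs <- sample(seq_len(266), 112)
groups      <- random_split(sample2.IDs, 3)
lengths(groups)
```

With n = 112 and g = 3 the sizes come out as 37, 37, 38, and with n = 284 as 94, 95, 95. The quoted code loses the leftover items because it indexes only up to 3 * (n %/% 3).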