I think you're right -- the prob argument probably isn't quite what you need (at least, not directly). Constrained sampling like this is a little trickier, so I'll leave it to someone who knows more than I do.
Michael

On Thu, Jun 14, 2012 at 9:07 AM, Guido Leoni <guido.le...@gmail.com> wrote:
> Sorry, I'm not sure that prob is suitable for my purposes (but I'm quite a
> newbie with R).
> If I understand correctly, prob lets you set a weight for each row in the
> original dataset so that rows are included on the basis of their
> weights ... I'm not sure I understand it correctly ;-)
> In my case all the rows are equally important. I "simply" need my
> subset to have, in each column, the same frequency of 1 as in the original
> dataset.
> Thank you again
> Guido
>
> 2012/6/14 R. Michael Weylandt <michael.weyla...@gmail.com>
>>
>> sample() takes a prob = argument which lets you supply weights, which
>> need not sum to one so, if I understand you, you could just pass TRUEs
>> and FALSEs for those rows you want. If I'm wrong about that last bit,
>> I'm still pretty confident sample(prob = ) is the way to go.
>>
>> Best,
>> Michael
>>
>> On Thu, Jun 14, 2012 at 6:02 AM, Guido Leoni <guido.le...@gmail.com>
>> wrote:
>> > Dear list, I wish to extract from a population genotyped for 10 SNPs a
>> > subsample of the same population of size n with similar allele
>> > frequencies.
>> > Essentially I have a matrix of 200 rows (df) like this:
>> > Name,Condition,rs1385699_X,rs6625163_X,rs962458_X,Rs4658627_1,
>> > sample01,Case,1,1,1,-1
>> > sample02,Control,1,1,1,1
>> > sample06,Control,1,-1,1,0
>> > sample10,Case,1,1,1,0
>> > sample11,Control,1,1,1,1
>> > sample24,Control,-1,-1,1,0
>> > sample29,Control,1,-1,1,0
>> > sample42,Case,-1,-1,1,0
>> > sample64,Case,-1,1,1,0
>> > ....
>> > I'm interested in maintaining in my subsample the same frequencies as
>> > those observed for the value 1 in each column.
>> > I approached the problem with the sample() function:
>> >
>> > mysample <- df[sample(nrow(df), 100, replace = FALSE), ]
>> >
>> > Then I tested that the frequencies of each allele in mysample are not
>> > statistically different from those in the initial dataset by means of
>> > prop.test.
>> > This seems to work, but do you know if there is a package that can do the
>> > same thing while allowing, for example, stricter control?
>> > Thank you very much
>> > Guido
>> >
>> > [[alternative HTML version deleted]]
>> >
>> > ______________________________________________
>> > R-help@r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
>> > http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>
>
> --
> Guido Leoni
> National Research Institute on Food and Nutrition
> (I.N.R.A.N.)
> via Ardeatina 546
> 00178 Rome
> Italy
>
> tel + 39 06 51 49 41 (operator)
> + 39 06 51 49 4498 (direct)
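
For the "stricter control" asked about in the original question, one possible approach (not something proposed in this thread, just a sketch) is plain rejection sampling: keep redrawing with sample() until the share of 1s in every SNP column of the subsample is within a chosen tolerance of the full dataset. The names matched_subsample(), tol and max_tries are made up for illustration, and the data frame below is simulated stand-in data that only borrows the column names from the example.

```r
# Hypothetical helper (not from any package): redraw subsamples of size n
# until the share of 1s in every column listed in `cols` is within `tol`
# of the share in the full data frame.
matched_subsample <- function(df, cols, n, tol = 0.05, max_tries = 10000) {
  target <- colMeans(df[, cols, drop = FALSE] == 1)
  for (i in seq_len(max_tries)) {
    idx  <- sample(nrow(df), n, replace = FALSE)
    freq <- colMeans(df[idx, cols, drop = FALSE] == 1)
    if (all(abs(freq - target) <= tol)) {
      return(df[idx, , drop = FALSE])
    }
  }
  stop("no subsample within tolerance after ", max_tries, " tries")
}

# Simulated stand-in data (assumption): 200 rows, genotypes coded -1/0/1.
set.seed(1)
snps <- c("rs1385699_X", "rs6625163_X", "rs962458_X", "Rs4658627_1")
df <- data.frame(
  Name      = sprintf("sample%03d", 1:200),
  Condition = sample(c("Case", "Control"), 200, replace = TRUE)
)
for (s in snps) {
  df[[s]] <- sample(c(-1, 0, 1), 200, replace = TRUE, prob = c(0.2, 0.2, 0.6))
}

sub <- matched_subsample(df, snps, n = 100, tol = 0.02)

# Compare the frequency of 1 per column in the full data and the subsample.
print(round(rbind(full   = colMeans(df[, snps] == 1),
                  subset = colMeans(sub[, snps] == 1)), 3))
```

The acceptance rate drops quickly as the number of SNP columns grows or tol shrinks, so with 10 SNPs it may be necessary to raise max_tries or loosen tol. Note also that a subsample selected this way is matched by construction, so a later prop.test against the full data is no longer an independent check.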