Re: [R] Question about sampling

2012-06-14 Thread Guido Leoni
Just to make the archives more complete and to simplify life for later readers: I think I have solved my problem using the caret package. This package has a function named createDataPartition that, given a column of interest in a data.frame, allows you to split a dataset in
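For later readers, a minimal sketch of what such a call might look like. It assumes the caret package is installed. The data frame is made up for illustration (the "Control" label and the 100/100 split are assumptions, not the poster's data). Note that createDataPartition stratifies on the single column you pass it, so this keeps the Case/Control balance but does not by itself guarantee similar allele frequencies at every SNP.

```r
library(caret)  # assumes the caret package is installed

set.seed(1)

# Hypothetical data: 200 samples, half cases and half controls.
df <- data.frame(
  Name = sprintf("sample%03d", 1:200),
  Condition = factor(rep(c("Case", "Control"), each = 100))
)

# Rows are drawn within each level of Condition, so the subset keeps
# roughly the same Case/Control proportions as the full data set.
idx <- createDataPartition(df$Condition, p = 0.5, list = FALSE)
sub <- df[idx, ]

# Compare the proportions before and after the split.
prop.table(table(df$Condition))
prop.table(table(sub$Condition))
```

To check the SNP columns as well, one could compare table() of each genotype column in df and in sub after the split.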

Re: [R] Question about sampling

2012-06-14 Thread R. Michael Weylandt
I think you're right -- prob probably isn't quite what you need (at least, directly): constrained sampling like this is a little trickier -- I'll leave this to someone who knows more than me. Michael On Thu, Jun 14, 2012 at 9:07 AM, Guido Leoni wrote: > Sorry I'm not sure that prob is suitable f

Re: [R] Question about sampling

2012-06-14 Thread Guido Leoni
Sorry, I'm not sure that prob is suitable for my purposes (but I'm quite new to R). If I understand correctly, prob allows you to set a weight for each row in the original dataset, so that rows are included on the basis of their weights. ... I'm not sure I understand it correctly ;-) In my case a

Re: [R] Question about sampling

2012-06-14 Thread R. Michael Weylandt
sample() takes a prob = argument which lets you supply weights. These need not sum to one, so, if I understand you, you could just pass TRUEs and FALSEs for those rows you want. If I'm wrong about that last bit, I'm still pretty confident sample(prob = ) is the way to go. Best, Michael On Thu, Jun
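A small sketch of the prob = suggestion, for illustration only. The data frame and the keep column are hypothetical, and the logical flag is converted to 0/1 weights explicitly so the intent is clear.

```r
set.seed(1)

# Hypothetical data: 'keep' marks the rows that are eligible for sampling.
df <- data.frame(id = 1:10, keep = rep(c(TRUE, FALSE), times = 5))

# TRUE becomes weight 1 and FALSE becomes weight 0, so rows with
# keep == FALSE can never be drawn. The weights need not sum to one.
w <- as.numeric(df$keep)

# Draw 3 distinct row indices (sampling without replacement is the default).
idx <- sample(nrow(df), size = 3, prob = w)
df[idx, ]
```

This only restricts which rows can be drawn. It does not constrain the sample to match the allele frequencies of the whole population.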

[R] Question about sampling

2012-06-14 Thread Guido Leoni
Dear list, I wish to extract, from a population genotyped for 10 SNPs, a subsample of size n with similar allele frequencies. Essentially I have a matrix of 200 rows (df) like this:
Name,Condition,rs1385699_X,rs6625163_X,rs962458_X,Rs4658627_1,
sample01,Case,1,1,1,-1
sample02,Co