[R] Random selection from a subsample

Tom Wilding Sun, 19 Dec 2010 02:34:43 -0800

Dear Mailing List

I have a data set (data4) consisting of a number of factors and a response 
variable. I wish to randomly sample from each combination of two of those factors 
(GIS_station and Distance_code2) and return a new data frame containing the 
original data structure (i.e. all the columns) but only the randomly 
selected rows. The number of rows in each combination of GIS_station and 
Distance_code2 varies widely, and some combinations are absent.
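A minimal sketch of the split, sample and recombine idea described above, assuming only base R's split(), lapply() and rbind(). The data4 built here is a hypothetical six-row stand-in for the real data set, and sample_by_group() is a hypothetical helper name, not an existing function.

```r
# HYPOTHETICAL toy data standing in for data4: group sizes differ and
# two GIS_station x Distance_code2 combinations (B.near, C.far) are absent.
set.seed(1)
data4 <- data.frame(
  GIS_station    = factor(c("A", "A", "A", "B", "B", "C")),
  Distance_code2 = factor(c("near", "near", "far", "far", "far", "near")),
  response       = c(1.2, 3.4, 2.2, 5.1, 0.7, 4.4)
)

# Hypothetical helper: draw n rows from every non-empty combination of the
# grouping factors and return them as one data frame with all columns.
sample_by_group <- function(df, groups, n, replace = TRUE) {
  # drop = TRUE skips factor combinations that have no rows
  pieces <- split(df, groups, drop = TRUE)
  picked <- lapply(pieces, function(x) {
    # subscript the group's rows with the sampled row numbers
    x[sample.int(nrow(x), n, replace = replace), , drop = FALSE]
  })
  out <- do.call(rbind, picked)
  rownames(out) <- NULL  # reset row names to 1..nrow(out)
  out
}

sub_sample10 <- sample_by_group(data4,
                                list(data4$GIS_station, data4$Distance_code2),
                                n = 10)
```

replace = TRUE is kept from the original attempt; without it, any combination with fewer than n rows would make sample.int() stop with an error.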


This is getting there:
with(data4, {
  sub_sample10 <- by(data4, list(GIS_station, Distance_code2), function(x) {
    sample(1:nrow(x), 10, replace = TRUE)
  })
})

...but this only generates random row numbers from the range 1:nrow(x) for 
each group. It doesn't return the selected rows, which is what I want.
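One way the by() attempt above might be adapted, sketched on the same kind of hypothetical toy data (not the real data4): have the function use the sampled row numbers to subscript x, so each group returns rows instead of indices, then stack the pieces. The simplify = FALSE argument is an assumption added here so that every cell of the by() result is a list element.

```r
# HYPOTHETICAL toy data standing in for data4.
set.seed(2)
data4 <- data.frame(
  GIS_station    = factor(c("A", "A", "A", "B", "B", "C")),
  Distance_code2 = factor(c("near", "near", "far", "far", "far", "near")),
  response       = c(1.2, 3.4, 2.2, 5.1, 0.7, 4.4)
)

by_rows <- with(data4, by(data4, list(GIS_station, Distance_code2),
  function(x) {
    # return the selected rows of x, not just their row numbers
    x[sample(1:nrow(x), 10, replace = TRUE), , drop = FALSE]
  },
  simplify = FALSE))

# by() leaves NULL in the cells of absent combinations; drop those,
# then stack the per-group data frames into one.
sub_sample10 <- do.call(rbind, Filter(Negate(is.null), by_rows))
```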

I'm sure this could be done in an elegant manner using a subscript, e.g.
 
sub_sample10 <- data4[sample(1:nrow(data4), size = 10), ]

only somehow combining it with the 'by' call (e.g. by(data4, list(GIS_station, 
Distance_code2), ...)), but I cannot get this to work.
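A sketch of the "sample the row numbers per group, then subscript once" idea, again on hypothetical toy data: split() the row numbers of data4 by the two factors, draw ten from each group, and use the combined index in a single subscript like the one above.

```r
# HYPOTHETICAL toy data standing in for data4.
set.seed(3)
data4 <- data.frame(
  GIS_station    = factor(c("A", "A", "A", "B", "B", "C")),
  Distance_code2 = factor(c("near", "near", "far", "far", "far", "near")),
  response       = c(1.2, 3.4, 2.2, 5.1, 0.7, 4.4)
)

# Split the row numbers (not the rows) by the two factors; drop = TRUE
# leaves out combinations with no rows.
rows_by_group <- split(seq_len(nrow(data4)),
                       list(data4$GIS_station, data4$Distance_code2),
                       drop = TRUE)

# Draw 10 row numbers per group. Indexing i[sample.int(length(i), ...)]
# instead of calling sample(i, ...) avoids sample()'s special case where a
# single number k (a one-row group) is treated as 1:k.
picked <- unlist(lapply(rows_by_group, function(i) {
  i[sample.int(length(i), 10, replace = TRUE)]
}), use.names = FALSE)

# One subscript on the original data frame keeps every column.
sub_sample10 <- data4[picked, ]
```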

Any guidance on this would be much appreciated.

Thank you.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
