Hi there,

It seems you got no answer. Maybe providing a reproducible example would help, as well as expressing your problem in more general terms. I am not an expert in sampling, but I would suggest (as does the help page for sample()) that you take a look at the sampling package, available on CRAN, and at the strata() function in that package, which allows for stratified sampling.
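
For what it is worth, here is a minimal base-R sketch of one way both caveats might be combined: stratify the transects (not the rows) by whether they contain any presence, then draw about 65% of the transects within each stratum, so that a transect is never split between training and test. The column names `transect` and `presence`, the helper `split_by_transect()`, and the simulated HH data are all assumptions made for illustration; they are not taken from the original post.

```r
# Simulated stand-in for HH (assumption: real data has a transect ID column
# and a 0/1 presence column; both names are hypothetical).
set.seed(1)
transect <- rep(seq_len(60), length.out = 936)
HH <- data.frame(
  transect = transect,
  # presences only occur in the first 20 transects of this toy data set
  presence = rbinom(936, 1, ifelse(transect <= 20, 0.35, 0))
)

# Hypothetical helper: split rows into training/test by whole transects.
split_by_transect <- function(dat, train_frac = 0.65) {
  # One value per transect: does it contain at least one presence?
  has_presence <- tapply(dat$presence, dat$transect, max) > 0
  ids <- names(has_presence)

  # Within each stratum (with / without presences), draw train_frac of
  # the transects at random, without replacement.
  train_ids <- unlist(lapply(split(ids, has_presence), function(x) {
    x[sample.int(length(x), size = round(train_frac * length(x)))]
  }))

  # Every row follows its transect into training or test.
  in_train <- as.character(dat$transect) %in% train_ids
  list(training = dat[in_train, ], test = dat[!in_train, ])
}

parts <- split_by_transect(HH)

# Compare prevalence in the two subsets, and confirm no transect is shared.
mean(parts$training$presence)
mean(parts$test$presence)
length(intersect(parts$training$transect, parts$test$transect))  # 0 by construction
```

Note that this only controls prevalence approximately, since transects differ in how many presences they hold. If a tighter match is needed, one option would be to repeat the draw and keep only splits whose prevalence falls within a chosen tolerance. The call can be wrapped in a for() loop or replicate() to generate the 1000 splits.
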

HTH,

Jean-Christophe

2011/9/8 Rebecca Ross <rebecca.r...@plymouth.ac.uk>:
> Hi,
> I wonder if someone can help me. I have built a gam model to predict the
> presence of cold water corals and am now trying to evaluate my model by
> splitting my dataset into training/test datasets.
>
> In an ideal world I would use the sample() function to randomly select rows
> of data for me so for example with 936 rows of data in my HH dataset I might
> say
>
> ss <- sample(nrow(HH), size = nrow(HH)-312, replace = FALSE)
> training <- HH[ss,]
> test <- HH[-ss,]
>
> in order to create a random training sub-sample of roughly 65% of my data
> and test of 35%. (I would use a for() loop to automate the process of
> building the datasets and running the prediction e.g. 1000 times)
>
> The problem is that I do have 2 caveats for the subsampling:
>
> a) I need to have control over the prevalence (proportion of observed
> presences within the dataset) in my build and test datasets
> I realise I could do this by sorting my column of presences and absences and
> then taking a subsample of the required size from the rows containing
> presences then the rows containing absences and combining them.
>
> e.g. presence_records <- sample(1:117, size=75, replace=FALSE)
>
> absence_records <- sample(118:936, size=549, replace=FALSE)
>
> ss <- c(presence_records, absence_records)
>
> but...
>
> b) My samples are within video transects and due to the risk of
> autocorrelation within each transect, ideally it is by transect cluster that
> they will be randomly selected. (a point within a transect cannot be
> allocated to the training dataset when another point from that same transect
> is already allocated to the test dataset)
>
> Is there a way I can fulfil both of these caveats and come out with my
> (slightly less) random subsamples?
>
> Many thanks for your time!
> All the best,
> Bex
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.