Els Verfaillie <els.verfaillie <at> ugent.be> writes:

> For a geostatistical analysis, I would like to split my dataset randomly
> into 2 parts: a prediction set (with 2/3 of my data) and a validation set
> (with 1/3 of my data). Both datasets will thus contain different data. Any
> suggestions?
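
A one-off split of the kind asked about can be sketched in a few lines of base R. This is only a minimal sketch under stated assumptions: the data frame `dat` and the names `n_pred`, `pred_idx`, `prediction` and `validation` are made up for illustration, and the 2/3 share is simply rounded to a whole number of rows.

```r
# One-off random split: about 2/3 for prediction, the remaining 1/3 for validation.
# `dat` is made-up example data standing in for the real dataset.
set.seed(4711)
dat = data.frame(x = rnorm(100), y = rnorm(100))

n_pred = round(2 / 3 * nrow(dat))      # number of rows in the prediction set
pred_idx = sample(nrow(dat), n_pred)   # row numbers drawn without replacement

prediction = dat[pred_idx, ]           # the sampled rows
validation = dat[-pred_idx, ]          # every row that was not sampled
```

Because the validation rows are selected with a negative index, the two sets cannot share a row.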
Normally you would not do this just once, but round-robin, so that each part serves as the validation set in turn. There are a few packages around that help with this (search for cross-validation), but in most cases doing it by hand is easier to understand 4 years later.

Dieter

# Randomize the row order; may not be required
set.seed(4711)
df = data.frame(x = rnorm(100), y = rnorm(100))
df = df[sample(nrow(df)), ]
ncrossval = 3
# length.out cuts the repeated 1..ncrossval pattern to nrow(df), so this also
# works when the number of rows is not evenly divisible by ncrossval
df$group = rep(1:ncrossval, length.out = nrow(df))
for (group in 1:ncrossval) {
  small = df[df$group == group, ]   # the held-out third
  big = df[df$group != group, ]     # the remaining two thirds
  # do your work with small and big
  str(small)
  str(big)
}

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.