Re: [R] split dataset randomly in prediction and validation set

Dieter Menne Thu, 05 Feb 2009 08:12:47 -0800

Els Verfaillie <els.verfaillie <at> ugent.be> writes:

> For a geostatistical analysis, I would like to split my dataset randomly
> into 2 parts: a prediction set (with 2/3 of my data) and a validation set
> (with 1/3 of my data). Both datasets will thus contain different data.  Any
> suggestions?


Normally you would not split just once, but round-robin, so that each
part serves as the validation set in turn. There are a few packages around
that help with this (search for cross-validation), but in most cases doing
it by hand is easier to understand 4 years later.

Dieter

set.seed(4711)
df = data.frame(x=rnorm(100),y=rnorm(100))
# randomize the row order; may not be required
df = df[sample(nrow(df)),]
ncrossval = 3
# length.out also handles data lengths not evenly divisible by ncrossval
df$group = rep(1:ncrossval,length.out=nrow(df))
for (group in 1:ncrossval)
{
  small = df[df$group==group,]
  big = df[df$group!=group,]
  # do your work with small and big
  str(small)
  str(big)
}
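
If a single 2/3 to 1/3 split is really all that is needed, as in the
original question, something along these lines might do. This is only a
sketch: the data frame dat and the names pred_idx, prediction and
validation are made up for illustration, and rounding to the nearest row
is an assumption about how the 2/3 should be handled.

```r
set.seed(4711)                                   # only to make the split reproducible
dat = data.frame(x=rnorm(100), y=rnorm(100))     # made-up example data
n = nrow(dat)
# draw the row indices of the prediction set, without replacement
pred_idx = sample(n, size=round(2*n/3))
prediction = dat[pred_idx,]                      # about 2/3 of the rows
validation = dat[-pred_idx,]                     # all remaining rows
```

Because validation is built from the rows not drawn for prediction, the
two sets cannot share a row.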

