On Wed, Jan 25, 2012 at 04:00:27AM -0800, Eliano wrote:
> Hi People,
>
> Does anyone have a good solution for this problem:
>
> a database called DB.
>
> index <- sample(1:nrow(DB), size=0.2*nrow(DB))
> test <- DB[index,]
> train <- DB[-index,]
>
> One of the variables in this database is a target variable with two
> values, 0 and 1.
>
> Imagine now that I want to constrain the test data frame so that "test"
> is still 20% of the size of DB, but 50% of its rows have DB$target=1.
>
> Imagine: n=100
> DB$target = { 0=80
>               1=20}
>
> test=20 and contains 10 random rows with DB$target=1 and 10 random rows
> with DB$target=0.

Hi.

One way is as follows. Sample the same number of rows, m, from each
class, where m is half of the desired test size (10% of DB).

  t0 <- which(DB$target==0)        # row numbers with target 0
  t1 <- which(DB$target==1)        # row numbers with target 1
  m <- round(0.1*nrow(DB))         # rows to draw from each class
  stopifnot(length(t0) >= m, length(t1) >= m)
  index <- c(sample(t0, size=m), sample(t1, size=m))

The resulting index can then be used to build test and train exactly
as in your code.

Hope this helps.

Petr Savicky.
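
For completeness, here is a minimal self-contained sketch of the whole procedure on the n=100 example from the question. The data frame built here is only a hypothetical stand-in for DB (the column x and the seed are assumptions added for illustration); the sampling steps are the ones given above.

```r
set.seed(1)  # assumption: fixed seed, only to make the sketch reproducible

# Hypothetical stand-in for DB: 100 rows, 80 with target 0 and 20 with target 1
DB <- data.frame(x = rnorm(100), target = rep(c(0, 1), times = c(80, 20)))

t0 <- which(DB$target == 0)          # row numbers with target 0
t1 <- which(DB$target == 1)          # row numbers with target 1
m  <- round(0.1 * nrow(DB))          # rows to draw from each class
stopifnot(length(t0) >= m, length(t1) >= m)

# Draw m rows from each class without replacement
index <- c(sample(t0, size = m), sample(t1, size = m))

test  <- DB[index, ]                 # 20 rows, balanced on target
train <- DB[-index, ]                # remaining 80 rows

table(test$target)                   # 10 rows with target 0, 10 with target 1
```

Note that train is no longer representative of DB: in this example it keeps 70 rows with target 0 and only 10 with target 1. Also, if a class contains exactly one row, sample() interprets that single number as the range 1:n, so that case would need separate handling.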