I have a binary classification problem in which the fraction of positives is very low, e.g. 20 positives in 10,000 examples (0.2%).
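With so few positives, any 90/10 split leaves very little to evaluate on. A minimal R sketch of this, assuming a simulated outcome (the name `Ysim` and the level names "pos"/"neg" are hypothetical, not the real data), runs the same `createDataPartition(..., p = 9/10, times = 3)` call used in the setup and counts the positives in each held-out set:

```r
library(caret)

# Hypothetical simulated outcome with the same imbalance: 20 positives in
# 10,000 cases. The name Ysim and the levels "pos"/"neg" are assumptions.
set.seed(1)
Ysim <- factor(c(rep("pos", 20), rep("neg", 9980)), levels = c("pos", "neg"))

# Same partition call as in the setup: three stratified 90% training sets.
splits <- createDataPartition(Ysim, p = 9/10, times = 3, list = TRUE)

# Rows outside each training set are held out; count positives among them.
heldOutPos <- sapply(splits, function(trainRows) sum(Ysim[-trainRows] == "pos"))
print(heldOutPos)
```

Because `createDataPartition` samples within each class of a factor outcome, each held-out set should contain only about 2 of the 20 positives (20 x 1/10), so every resampled ROC value rests on a handful of positive cases.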
What is an appropriate cross-validation scheme for training a classifier with very few positives? My current setup is:

========================================
library(caret)

# X: predictors; Y: two-level factor outcome whose levels are valid R names
# (required by classProbs = TRUE).
# Three stratified 90%/10% splits; each list element holds training-row indices.
tmp <- createDataPartition(Y, p = 9/10, times = 3, list = TRUE)

# Use those splits as the resamples and summarise each model by ROC AUC.
myCtrl <- trainControl(method = "boot", index = tmp, timingSamps = 2,
                       classProbs = TRUE, summaryFunction = twoClassSummary)

RFmodel  <- train(X, Y, method = "rf",        trControl = myCtrl, tuneLength = 1,  metric = "ROC")
SVMmodel <- train(X, Y, method = "svmRadial", trControl = myCtrl, tuneLength = 3,  metric = "ROC")
KNNmodel <- train(X, Y, method = "knn",       trControl = myCtrl, tuneLength = 10, metric = "ROC")
NNmodel  <- train(X, Y, method = "nnet",      trControl = myCtrl, tuneLength = 3,
                  trace = FALSE, metric = "ROC")
========================================

With this setup I am not getting good performance: the ROC values (area under the ROC curve) are below 0.7 for all of the classifiers above. Any thoughts?

Thanks,
James

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.