I have a binary classification problem where the fraction of positives is
very low, e.g. 20 positives in 10,000 examples (0.2%).

What is an appropriate cross validation scheme for training a classifier
with very few positives?

I currently have the following setup:
========================================
    library(caret)

    # Three stratified 90/10 splits to use as resampling indices
    tmp <- createDataPartition(Y, p = 9/10, times = 3, list = TRUE)
    myCtrl <- trainControl(method = "boot", index = tmp, timingSamps = 2,
                           classProbs = TRUE, summaryFunction = twoClassSummary)

    # Four classifiers, all tuned/evaluated on ROC over the same indices
    RFmodel  <- train(X, Y, method = "rf",        trControl = myCtrl,
                      tuneLength = 1,  metric = "ROC")
    SVMmodel <- train(X, Y, method = "svmRadial", trControl = myCtrl,
                      tuneLength = 3,  metric = "ROC")
    KNNmodel <- train(X, Y, method = "knn",       trControl = myCtrl,
                      tuneLength = 10, metric = "ROC")
    NNmodel  <- train(X, Y, method = "nnet",      trControl = myCtrl,
                      tuneLength = 3,  trace = FALSE, metric = "ROC")

========================================
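For context on why I am worried: with only 20 positives, each 90/10 stratified split leaves roughly 2 positives in the holdout, so any ROC estimate from it is extremely noisy. A small base-R sketch (synthetic data matching the stated 0.2% rate; the stratified split is hand-rolled here rather than calling createDataPartition) shows the holdout counts:

```r
set.seed(1)
# Simulate the stated imbalance: 20 positives among 10,000 examples
Y <- factor(c(rep("pos", 20), rep("neg", 9980)), levels = c("pos", "neg"))

# Hand-rolled stratified 90/10 split: sample 90% within each class
stratified_split <- function(y, p = 9/10) {
  unlist(lapply(split(seq_along(y), y),
                function(idx) sample(idx, size = floor(length(idx) * p))))
}

train_idx <- stratified_split(Y)
holdout   <- Y[-train_idx]
table(holdout)  # only 2 positives left to estimate ROC on
```

With class sizes of 20 and 9,980, the holdout is 1,000 examples of which exactly 2 are positive, so a single misranked positive moves the estimated ROC substantially.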
but I am not getting good performance (my ROC values are < 0.7 for all the
classifiers above). Any thoughts?

Thanks,

James


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.