Thanks, Max, for your answer. First, I do not understand your post. Why is it a problem if two of the predictions match? From the formula for calculating R^2 I can see that there will be a division by zero iff the total sum of squares is 0. That is only the case if the predictions for all points in the test set are equal to the mean of the test set. Why should this happen?
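The gap between the two views may come down to which R^2 formula is used. The sketch below assumes that caret's Rsquared is the squared correlation between observed and predicted values, which is what Max's remark about "the variance of the prediction" suggests. That is an assumption here, not something verified against the caret source. The helper names r2_cor and r2_sst and the observed values are made up for illustration; the prediction values are borrowed from the summaries printed further down. Under that assumption a constant prediction vector gives NA even though the total sum of squares of the observations is far from zero:

```r
# Assumption: Rsquared is the squared Pearson correlation between
# observed and predicted values, not 1 - SSE/SST.
r2_cor <- function(obs, pred) {
  # cor() returns NA (with a warning) when either vector has zero variance
  suppressWarnings(cor(obs, pred))^2
}

# The textbook definition, which only fails when the observations are constant
r2_sst <- function(obs, pred) {
  1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)
}

obs <- c(10.3, 18.2, 24.9, 31.7, 51.0)    # made-up hold-out volumes

# A tree that found a split: predictions vary, both versions are defined
pred_split <- c(18.45, 18.45, 18.45, 35.95, 53.44)
r2_cor(obs, pred_split)
r2_sst(obs, pred_split)

# A tree without any split: every hold-out point gets the same prediction
pred_const <- rep(30.37, length(obs))
r2_cor(obs, pred_const)    # NA, because sd(pred_const) is 0
r2_sst(obs, pred_const)    # still a finite number
```

If the assumption holds, one NA per resample with constant predictions would be enough to trigger the "missing values in resampled performance measures" warning.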
Anyway, I wrote the following code to check what you were suggesting:
--
library(caret)
data(trees)
formula=Volume~Girth+Height
customSummary <- function (data, lev = NULL, model = NULL) {
  print(summary(data$pred))
  return(defaultSummary(data, lev, model))
}
tc=trainControl(method='cv', summaryFunction=customSummary)
train(formula, data=trees, method='rpart', trControl=tc)
--

This outputs:
---
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  18.45   18.45   18.45   30.12   35.95   53.44
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  22.69   22.69   22.69   32.94   38.06   53.44
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  30.37   30.37   30.37   30.37   30.37   30.37
[cut many values like this]
Warning: In nominalTrainWorkflow(dat = trainData, info = trainInfo, method = method, :
  There were missing values in resampled performance measures.
-----

As I didn't understand your post, I don't know whether this confirms your assumption.

Thanks anyway,
Dominik

On 16/05/12 17:30, Max Kuhn wrote:
> More information is needed to be sure, but it is most likely that some
> of the resampled rpart models produce the same prediction for the
> hold-out samples (likely the result of no viable split being found).
>
> Almost every incarnation of R^2 requires the variance of the
> prediction. This particular failure mode would result in a divide by
> zero.
>
> Try using your own summary function (see ?trainControl) and put a
> print(summary(data$pred)) in there to verify my claim.
>
> Max
>
> On Wed, May 16, 2012 at 11:30 AM, Max Kuhn <mxk...@gmail.com> wrote:
>> More information is needed to be sure, but it is most likely that some
>> of the resampled rpart models produce the same prediction for the
>> hold-out samples (likely the result of no viable split being found).
>>
>> Almost every incarnation of R^2 requires the variance of the
>> prediction. This particular failure mode would result in a divide by
>> zero.
>>
>> Try using your own summary function (see ?trainControl) and put a
>> print(summary(data$pred)) in there to verify my claim.
>>
>> Max
>>
>> On Tue, May 15, 2012 at 5:55 AM, Dominik Bruhn <domi...@dbruhn.de> wrote:
>>> Hi,
>>> I got the following problem when trying to build an rpart model and using
>>> everything but LOOCV. Originally, I wanted to use k-fold partitioning,
>>> but every partitioning except LOOCV throws the following warning:
>>>
>>> ----
>>> Warning message: In nominalTrainWorkflow(dat = trainData, info =
>>> trainInfo, method = method, : There were missing values in resampled
>>> performance measures.
>>> -----
>>>
>>> Below are some simplified testcases which reproduce the warning on my
>>> system.
>>>
>>> Question: What does this error mean? How can I avoid it?
>>>
>>> System-Information:
>>> -----
>>> > sessionInfo()
>>> R version 2.15.0 (2012-03-30)
>>> Platform: x86_64-pc-linux-gnu (64-bit)
>>>
>>> locale:
>>>  [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
>>>  [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
>>>  [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8
>>>  [7] LC_PAPER=C                 LC_NAME=C
>>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>> [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
>>>
>>> attached base packages:
>>> [1] stats graphics grDevices utils datasets methods base
>>>
>>> other attached packages:
>>> [1] rpart_3.1-52 caret_5.15-023 foreach_1.4.0 cluster_1.14.2
>>> reshape_0.8.4
>>> [6] plyr_1.7.1 lattice_0.20-6
>>>
>>> loaded via a namespace (and not attached):
>>> [1] codetools_0.2-8 compiler_2.15.0 grid_2.15.0 iterators_1.0.6
>>> [5] tools_2.15.0
>>> -------
>>>
>>>
>>> Simplified Testcase I: Throws warning
>>> ---
>>> library(caret)
>>> data(trees)
>>> formula=Volume~Girth+Height
>>> train(formula, data=trees, method='rpart')
>>> ---
>>>
>>> Simplified Testcase II: Every other CV-method also throws the warning,
>>> for example using 'cv':
>>> ---
>>> library(caret)
>>> data(trees)
>>> formula=Volume~Girth+Height
>>> tc=trainControl(method='cv')
>>> train(formula, data=trees, method='rpart', trControl=tc)
>>> ---
>>>
>>> Simplified Testcase III: The only CV-method which is working is 'LOOCV':
>>> ---
>>> library(caret)
>>> data(trees)
>>> formula=Volume~Girth+Height
>>> tc=trainControl(method='LOOCV')
>>> train(formula, data=trees, method='rpart', trControl=tc)
>>> ---
>>>
>>>
>>> Thanks!
>>> --
>>> Dominik Bruhn
>>> mailto: domi...@dbruhn.de
>>>
>>>
>>>
>>>
>>> ______________________________________________
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>>
>>
>> --
>>
>> Max
>
>
>

--
Dominik Bruhn
mailto: domi...@dbruhn.de