Dear R users,
I know cross-validation does not work in rpart with user-defined split functions. As Terry Therneau suggested, one can use the xpred.rpart function and then summarize the matrix of predicted values into a single "goodness" value. I need only a confirmation: with xval=10, for example, if I have understood correctly, a single column of the matrix returned by xpred.rpart gives (for one cp level), for each of the 10 groups of observations, the value predicted by the tree fitted on the other 9 groups. Am I right?

One more question: I want to compare the results of a tree, say A, fitted with the "class" method, with those of a tree, say B, fitted with my custom functions (init, split and eval). To do so I should compare the cp tables of the two fitted rpart objects. For tree B I only have the "rel error" column, and I need to obtain the xerror and xstd columns as for tree A. To this end I need to know how these values are computed. I guess they depend on the xval value (in rpart.control), which is 10 by default. Does this mean that the observations are divided into 10 groups and, as before, xerror is computed by averaging the errors one gets when predicting the class of each group with the tree fitted on the other 9? Is xstd the standard deviation of these errors?
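To make the question concrete, here is how I would try to compute xerror and xstd for tree B from the xpred.rpart output. This is only a sketch, assuming 0/1 misclassification loss, the default priors, and that my eval function labels each node with the predicted class code; fitB and the response factor y are placeholders:

library(rpart)

## fitB: tree grown with the custom init/split/eval functions (placeholder)
## y:    true classes of the training observations, as a factor
cps  <- fitB$cptable[, "CP"]
yhat <- xpred.rpart(fitB, xval = 10, cp = cps)  # n rows, one column per cp value

err <- yhat != as.integer(y)    # 0/1 loss for each observation and cp level

## scale by the root-node risk so the result is comparable with the
## "rel error" column of the cp table
root.risk <- 1 - max(table(y)) / length(y)
xerror <- colMeans(err) / root.risk
xstd   <- apply(err, 2, sd) / (sqrt(length(y)) * root.risk)  # usual SE estimate

cbind(fitB$cptable, xerror = xerror, xstd = xstd)

Is this, at least roughly, what rpart does internally for the "class" method?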

Thank you for your help

Paolo Radaelli
Dipartimento di Metodi Quantitativi per le Scienze Economiche ed Aziendali
Facoltà di Economia
Università degli Studi di Milano-Bicocca
Via Bicocca degli Arcimboldi, 8
20126 Milano
Italy
e-mail paolo.radae...@unimib.it
Tel +39 02 6448 3163
Fax +39 02 6448 3105


begin included message
begin inclusion

I'm having a problem with custom functions in rpart, and before I tear my hair out trying to fix it, I want to make sure it's actually a problem. It seems that when you write custom functions for rpart (init, split and eval), rpart no longer cross-validates the resulting tree to return errors. A simple test is to use usersplits.R to get a simple custom rpart function, and then change fit1 and fit2 so that they both have xval set to 10. The problem is that the cptable for fit1 doesn't have xerror or xstd columns, despite cross-validation being set to 10-fold. I guess I just need confirmation that cross-validation doesn't work with custom functions, and if someone could explain to me why that is the case it would be greatly appreciated.
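Here is a self-contained version of the test (a sketch: the itemp/etemp/stemp functions below are a minimal anova-style user method in the spirit of usersplits.R, not the file's exact contents, and they handle continuous predictors only):

library(rpart)

itemp <- function(y, offset, parms, wt) {
  sfun <- function(yval, dev, wt, ylevel, digits)
    paste("mean =", format(signif(yval, digits)))
  environment(sfun) <- .GlobalEnv
  list(y = c(y), parms = NULL, numresp = 1, numy = 1, summary = sfun)
}
etemp <- function(y, wt, parms) {
  wmean <- sum(y * wt) / sum(wt)
  list(label = wmean, deviance = sum(wt * (y - wmean)^2))
}
stemp <- function(y, wt, x, parms, continuous) {
  ## y and wt arrive sorted by x when continuous is TRUE
  n <- length(y)
  y <- y - sum(y * wt) / sum(wt)              # center within the node
  lsum <- cumsum(y * wt)[-n]
  lwt  <- cumsum(wt)[-n]
  rwt  <- sum(wt) - lwt
  list(goodness = lsum^2 * (1/lwt + 1/rwt),   # between-groups SS per split point
       direction = sign(lsum / lwt))
}

fit1 <- rpart(mpg ~ ., data = mtcars,
              method = list(init = itemp, split = stemp, eval = etemp),
              control = rpart.control(xval = 10))
fit2 <- rpart(mpg ~ ., data = mtcars, method = "anova",
              control = rpart.control(xval = 10))

printcp(fit1)   # CP, nsplit, rel error only -- no xerror/xstd
printcp(fit2)   # has xerror and xstd as well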

Thanks,
Sam Stewart

end inclusion
You are right, cross-validation does not happen automatically with user-written split functions. We can think of cross-validation as having two steps:

1. Get the predicted values for each observation when that observation (or its group) is left out of the data set. There is actually a vector of predicted values, one for each level of model complexity. This step can be done with xpred.rpart, which does work for user-defined splits. It returns a matrix with n rows (one per observation) and one column for each of the target cp values. Call this matrix "yhat".

2. Summarize each column of the matrix yhat into a single "goodness" value. For anova fitting, for instance, this is just colMeans((y - yhat)^2). For classification models it is a bit more complex: we have to add up the expected loss L(y, yhat) for each column, using the loss matrix and the priors. A sketch of both steps follows below.

The reason that rpart does not do the second step for a user-written function is that rpart does not know what summary is appropriate. For some splitting rules, e.g. survival data split using a log-rank test, I'm not sure that *I* know what summary is appropriate.
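To illustrate, a sketch of both steps for a two-class classification fit (this shows the idea, not rpart's internal code; it assumes fit is an existing classification tree whose call kept the response, i.e. y = TRUE, the default, and the loss matrix and priors below are just examples):

library(rpart)

## step 1: cross-validated predictions, one column per cp value
yhat <- xpred.rpart(fit, xval = 10)      # n x ncp matrix of predicted class codes

## step 2: summarize each column as an expected loss under loss matrix and priors
y     <- as.integer(fit$y)               # true class codes (stored when y = TRUE)
L     <- matrix(c(0, 1, 1, 0), 2)        # example 0/1 loss matrix for two classes
prior <- table(y) / length(y)            # example: priors = observed frequencies
wt    <- (prior / table(y))[y]           # per-observation weight (here just 1/n)

risk <- apply(yhat, 2, function(col) sum(wt * L[cbind(y, col)]))
risk   # one "goodness" value per cp column; with these priors this is the
       # plain cross-validated misclassification rate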
  Terry Therneau

end included message

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
