I wonder - isn't this issue one of the reasons to use RandomForests rather than CART?
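
To make the tie behaviour described in the quoted messages below easier to poke at, here is a minimal sketch on synthetic data (not the Pima data from the thread). The idea is to give rpart two predictors that are exact copies of each other, so every candidate split on one ties with the same split on the other, and then see which one ends up at the root under different column orders. The names `x1_copy`, `root_var` and `shuffle_predictors` are made up for this illustration, and the column shuffling is only my reading of the workaround Andy describes, not his actual code.

```r
library(rpart)

set.seed(1)
n  <- 200
x1 <- rnorm(n)
d  <- data.frame(
  y       = factor(ifelse(x1 + rnorm(n, sd = 0.5) > 0, "pos", "neg")),
  x1      = x1,
  x1_copy = x1,        # exact duplicate: its splits tie with those on x1
  noise   = rnorm(n)   # unrelated predictor
)

# Name of the variable used at the root split of an rpart fit
# (row 1 of fit$frame is the root node).
root_var <- function(fit) as.character(fit$frame$var[1])

# Hypothetical helper: keep the response first and put the predictor
# columns in a random order, as one might do before each boosting round.
shuffle_predictors <- function(data, response) {
  predictors <- setdiff(names(data), response)
  data[, c(response, sample(predictors)), drop = FALSE]
}

# Same data, two column orders for the tied predictors.
fit_a <- rpart(y ~ ., data = d[, c("y", "x1", "x1_copy", "noise")],
               method = "class")
fit_b <- rpart(y ~ ., data = d[, c("y", "x1_copy", "x1", "noise")],
               method = "class")
root_a <- root_var(fit_a)
root_b <- root_var(fit_b)
cat("root split with x1 listed first:     ", root_a, "\n")
cat("root split with x1_copy listed first:", root_b, "\n")

# Refit on 50 random column orders and count which duplicate gets the root.
roots <- replicate(50, root_var(
  rpart(y ~ ., data = shuffle_predictors(d, "y"), method = "class")
))
print(table(roots))
```

If ties really go to whichever variable comes first, the two printed root variables should differ, and the table from the shuffled fits should show both duplicates being picked. I have not hard-coded any expected output here, so run it and check against your installed rpart version.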

On Wed, May 13, 2009 at 8:03 AM, Liaw, Andy <andy_l...@merck.com> wrote:
> From: Uwe Ligges
>>
>> Yuanyuan wrote:
>> > Greetings,
>> >
>> > I am using rpart for classification with the "class" method. The test
>> > data is the Pima Indians diabetes data from package mlbench.
>> >
>> > I first fitted a classification tree using the original data, and then
>> > exchanged the order of body mass and plasma glucose, which are the
>> > strongest/most important variables in the growing phase. The second
>> > tree is a little different from the first one. The misclassification
>> > tables are different too. I did not change the data, so why are the
>> > results so different?
>>
>> Well, at some splits the variable that comes first and yields the
>> same reduction of the entropy criterion as another one might be used,
>> hence another result.
>>
>> Uwe Ligges
>
> I recently tried writing adaboost.m1 using rpart, and was surprised that
> with a very small training set (say n=10 or 20), I get a large improvement
> in test set accuracy if I randomly shuffle the columns in the data at
> every adaboost iteration. (With twonorm data, we're talking about 25%
> error vs. 19%, using an n=2000 test set.) It turned out to be the way
> rpart deals with ties: first come, first win. Without shuffling the
> columns, rpart almost never picks any variable beyond the 10th. (In
> twonorm, all variables are equally important, so one would expect
> roughly equal selection frequency.)
>
> I've gotten some pointers from Terry Therneau about where in the code to
> check. I may try to implement breaking ties at random (as I've done in
> randomForest). No promises, though...
>
> Andy
>
>> >
>> > Does anyone know how rpart deals with ties?
>> >
>> > Here is the code for running the two trees.
>> >
>> > library(mlbench)
>> > data(PimaIndiansDiabetes2)
>> > mydata <- PimaIndiansDiabetes2
>> > library(rpart)
>> >
>> > ## tree on the original column order
>> > fit2 <- rpart(diabetes ~ ., data = mydata, method = "class")
>> > plot(fit2, uniform = TRUE, main = "CART for original data")
>> > text(fit2, use.n = TRUE, cex = 0.6)
>> > printcp(fit2)
>> > table(predict(fit2, type = "class"), mydata$diabetes)
>> > ## misclassification table: rows are fitted class
>> > ##       neg pos
>> > ##   neg 437  68
>> > ##   pos  63 200
>> > #Klimt(fit2,mydata)
>> >
>> > ## same data with columns 2 (glucose) and 6 (mass) exchanged
>> > pmydata <- data.frame(mydata[, c(1, 6, 3, 4, 5, 2, 7, 8, 9)])
>> > fit3 <- rpart(diabetes ~ ., data = pmydata, method = "class")
>> > plot(fit3, uniform = TRUE, main = "CART after exchanging mass & glucose")
>> > text(fit3, use.n = TRUE, cex = 0.6)
>> > printcp(fit3)
>> > table(predict(fit3, type = "class"), pmydata$diabetes)
>> > ## after exchanging the order of body mass and plasma glucose
>> > ##       neg pos
>> > ##   neg 436  64
>> > ##   pos  64 204
>> > #Klimt(fit3,pmydata)
>> >
>> > Thanks,
>> >
>> > Yuanyuan Huang
>> >
>> > ______________________________________________
>> > R-help@r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.

--
Dimitri Liakhovitski
MarketTools, Inc.
dimitri.liakhovit...@markettools.com