-- begin included message ---

Hi,

I am using rpart decision trees to analyze customer churn. I am finding that the decision trees created are not effective because they are not able to recognize factors that influence churn. I have created an example situation below. What do I need to do for rpart to build a tree with the variable experience? My guess is that this would happen if rpart used the loss matrix while creating the tree.

> experience <- as.factor(c(rep("good",90), rep("bad",10)))
> cancel <- as.factor(c(rep("no",85), rep("yes",5), rep("no",5), rep("yes",5)))
> table(experience, cancel)
          cancel
experience no yes
      bad   5   5
      good 85   5
> rpart(cancel ~ experience)
n= 100

node), split, n, loss, yval, (yprob)
      * denotes terminal node

1) root 100 10 no (0.9000000 0.1000000) *

I tried the following commands with no success.

rpart(cancel ~ experience, control=rpart.control(cp=.0001))
rpart(cancel ~ experience, parms=list(split='information'))
rpart(cancel ~ experience, parms=list(split='information'), control=rpart.control(cp=.0001))
rpart(cancel ~ experience, parms=list(loss=matrix(c(0,1,10000,0), nrow=2, ncol=2)))

--- end inclusion --------

The program works fine with rpart(as.numeric(cancel) ~ experience), which fits a tree that tries to predict the probability of cancellation rather than a YES/NO decision for each node. (as.numeric() codes the factor as 1/2, so the fitted node means are 1 plus the cancellation rate.) I usually find this more informative, particularly for early analysis. Breiman et al., in the original CART book, refer to this as odds regression. In this analysis, a split that leads to one child with 30% cancellation and another with 5% cancellation is successful.

When using a factor as the y variable, the same split is scored as useless, since the parent and both children are all scored as "NO" and the misclassification loss does not improve.

By adjusting the losses to be just right you can get your data to split. You need to make them such that the 85/5 node is predicted as 'no cancel' and the 5/5 node as 'yes cancel'; 1:2 losses would suffice. In the example where you set losses to 1:10000, both nodes are scored as 'yes'.

Terry Therneau

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
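
The loss argument above can be checked with a little arithmetic. The following is only a rough sketch: node_label() and labels_for() are made-up helper names, not part of rpart, and the loss ratio is read as (cost of a false 'yes') : (cost of a missed 'yes'). The final rpart call assumes that rows of the loss matrix index the true class and columns the predicted class, which is worth confirming against ?rpart before relying on it.

```r
# Hypothetical helper (not an rpart function): choose the label with the
# lower total misclassification cost for a node holding n_no "no" cases
# and n_yes "yes" cases. Ties go to "no".
node_label <- function(n_no, n_yes, cost_false_yes = 1, cost_missed_yes = 1) {
  cost_if_no  <- n_yes * cost_missed_yes  # all true "yes" cases are missed
  cost_if_yes <- n_no  * cost_false_yes   # all true "no" cases are flagged
  if (cost_if_yes < cost_if_no) "yes" else "no"
}

# Counts (no, yes) for the root and the two children from the posted table.
nodes <- list(root = c(90, 10), good = c(85, 5), bad = c(5, 5))

# Label every node when a missed "yes" costs k times a false "yes".
labels_for <- function(k) {
  sapply(nodes, function(n) node_label(n[1], n[2], 1, k))
}

labels_equal   <- labels_for(1)      # default, equal losses
labels_1_to_2  <- labels_for(2)      # the 1:2 losses suggested above
labels_extreme <- labels_for(10000)  # the 1:10000 losses from the question

print(rbind(equal = labels_equal, "1:2" = labels_1_to_2,
            "1:10000" = labels_extreme))

# Optional: try the 1:2 losses in rpart itself, if it is installed.
if (requireNamespace("rpart", quietly = TRUE)) {
  experience <- factor(c(rep("good", 90), rep("bad", 10)))
  cancel <- factor(c(rep("no", 85), rep("yes", 5), rep("no", 5), rep("yes", 5)))
  # Assumed orientation: rows = true class (no, yes), columns = predicted.
  # Filled column-wise, so a true "yes" predicted as "no" costs 2.
  loss_1_to_2 <- matrix(c(0, 2, 1, 0), nrow = 2)
  fit <- rpart::rpart(cancel ~ experience, parms = list(loss = loss_1_to_2))
  print(fit)
}
```

Only under the 1:2 losses do the two children receive different labels, which is what makes the split worth keeping; with equal losses every node is 'no', and with 1:10000 every node is 'yes'.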