A point of history: Both the commercial CART program and the rpart() function are based on the book Classification and Regression Trees (Breiman, Friedman, Olshen, Stone, 1984). As a reader/commentator on one of the early drafts I got to know the material well. CART started as a large Fortran program written by Jerry Friedman which was the testing ground for the ideas in the book. I had the code at one time and made some modifications to it, but found it too frustrating to go very far with. Fortran is just too clumsy for a recursive task, and Jerry's ability to hold upteen variables in his head at once greater than mine -- the Fortran was a large monlithic block. Salford Systems aquired rights to that code; I don't know whether any of the original lines remain in their product. I had lots of conversations with their main programmer (15-20 years ago now) about methods for speeding it up; mainly an interesting problem in optimal indexing. When rpart was first written it's output agreed with CART almost entirely. The only major difference was in surrogates: I pick the surrogate with the largest number of agreements, CART picked that with the greatest % agreement. This means that rpart favors variables with fewer missing values. Since that point in time both codes have evolved. I haven't had time to do important work on rpart in over a decade. It' not surprising that the graphics and display are behind the curve, what's more surprising is that it still endures. Rpart is called "rpart" because the authors copyrighted the term "CART" for their program. It was the best alternative name that I could come up with at the time. I find it amusing that one consequence of their copyright choice is that I now see "recursive partitioning" far more often than "CART" as the generic label for tree based methods. Terry T
______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.