Thanks for your point of view, Terry! It is always fascinating to follow the history of the field, especially as told by someone involved with it.
Jude Ryan

-----Original Message-----
From: Terry Therneau [mailto:thern...@mayo.edu]
Sent: Tuesday, June 23, 2009 9:22 AM
To: Ryan, Jude; c...@datanalytics.com
Cc: r-help@r-project.org
Subject: Re: [R] Recursive partitioning algorithms in R vs. alia

A point of history: Both the commercial CART program and the rpart() function are based on the book Classification and Regression Trees (Breiman, Friedman, Olshen, Stone, 1984). As a reader/commentator on one of the early drafts I got to know the material well.

CART started as a large Fortran program written by Jerry Friedman which was the testing ground for the ideas in the book. I had the code at one time and made some modifications to it, but found it too frustrating to go very far with. Fortran is just too clumsy for a recursive task, and Jerry's ability to hold umpteen variables in his head at once is greater than mine -- the Fortran was a large monolithic block. Salford Systems acquired rights to that code; I don't know whether any of the original lines remain in their product. I had lots of conversations with their main programmer (15-20 years ago now) about methods for speeding it up; mainly an interesting problem in optimal indexing.

When rpart was first written its output agreed with CART almost entirely. The only major difference was in surrogates: I pick the surrogate with the largest number of agreements, CART picked that with the greatest % agreement. This means that rpart favors variables with fewer missing values.

Since that point in time both codes have evolved. I haven't had time to do important work on rpart in over a decade. It's not surprising that the graphics and display are behind the curve; what's more surprising is that it still endures.

Rpart is called "rpart" because the authors copyrighted the term "CART" for their program. It was the best alternative name that I could come up with at the time.
I find it amusing that one consequence of their copyright choice is that I now see "recursive partitioning" far more often than "CART" as the generic label for tree-based methods.

Terry T
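
The surrogate difference mentioned in the message above (largest number of agreements versus greatest % agreement) can be illustrated with a small Python sketch on made-up data. Everything here is an assumption for illustration: the function names, the toy vectors, and the use of None to mark a missing value. It is not code from rpart or CART, only a rough picture of why counting agreements tends to favor variables with fewer missing values.

```python
from typing import Dict, List, Optional, Tuple


def agreement_stats(primary: List[bool],
                    surrogate: List[Optional[bool]]) -> Tuple[int, int]:
    """Return (agreements, usable) for one candidate surrogate.

    primary[i] is True if the primary split sends observation i left.
    surrogate[i] is the candidate's direction, or None if the
    candidate variable is missing for that observation.
    """
    agreements = 0
    usable = 0
    for p, s in zip(primary, surrogate):
        if s is None:
            continue  # a missing value can neither agree nor disagree
        usable += 1
        if p == s:
            agreements += 1
    return agreements, usable


def pick_surrogate(primary: List[bool],
                   candidates: Dict[str, List[Optional[bool]]],
                   rule: str) -> str:
    """Pick a surrogate by raw agreement count or by percent agreement."""
    def score(name: str) -> float:
        agreements, usable = agreement_stats(primary, candidates[name])
        if rule == "count":
            return float(agreements)
        if rule == "percent":
            return agreements / usable if usable else 0.0
        raise ValueError("rule must be 'count' or 'percent'")
    return max(candidates, key=score)


# Hypothetical toy data: 10 observations, primary split direction.
primary = [True, True, True, True, True,
           False, False, False, False, False]
candidates = {
    # x1 is never missing and agrees on 7 of 10 observations (70%).
    "x1": [True, True, True, True, False,
           False, False, False, True, True],
    # x2 is missing for 6 observations and agrees on 4 of the other 4 (100%).
    "x2": [True, True, None, None, None,
           None, None, None, False, False],
}

print(pick_surrogate(primary, candidates, "count"))    # x1
print(pick_surrogate(primary, candidates, "percent"))  # x2
```

Under the count rule the complete variable x1 wins (7 agreements against 4), while under the percent rule the mostly missing x2 wins (4 of 4 against 7 of 10).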