Max, I installed C50. I have a question about the syntax. Per the C50 manual:
## Default S3 method:
C5.0(x, y, trials = 1, rules = FALSE, weights = NULL,
     control = C5.0Control(), costs = NULL, ...)

## S3 method for class 'formula':
C5.0(formula, data, weights, subset, na.action = na.pass, ...)

I believe I need the method for class 'formula', but I don't yet see in the manual how to tell C50 that I want to use that method. If I run:

respLevel = read.csv("Resp Level Data.csv")
respLevelTree = C5.0(BRAND_NAME ~ PRI + PROM + REVW + MODE + FORM +
                     FAMI + DRRE + FREC + SPED, data = respLevel)

...I get an error message:

Error in gsub(":", ".", x, fixed = TRUE) :
  input string 18 is invalid in this locale

What is the correct way to use the C5.0 method for class 'formula'?

-Vik

On Sep 21, 2012, at 4:18 AM, mxkuhn wrote:

> There is also C5.0 in the C50 package. It tends to produce smaller trees
> than C4.5, and much smaller trees than J48 when there are factor
> predictors. Also, it has an optional feature selection ("winnow") step
> that can be used.
>
> Max
>
> On Sep 21, 2012, at 2:18 AM, Achim Zeileis <achim.zeil...@uibk.ac.at> wrote:
>
>> Hi,
>>
>> just to add a few points to the discussion:
>>
>> - rpart() is able to deal with responses with more than two classes.
>>   Setting method = "class" explicitly is not necessary if the response
>>   is a factor (as in this case).
>>
>> - If your tree on this data is so huge that it can't even be plotted, I
>>   wouldn't be surprised if it overfitted the data set. You should check
>>   for this and possibly try to avoid unnecessary splits.
>>
>> - There are various ways to do so for J48 trees without variable
>>   reduction. One could require a larger minimal leaf size (the default
>>   is 2), or one can use "reduced error pruning"; see WOW("J48") for
>>   more options. They can easily be used as, e.g.,
>>   J48(..., control = Weka_control(R = TRUE, M = 10)), etc.
>>
>> - There are various other ways of fitting decision trees; see for
>>   example http://CRAN.R-project.org/view=MachineLearning for an
>>   overview.
>>   In particular, you might like the "partykit" package, which
>>   additionally provides the ctree() method and has a unified plotting
>>   interface for ctree, rpart, and J48.
>>
>> hth,
>> Z
>>
>> On Thu, 20 Sep 2012, Vik Rubenfeld wrote:
>>
>>> Bhupendrashinh, thanks very much! I ran J48 on a respondent-level data
>>> set and got a 61.75% correct classification rate!
>>>
>>> Correctly Classified Instances          988       61.75   %
>>> Incorrectly Classified Instances        612       38.25   %
>>> Kappa statistic                           0.5651
>>> Mean absolute error                       0.0432
>>> Root mean squared error                   0.1469
>>> Relative absolute error                  52.7086 %
>>> Root relative squared error              72.6299 %
>>> Coverage of cases (0.95 level)           99.6875 %
>>> Mean rel. region size (0.95 level)       15.4915 %
>>> Total Number of Instances              1600
>>>
>>> When I plot it I get an enormous chart. Running:
>>>
>>>> respLevelTree = J48(BRAND_NAME ~ PRI + PROM + FORM + FAMI + DRRE +
>>>> FREC + MODE + SPED + REVW, data = respLevel)
>>>> respLevelTree
>>>
>>> ...reports:
>>>
>>> J48 pruned tree
>>> ------------------
>>>
>>> Is there a way to further prune the tree so that I can present a chart
>>> that would fit on a single page or two?
>>>
>>> Thanks very much in advance for any thoughts.
>>>
>>> -Vik
>>>
>>> ______________________________________________
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
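[On the question at the top of the thread: no extra step is needed to select the formula method. S3 dispatch picks it automatically because the first argument, BRAND_NAME ~ ..., has class "formula". The gsub() error instead suggests that some string in the data (a factor level or column name) contains bytes that are invalid in the current locale, typically because the CSV was saved in a different encoding than R expects. A minimal diagnostic sketch, assuming the file and data frame names from the post:]

respLevel <- read.csv("Resp Level Data.csv")

# Which columns hold strings that cannot be represented in the current
# locale?  iconv() returns NA for strings it fails to convert.
bad <- sapply(respLevel, function(col)
    any(is.na(iconv(as.character(col), from = "", to = "UTF-8"))))
names(respLevel)[bad]

# If the file was saved in another encoding (e.g. latin1 / Windows-1252),
# declaring it when reading usually clears the error:
respLevel <- read.csv("Resp Level Data.csv", fileEncoding = "latin1")

[The "latin1" value is an assumption for illustration; the right value depends on how the file was actually saved.]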
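[Achim's rpart suggestion can also be combined with explicit cost-complexity pruning to get a tree small enough to plot on a page. A sketch, assuming the same data frame and predictors as above: fit, then prune back to the complexity value with the lowest cross-validated error.]

library(rpart)

fit <- rpart(BRAND_NAME ~ PRI + PROM + REVW + MODE + FORM + FAMI +
             DRRE + FREC + SPED, data = respLevel)

# printcp(fit) shows the full CP table; pick the row minimizing xerror.
best <- fit$cptable[which.min(fit$cptable[, "xerror"]), "CP"]
pruned <- prune(fit, cp = best)

plot(pruned)
text(pruned, use.n = TRUE)

[One could also pass a larger cp or minbucket directly in rpart.control() to force a smaller tree up front.]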