Max, I installed C50. I have a question about the syntax. Per the C50 manual:
## Default S3 method:
C5.0(x, y, trials = 1, rules = FALSE, weights = NULL,
     control = C5.0Control(), costs = NULL, ...)

## S3 method for class 'formula':
C5.0(formula, data, weights, subset, na.action = na.pass, ...)

I believe I need the method for class 'formula', but I don't yet see in the manual how to tell C50 that I want to use that method. If I run:

respLevel = read.csv("Resp Level Data.csv")
respLevelTree = C5.0(BRAND_NAME ~ PRI + PROM + REVW + MODE + FORM +
                     FAMI + DRRE + FREC + SPED, data = respLevel)

...I get an error message:

Error in gsub(":", ".", x, fixed = TRUE) :
  input string 18 is invalid in this locale

What is the correct way to use the C5.0 method for class 'formula'?

-Vik

On Sep 21, 2012, at 4:18 AM, mxkuhn wrote:

> There is also C5.0 in the C50 package. It tends to produce smaller trees
> than C4.5, and much smaller trees than J48 when there are factor
> predictors. Also, it has an optional feature selection ("winnow") step
> that can be used.
>
> Max
>
> On Sep 21, 2012, at 2:18 AM, Achim Zeileis <achim.zeil...@uibk.ac.at> wrote:
>
>> Hi,
>>
>> just to add a few points to the discussion:
>>
>> - rpart() is able to deal with responses with more than two classes.
>>   Setting method = "class" explicitly is not necessary if the response
>>   is a factor (as in this case).
>>
>> - If your tree on this data is so huge that it can't even be plotted, I
>>   wouldn't be surprised if it overfitted the data set. You should check
>>   for this and possibly try to avoid unnecessary splits.
>>
>> - There are various ways to do so for J48 trees without variable
>>   reduction. One could require a larger minimal leaf size (the default
>>   is 2), or one can use "reduced error pruning"; see WOW("J48") for
>>   more options. They can easily be used as, e.g.,
>>   J48(..., control = Weka_control(R = TRUE, M = 10)), etc.
>>
>> - There are various other ways of fitting decision trees; see for
>>   example http://CRAN.R-project.org/view=MachineLearning for an
>>   overview.
>>   In particular, you might like the "partykit" package, which
>>   additionally provides the ctree() method and has a unified plotting
>>   interface for ctree, rpart, and J48.
>>
>> hth,
>> Z
>>
>> On Thu, 20 Sep 2012, Vik Rubenfeld wrote:
>>
>>> Bhupendrashinh, thanks very much! I ran J48 on a respondent-level data
>>> set and got a 61.75% correct classification rate!
>>>
>>> Correctly Classified Instances          988       61.75   %
>>> Incorrectly Classified Instances        612       38.25   %
>>> Kappa statistic                           0.5651
>>> Mean absolute error                       0.0432
>>> Root mean squared error                   0.1469
>>> Relative absolute error                  52.7086 %
>>> Root relative squared error              72.6299 %
>>> Coverage of cases (0.95 level)           99.6875 %
>>> Mean rel. region size (0.95 level)       15.4915 %
>>> Total Number of Instances              1600
>>>
>>> When I plot it I get an enormous chart. Running:
>>>
>>>> respLevelTree = J48(BRAND_NAME ~ PRI + PROM + FORM + FAMI + DRRE +
>>>> FREC + MODE + SPED + REVW, data = respLevel)
>>>> respLevelTree
>>>
>>> ...reports:
>>>
>>> J48 pruned tree
>>> ------------------
>>>
>>> Is there a way to further prune the tree so that I can present a chart
>>> that would fit on a single page or two?
>>>
>>> Thanks very much in advance for any thoughts.
>>>
>>> -Vik
>>>
>>> ______________________________________________
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
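[On the question at the top of the thread: no extra step is needed to select the formula method. S3 dispatch picks it automatically because the first argument, BRAND_NAME ~ ..., has class "formula". The gsub() error instead suggests that some string in the data (a factor level or column name) contains bytes that are invalid in the current locale, typically because the CSV was saved in a different encoding than R expects. A minimal diagnostic sketch, assuming the file and data frame names from the post:]

respLevel <- read.csv("Resp Level Data.csv")

# Which columns hold strings that cannot be represented in the current
# locale?  iconv() returns NA for strings it fails to convert.
bad <- sapply(respLevel, function(col)
    any(is.na(iconv(as.character(col), from = "", to = "UTF-8"))))
names(respLevel)[bad]

# If the file was saved in another encoding (e.g. latin1 / Windows-1252),
# declaring it when reading usually clears the error:
respLevel <- read.csv("Resp Level Data.csv", fileEncoding = "latin1")

[The "latin1" value is an assumption for illustration; the right value depends on how the file was actually saved.]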
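[Achim's rpart suggestion can also be combined with explicit cost-complexity pruning to get a tree small enough to plot on a page. A sketch, assuming the same data frame and predictors as above: fit, then prune back to the complexity value with the lowest cross-validated error.]

library(rpart)

fit <- rpart(BRAND_NAME ~ PRI + PROM + REVW + MODE + FORM + FAMI +
             DRRE + FREC + SPED, data = respLevel)

# printcp(fit) shows the full CP table; pick the row minimizing xerror.
best <- fit$cptable[which.min(fit$cptable[, "xerror"]), "CP"]
pruned <- prune(fit, cp = best)

plot(pruned)
text(pruned, use.n = TRUE)

[One could also pass a larger cp or minbucket directly in rpart.control() to force a smaller tree up front.]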