On Thu, 28 Jul 2011, seanstcl...@verizon.net wrote:


  I am running the ctree function in R.

  My data has about 10 variables, many of which are categorical. Two of the
  categorical variables have many levels (one has 900 levels, another has
  1,000 levels). For example, one of these variables is a disease code and is
  structured as A, B, C, ..., AA, AB, AC, ...

  Each time I've tried to run the ctree function with these 2 variables
  included in the data, the function never stops running. When I remove these
  2 variables from the data and run without them, the function returns in
  about 3 seconds.

  Q: Is there a limit to the number of levels that a categorical variable can
  contain? Is there something else that I may be overlooking?

ctree() tries to split such a variable into two groups: a left and a right daughter node. There are 2^(k-1) - 1 possible groupings for a categorical variable with k levels. For k = 1000 that is 2^999 - 1, roughly 5 x 10^300, which is far too many to evaluate in any reasonable amount of time.
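The arithmetic behind these numbers can be checked with a few lines of code. The sketch below is in Python rather than R and does not call ctree() at all; it only evaluates the two counts discussed here, 2^(k-1) - 1 for an unordered factor and k-1 for an ordered one, for a few values of k. The function names are illustrative and not part of any package.

```python
def nominal_splits(k: int) -> int:
    """Ways to divide k unordered levels into two non-empty groups (left/right node)."""
    return 2 ** (k - 1) - 1


def ordered_splits(k: int) -> int:
    """Cut points between adjacent levels of an ordered factor with k levels."""
    return k - 1


def describe(n: int) -> str:
    """Show small counts exactly and huge ones only as an order of magnitude."""
    return str(n) if n < 10**12 else f"about 10^{len(str(n)) - 1}"


if __name__ == "__main__":
    for k in (3, 10, 30, 900, 1000):
        print(f"k = {k:4d}: unordered {describe(nominal_splits(k)):>14}, "
              f"ordered {ordered_splits(k)}")
```

For k = 3 this gives the 3 groupings {A}|{B,C}, {B}|{A,C} and {C}|{A,B}; for k = 1000 the unordered count has 301 digits, while the ordered count is just 999.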

You can try to collapse the levels into a coarser classification that is still computable. Or, if the categorical variable is actually ordinal, declare it as an ordered factor (e.g. via factor(..., ordered = TRUE)); then only k-1 splits are possible, which is small enough.
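One simple way to coarsen a factor, offered here only as an assumption and not as what the reply prescribes, is to keep the most frequent levels and lump everything else into a single catch-all level. The Python sketch below does that with collections.Counter. The helper name lump_rare_levels, the keep parameter, the "Other" label and the toy codes are all hypothetical. A grouping based on subject-matter knowledge of the codes may well be more sensible.

```python
from collections import Counter


def lump_rare_levels(values, keep=20, other="Other"):
    """Hypothetical helper: keep the `keep` most frequent levels, map the rest to `other`."""
    # most_common() orders by count; ties keep first-seen order.
    top = {level for level, _ in Counter(values).most_common(keep)}
    return [v if v in top else other for v in values]


def nominal_splits(k):
    """Binary splits of k unordered levels into two non-empty groups."""
    return 2 ** (k - 1) - 1


if __name__ == "__main__":
    # Toy codes, made up for illustration only.
    codes = ["A", "B", "A", "AA", "C", "A", "B", "AB"]
    lumped = lump_rare_levels(codes, keep=2)
    k_before = len(set(codes))
    k_after = len(set(lumped))
    print(lumped)
    print(f"levels: {k_before} -> {k_after}, "
          f"unordered splits: {nominal_splits(k_before)} -> {nominal_splits(k_after)}")
```

With keep levels retained the factor has at most keep + 1 levels, so the number of unordered splits is bounded by 2^keep - 1; keep therefore has to stay small for this to help.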

hth,
Z

  Thanks.
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.