Dear Amber,

Your data contain missing values and you don't use surrogate splits to deal with them. As a result, observations with a missing value in the split variable are passed down the tree randomly (there is no "majority" argument to "ctree_control"!), and so it can happen that terminal nodes smaller than minbucket are created.

Simply use surrogate splits (maxsurrogate = 3, for example) and the tree will be deterministic with correctly sized terminal nodes.
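
A minimal sketch of such a call, for illustration only: the data frame below is a made-up stand-in with 36 cases and NAs in x1 and x2 (it is not the cavl data), and the control settings other than maxsurrogate are copied from the original post. The "majority" argument is left out.

```r
library(partykit)

# Hypothetical stand-in data: 36 cases, missing values in x1 and x2
set.seed(1)
my_data <- data.frame(x1 = rnorm(36), x2 = rnorm(36),
                      x3 = runif(36), x4 = runif(36, 0, 60))
my_data$y <- ifelse(my_data$x4 <= 30, 0.9, 0.3) + rnorm(36, sd = 0.1)
my_data$x1[c(3, 11, 20)] <- NA
my_data$x2[c(5, 17, 29, 33)] <- NA

# Same settings as in the original post, plus up to 3 surrogate splits
ctrl <- ctree_control(testtype = "Bonferroni", mincriterion = 0.90,
                      minsplit = 12, minbucket = 4, maxsurrogate = 3)
fit <- ctree(y ~ x1 + x2 + x3 + x4, data = my_data, control = ctrl)

# Number of cases that end up in each terminal node
node_sizes <- table(predict(fit, type = "node"))
print(node_sizes)
```

Comparing node_sizes with minbucket is a quick way to check the terminal node sizes on the real data.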

Best,

Torsten

On Mon, 9 Jun 2014, Amber Dawn Nolder wrote:

I have attached the data set (cavl) and R code used when I got the results I posted about. I included the code I used at the top of the document. Below that is the version of R used and some of the results I obtained.
Many thanks!
Amber 
On Wed, 4 Jun 2014 09:12:15 +0200 (CEST)
Torsten Hothorn <torsten.hoth...@uzh.ch> wrote:

On Tue, 3 Jun 2014, Amber Dawn Nolder wrote:

I apologize for my lack of knowledge with R. I usually load my data as a csv file. May I send that to you? I was not sure if I could do so on the list.

Yes, and please also send the R code you used. Thanks,

Torsten

Thank you!
On Fri, 30 May 2014 09:37:23 +0200 (CEST)
Torsten Hothorn <torsten.hoth...@uzh.ch> wrote:

Amber,

This looks like an error. Could you please send me a reproducible example so that I can track the problem down?

Best,

Torsten


________________________________________________________________

Prof. Dr. Torsten Hothorn                       =========
                                                 \\
Universitaet Zuerich                             \\
Institut fuer Epidemiologie, Biostatistik und     \\
Praevention, Abteilung Biostatistik               //
Hirschengraben 84                                //
CH-8001 Zuerich                                 //
Schweiz                                        //
                                                ==========
Telephon:  +41 44 634 48 17
Fax:       +41 44 634 43 86
Web:       http://tiny.uzh.ch/6p
________________________________________________________________

On Wed, 28 May 2014, Achim Zeileis wrote:

In case you haven't seen it already...

Best regards,
Z

---------- Forwarded message ----------
Date: Wed, 28 May 2014 17:16:12 -0400
From: Amber Dawn Nolder <a.d.nol...@iup.edu>
To: r-help@r-project.org
Subject: [R] partykit ctree: minbucket and case weights


   Hello,
   I am an R novice, and I am using the "partykit" package to create
   regression trees. I used the following to generate the trees:
   ctree(y~x1+x2+x3+x4,data=my_data,control=ctree_control(testtype =
   "Bonferroni", mincriterion = 0.90, minsplit = 12, minbucket = 4,
   majority = TRUE))
   I thought that "minbucket" set the minimum value for the sum of weights
   in each terminal node, and that each case weight is 1, unless otherwise
   specified. In which case, the sum of case weights in a node should equal
   the number of cases (n) in that node. However, I sometimes obtain a tree
   with a terminal node that contains fewer than 4 cases.
   My data set has a total of 36 cases. The dependent and all independent
   variables are continuous data. Variables x1 and x2 contain missing (NA)
   values.
   Could someone please explain why I am getting these results?
   Am I mistaken about the value of case weights or about the use of minbucket
   to restrict the size of a terminal node?
   This is an example of the output:
   Model formula:
   y ~ x1 + x2 + x3 + x4
   Fitted party:
   [1] root
   |   [2] x4 <= 30: 0.927 (n = 17, err = 1.1)
   |   [3] x4 > 30
   |   |   [4] x2 <= 43: 0.472 (n = 8, err = 0.4)
   |   |   [5] x2 > 43
   |   |   |   [6] x3 <= 0.4: 0.282 (n = 3, err = 0.0)
   |   |   |   [7] x3 > 0.4: 0.020 (n = 8, err = 0.0)
   Number of inner nodes:    3
   Number of terminal nodes: 4
   Many thanks!
   Amber Nolder
   Graduate Student
   Indiana University of Pennsylvania
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



