Dear Amber,
your data contains missing values and you don't use surrogate splits to
deal with them. As a result, observations with a missing value in the
split variable are passed down the tree at random (there is no
"majority" argument to "ctree_control"!), so terminal nodes smaller
than "minbucket" can be created.
Simply use surrogate splits (maxsurrogate = 3, for example) and the
tree will be deterministic with correctly sized terminal nodes.
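A minimal sketch of the call with surrogate splits enabled. The
attached data are not reproduced in this thread, so "my_data" below is a
simulated stand-in (36 cases, NAs in x1 and x2); the values and the
node-size check via predict(type = "node") are illustrative assumptions,
not the original analysis:

```r
# Sketch only: simulated stand-in for the attached data
# (36 cases, missing values in x1 and x2). Values are made up.
library(partykit)

set.seed(1)
n <- 36
my_data <- data.frame(
  x1 = rnorm(n),
  x2 = runif(n, 0, 100),
  x3 = runif(n),
  x4 = runif(n, 0, 60)
)
my_data$y <- ifelse(my_data$x4 <= 30, 0.9, 0.3) + rnorm(n, sd = 0.1)
my_data$x1[c(3, 11, 20)] <- NA
my_data$x2[c(5, 17, 29, 33)] <- NA

# Same settings as in the original post, but with surrogate splits
# (maxsurrogate = 3) instead of the "majority" argument.
fit <- ctree(
  y ~ x1 + x2 + x3 + x4,
  data = my_data,
  control = ctree_control(
    testtype = "Bonferroni", mincriterion = 0.90,
    minsplit = 12, minbucket = 4,
    maxsurrogate = 3
  )
)

# Count the observations assigned to each terminal node, to compare
# the node sizes against minbucket.
node_sizes <- table(predict(fit, type = "node"))
print(node_sizes)
```

Refitting several times and comparing the node sizes is one way to see
whether the tree still changes from run to run.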
Best,
Torsten
On Mon, 9 Jun 2014, Amber Dawn Nolder wrote:
I have attached the data set (cavl) and R code used when I got the results I
posted about. I included the code I used at the top of the document. Below
that is the version of R used and some of the results I obtained.
Many thanks!
Amber
On Wed, 4 Jun 2014 09:12:15 +0200 (CEST)
Torsten Hothorn <torsten.hoth...@uzh.ch> wrote:
On Tue, 3 Jun 2014, Amber Dawn Nolder wrote:
I apologize for my lack of knowledge with R. I usually load my data as a
csv file. May I send that to you? I was not sure if I could do so on the
list.
yes, and the R code you used. Thanks,
Torsten
Thank you!
On Fri, 30 May 2014 09:37:23 +0200 (CEST)
Torsten Hothorn <torsten.hoth...@uzh.ch> wrote:
Amber,
this looks like an error -- could you pls send me a reproducible example
so that I can track the problem down?
Best,
Torsten
________________________________________________________________
Prof. Dr. Torsten Hothorn
Universitaet Zuerich
Institut fuer Epidemiologie, Biostatistik und Praevention,
Abteilung Biostatistik
Hirschengraben 84
CH-8001 Zuerich
Schweiz
Telephon: +41 44 634 48 17
Fax: +41 44 634 43 86
Web: http://tiny.uzh.ch/6p
________________________________________________________________
On Wed, 28 May 2014, Achim Zeileis wrote:
In case you haven't seen it already...
Best regards,
Z
---------- Forwarded message ----------
Date: Wed, 28 May 2014 17:16:12 -0400
From: Amber Dawn Nolder <a.d.nol...@iup.edu>
To: r-help@r-project.org
Subject: [R] partykit ctree: minbucket and case weights
Hello,
I am an R novice, and I am using the "partykit" package to create
regression trees. I used the following to generate the trees:
ctree(y ~ x1 + x2 + x3 + x4, data = my_data,
      control = ctree_control(testtype = "Bonferroni",
                              mincriterion = 0.90, minsplit = 12,
                              minbucket = 4, majority = TRUE))
I thought that "minbucket" set the minimum value for the sum of weights
in each terminal node, and that each case weight is 1 unless otherwise
specified. In that case, the sum of case weights in a node should equal
the number of cases (n) in that node. However, I sometimes obtain a
tree with a terminal node that contains fewer than 4 cases.
My data set has a total of 36 cases. The dependent variable and all
independent variables are continuous. Variables x1 and x2 contain
missing (NA) values.
Could someone please explain why I am getting these results?
Am I mistaken about the value of case weights or about the use of
minbucket to restrict the size of a terminal node?
This is an example of the output:
Model formula:
y ~ x1 + x2 + x3 + x4
Fitted party:
[1] root
| [2] x4 <= 30: 0.927 (n = 17, err = 1.1)
| [3] x4 > 30
| | [4] x2 <= 43: 0.472 (n = 8, err = 0.4)
| | [5] x2 > 43
| | | [6] x3 <= 0.4: 0.282 (n = 3, err = 0.0)
| | | [7] x3 > 0.4: 0.020 (n = 8, err = 0.0)
Number of inner nodes: 3
Number of terminal nodes: 4
Many thanks!
Amber Nolder
Graduate Student
Indiana University of Pennsylvania
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.