Tudor:

I have successfully applied model-based partitioning to several datasets using mob() from the party package (thanks once again, Achim et al.!). At times, however, the partitioning leads to terminal nodes in which the parameter estimates of the model are not significant, although the split points and the proposed segmentation in general both seem reasonable.

There are two aspects to this:

(1) The algorithm only assesses whether the coefficients differ significantly between two child nodes. Whether they are significantly different from zero within each node is a separate question. As an example: you may have a tree with a single split and two child nodes. In the first child node the parameter is highly significant, but in the second node it is not significant at all.

(2) Due to partitioning, it may be the case that not all parameters of the model are identified in all child nodes. Currently, this is not checked systematically within mob(). In particular, you may have (quasi-)complete separation in binomial GLMs if a child node is particularly "pure". This seems to have happened in your example below: in node 3 the estimates and standard errors are huge, the residual deviance is essentially zero, and the Fisher scoring ran for 25 iterations. From a machine learning point of view, this is not a bad thing; you just need to interpret it correctly.
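
To make point (1) concrete, here is a small simulated sketch. It is only an analogy: the grouping variable g stands in for the two child nodes, and the interaction test in a pooled lm() stands in for "do the coefficients differ between nodes". It is not what mob() does internally, and all names and numbers are made up for illustration.

```r
set.seed(42)
n <- 200
x <- rnorm(n)
g <- rep(c("left", "right"), each = n / 2)   # hypothetical stand-in for two child nodes

# True slope is 2 in the "left" node and 0 in the "right" node.
y <- ifelse(g == "left", 2 * x, 0) + rnorm(n)
d <- data.frame(y, x, g)

# Separate fit per node: each slope is tested against zero.
fits <- lapply(split(d, d$g), function(dd) lm(y ~ x, data = dd))
p_within <- sapply(fits, function(f) summary(f)$coefficients["x", "Pr(>|t|)"])

# Pooled fit with interaction: tests whether the slopes differ between nodes.
pooled <- lm(y ~ x * g, data = d)
p_diff <- summary(pooled)$coefficients["x:gright", "Pr(>|t|)"]

print(p_within)   # p-value of the slope within each node
print(p_diff)     # p-value of the difference in slopes
```

The difference between the nodes is clearly significant and so is the slope in the "left" node, while the slope in the "right" node is zero by construction and will usually not be significant. A split can therefore be perfectly justified even though one of the resulting nodes has "no effect".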

As I do not seem to be able to come up with an intuitive explanation/interpretation for this (other than that the partitioning model may be appropriate for parts of the dataset(s)), I wonder if any of you could share your thoughts on this topic with me. For your convenience I attached a relevant set of results below.

I guess that the variable "P" is binary and that, when you cross-tabulate it with the response for Node 3, there are zeros in the contingency table. I.e., you may have a perfect split in that one sub-sample.
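
A minimal sketch of that check, using made-up data rather than your actual node 3: the data frame node, its columns P and y, and the group sizes are all assumptions for illustration. In practice you would first subset your data to the observations that fall into the node in question.

```r
# Hypothetical node data: binary regressor P and binary response y,
# constructed so that P determines y perfectly.
node <- data.frame(
  P = rep(c(0, 1), each = 20),
  y = rep(c(1, 0), each = 20)
)

# Cross-tabulate regressor and response; empty cells indicate (quasi-)complete separation.
tab <- table(P = node$P, y = node$y)
print(tab)
has_empty_cell <- any(tab == 0)

# Fitting the binomial GLM to such data typically triggers warnings
# (e.g. about fitted probabilities numerically 0 or 1), suppressed here.
fit <- suppressWarnings(glm(y ~ P, data = node, family = binomial))
coefs <- summary(fit)$coefficients
print(coefs)           # very large estimates and standard errors, z values near 0
print(deviance(fit))   # residual deviance essentially zero
```

The symptoms are the same as in the node 3 summary: estimates and standard errors blow up, the p-values are close to 1, and the residual deviance is practically zero, although the node predicts the response perfectly.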

hth,
Z


$`2`

Call:
NULL

Deviance Residuals:
                Min                   1Q               Median                   3Q                  Max
-2.1613499829328759  -0.1182099512510448   0.0000000000000000   0.1199438072333263   1.7963628663418680

Coefficients:
                       Estimate          Std. Error  z value             Pr(>|z|)
(Intercept) 38.6736721222665096  5.1182299436934375  7.55606 0.000000000000041545 ***
P           -3.8195232976021787  0.5042297985419135 -7.57497 0.000000000000035922 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 407.0806101624161  on 293  degrees of freedom
Residual deviance: 132.0087256781199  on 292  degrees of freedom
AIC: 136.0087256781199

Number of Fisher Scoring iterations: 7


$`3`

Call:
NULL

Deviance Residuals:
                    Min                       1Q                   Median                       3Q                      Max
-0.00009134433923085110   0.00000000000000000000   0.00000000000000000000   0.00000000000000000000   0.00009204763394325872

Coefficients:
                        Estimate           Std. Error  z value Pr(>|z|)
(Intercept)   1755.7555999083327 601505.6700290179579  0.00292  0.99767
P             -181.3394660743267  62127.5207770660636 -0.00292  0.99767

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 94.20918454290385568583588  on 67  degrees of freedom
Residual deviance:  0.00000001683616309495537  on 66  degrees of freedom
AIC: 4.000000016836163

Number of Fisher Scoring iterations: 25

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
