Christopher,

thanks for your interest.

I'm currently exploring a dataset with the help of conditional inference trees (still very much a beginner with this technique & log. reg. methods as a whole t.b.h.), since they explained more variation in my dataset than a binary logistic regression with glm. I started out with the party package, but after a while I ran into the 'updated' partykit package and tried this out, too.

If you want to use individual trees (as opposed to forests), then the "partykit" package is recommended because it contains much improved re-implementations of ctree() and mob() as well as the mob() convenience interfaces lmtree() and glmtree(). For forests see below.

Now, the strange thing is that both trees look quite different - actually even the very first split is different.

This might be due to several partitioning variables being associated with tiny p-values in the root node. The re-implementation in partykit internally computes with log-p-values and hence should be numerically more stable. In the old implementation it could happen that, among several highly significant variables, the first one was always chosen because their p-values were essentially indistinguishable for the computer.
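To illustrate the numerical point, here is a small base R sketch. It does not use the internal code of either package; the variable names, the two test statistics and the chi-squared reference distribution are made up for illustration only.

```r
# Two hypothetical partitioning variables with chi-squared test statistics
# (1 degree of freedom). Both are extremely significant, but the second
# one is clearly the stronger candidate.
vars <- c("x1", "x2")
stat <- c(2000, 3000)

# Plain p-values underflow to zero in double precision, so they tie and
# which.min() simply returns the first variable.
p <- pchisq(stat, df = 1, lower.tail = FALSE)
best_p <- vars[which.min(p)]

# On the log scale the two p-values stay finite and distinguishable.
log_p <- pchisq(stat, df = 1, lower.tail = FALSE, log.p = TRUE)
best_log_p <- vars[which.min(log_p)]

print(p)
print(log_p)
cat("chosen via p-values:    ", best_p, "\n")
cat("chosen via log-p-values:", best_log_p, "\n")
```

If the first split in your two trees differs in this way, comparing the test statistics of the candidate variables in the root node should show whether ties of this kind are the cause.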

If you think that this is not the problem, then please contact the package maintainer with a reproducible example.

Except for bug fixes like the one above, the trees grown by partykit::ctree and party::ctree should be the same.

So I did some research and came across the 'forest' concept. However, it seems that the varImp function does not yet work in the partykit implementation,

Correct. While the ctree() implementation in partykit is better than that in party, the same is _not_ true for cforest(). The new partykit::cforest is currently still a basic implementation which doesn't offer as many features as the party::cforest implementation. More work is needed especially for variable importance measures and different kinds of predictions.

which raises the question for me how I should evaluate the partykit forest - how can I find out whether the variables are as important in the forest as in my partykit tree? Is there some way to do this or some other solution for this problem? I'd prefer to continue with the partykit implementation of ctree, since it allows more settings for the final plot, which I'd need to get the final (large) plot into a readable form.

Related to this project, I'd also like to give statistics for the overall
model, e.g. overall significance, Nagelkerke's R², a C-value. After a
'regular' binary log. reg., I would use the lrm function to get these
values, but I am unsure whether it would be correct to also apply this
method to my tree data.

Overall significance is difficult because you have done model selection when growing the tree. As for pseudo R-squared or information criteria etc., it is relatively easy to compute these "by hand" based on the observed and fitted responses. An example for this is provided at:
http://stackoverflow.com/questions/29524670/how-to-find-the-the-deviance-of-an-as-party-object-converted-from-rpart-tree-in/29693223#29693223
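As a rough sketch of such a "by hand" computation for a binary response, the following base R code derives the log-likelihood, deviance, McFadden's and Nagelkerke's pseudo R-squared, and the C index from observed responses and fitted probabilities. The data and the helper fit_measures() are hypothetical; for a partykit tree with a factor response, the fitted probabilities should be obtainable with something like predict(tree, type = "prob"). The clamping of probabilities away from 0 and 1 is my own assumption to avoid log(0) in pure terminal nodes.

```r
# Hypothetical data: y is the observed 0/1 response, p_hat the fitted
# probability of y == 1 (for a tree: the share of 1s in each terminal node).
y     <- c(0, 0, 0, 0, 1, 1, 1, 1, 0, 1)
p_hat <- c(rep(0.2, 5), rep(0.8, 5))

fit_measures <- function(y, p_hat, eps = 1e-12) {
  n <- length(y)
  # Assumption: clamp to [eps, 1 - eps] so that pure nodes do not give log(0).
  p_hat <- pmin(pmax(p_hat, eps), 1 - eps)

  # Bernoulli log-likelihoods of the fitted model and the intercept-only model.
  ll_fit  <- sum(dbinom(y, size = 1, prob = p_hat,   log = TRUE))
  ll_null <- sum(dbinom(y, size = 1, prob = mean(y), log = TRUE))

  # Cox & Snell R-squared, rescaled by its maximum to give Nagelkerke's version.
  cox_snell  <- 1 - exp(2 * (ll_null - ll_fit) / n)
  nagelkerke <- cox_snell / (1 - exp(2 * ll_null / n))

  # C index (area under the ROC curve) via the rank-sum formula;
  # tied fitted probabilities count as 1/2 through average ranks.
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  c_index <- (sum(rank(p_hat)[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)

  c(logLik = ll_fit, deviance = -2 * ll_fit,
    mcfadden = 1 - ll_fit / ll_null,
    nagelkerke = nagelkerke, c_index = c_index)
}

res <- fit_measures(y, p_hat)
print(round(res, 3))
```

Keep in mind that these are in-sample measures, so after the model selection done while growing the tree they will tend to look better than they would on new data.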

Any help would be greatly appreciated!
-- Christopher



--
View this message in context: 
http://r.789695.n4.nabble.com/Trees-and-Forests-with-packages-party-vs-partykit-Different-results-tp4712214.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.